Data is ever-changing in dynamic systems. Even when you're working with statistics spilled out of machines, a value of 2147483648 showing up out of the blue can really mess with that Red-Yellow-Green report and send executives running from their offices like the building was burning down.
It's one thing when you're just setting up a new data pipeline. You have to check whether you're receiving all the records or whether some are being missed. When you look at the data, are you seeing nulls where there are no data points, or zeroes? Are the numbers within the range you expected? Do the text values make sense? The list for basic data stream validation is long. But once the data stream is flowing and you work with your values every week, you depend upon them to remain within range, to remain integers or character datatypes, and not to go all cattywampus because commas now show up in a text field where before there were none. Yeah, finding that the data stream is polluted can throw your "world view" completely off.
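A few of the checks described above (nulls, datatypes, ranges, stray commas) can be sketched in a handful of lines of Python. This is only an illustration: the field names `name` and `value`, the default range of 0 to 10,000, and the function `validate_record` are all assumptions made up for the example, not part of any particular pipeline.

```python
from typing import Any

# Largest value a signed 32-bit integer can hold; 2147483648 is one past it.
INT32_MAX = 2**31 - 1


def validate_record(record: dict[str, Any], low: int = 0, high: int = 10_000) -> list[str]:
    """Return a list of problems found in one record (empty list means clean).

    The 'name' and 'value' fields and the default range are hypothetical.
    """
    problems = []

    value = record.get("value")
    if value is None:
        problems.append("value is null")
    elif isinstance(value, bool) or not isinstance(value, int):
        problems.append(f"value is not an integer: {value!r}")
    elif value > INT32_MAX:
        problems.append(f"value exceeds 32-bit signed range: {value}")
    elif not low <= value <= high:
        problems.append(f"value out of expected range {low}..{high}: {value}")

    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name is missing or empty")
    elif "," in name:
        problems.append(f"name contains a comma: {name!r}")

    return problems


if __name__ == "__main__":
    for rec in [
        {"name": "widget", "value": 42},
        {"name": "widget", "value": 2147483648},
        {"name": "Smith, John", "value": None},
    ]:
        print(rec, "->", validate_record(rec) or "clean")
```

Running a check like this on every batch, and not just during setup, is what catches the week when commas start showing up where before there were none.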
And dirty data is about more than catching that 2 to the 31st power before it shows up in a report and some decision-maker drops dead at their desk from a heart attack. Not recognizing that there's a problem with your data in the first place inhibits a decision-maker's ability to get to the right answer if there is one, or to approximately the right answer if there isn't an absolute.
At its most basic level, having "clean" data means that the elemental unit of quantifying your business has been defined. Data is the building block with which you erect your business's story. Build that story on erroneous data, and you're telling a fiction. So if bad data begins to show up in your data stream, if numbers or names begin to change, if rules are not strictly kept, then the next thing you know you believe George Jackson from Rome, Iowa really wants that corn cob pipe, because you just mixed up his record with Jackson George from Rome, Indiana.
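To make the George Jackson mix-up concrete, here is a minimal sketch of how a loosely kept matching rule can merge two different customers while a strictly kept one does not. The `Customer` type and both key functions are hypothetical and invented for this illustration. Real record matching is considerably more involved.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    # Hypothetical record layout, for illustration only.
    first: str
    last: str
    city: str
    state: str


def loose_key(c: Customer) -> frozenset:
    """Sloppy rule: a bag of words from name and city, ignoring order and state."""
    return frozenset(part.strip().lower() for part in (c.first, c.last, c.city))


def strict_key(c: Customer) -> tuple:
    """Strict rule: every field, kept in its own position."""
    return tuple(part.strip().lower() for part in c)


george = Customer("George", "Jackson", "Rome", "Iowa")
jackson = Customer("Jackson", "George", "Rome", "Indiana")

# The loose rule treats the two men as the same customer.
print("loose match: ", loose_key(george) == loose_key(jackson))
# The strict rule keeps them apart.
print("strict match:", strict_key(george) == strict_key(jackson))
```

The loose key looks harmless until two real people happen to share the same words in a different order, and then the corn cob pipe goes to the wrong Rome.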
One of my favorite examples of how data is ignored, or how bad data is allowed to slip into the analysis, has to do with a recommender system from my cable provider. Customers have changed in the past 20 years with the growth of the internet, the ubiquity of mobile devices, and the availability of personalized data. I admit it. I'm one of them. I'm one of them because I stopped using their mobile app to view my shows, since they kept showing me the same Dodge Ram / Ford Truck ads while I was trying to watch Project Runway.
Now maybe there is a high degree of correlation between TV viewers who watch reruns of every season of Project Runway and those who show up on Big Truck lots right after. All I know is that I. AM. NOT. one of them. If I were an advertiser with *********(Redacted)******* I would be "very" upset at how my ad dollars were being spent.
More importantly, as an advertiser, I'd expect to be able to measure the success of my ad and to see ****(Redacted)**** approval for the ads.
The point is that consumers EXPECT to receive personalized recommendations based upon what you know about them and their purchasing behaviors. It no longer matters what industry you're in: the customer expects you to *know* them from their patterns of behavior.
And this is where "brick and mortar" can gain an advantage over "on-line", because it can blend the ephemeral electronic data with the visceral, tactile experience of a physical presence. It's just that on-line companies have much more experience with and exposure to IT basics like standardized reporting, so they integrate that into their planning. Still, small companies can begin to create deeper relationships with their customers by understanding what their preferences are. So, clean up that information!