Those days are gone. Now we have Point of Sale (PoS) data, Retailer Depletion Reports, Google Analytics data from our website, Facebook information, Twitter info, and that's before we even get to our membership lists. Our customers are no longer just our neighbors or people from across town; they come from across the world. Catalogs have been replaced by anonymous users on our website, and email newsletter clicks have replaced our flyers. In effect, the data itself hasn't changed. There's just more of it, and we don't have much control over the form in which we receive it, as evidenced by this lovely artifact, which shows what the original winery data looked like in one of the data sources.
Data comes in a large variety of "types," but let's keep this simple. You can have text (like someone's name), dates, numbers, and booleans (true/false values). Those are the basic categories. There are more these days, but they are beyond the scope of this blog post. The big problem for those who are not immersed in the data world is that even within those categories there is variation, and a lot of it. Data really isn't that simple any longer.
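To make that variation concrete, here is a minimal Python sketch of how the same date, number, or yes/no value might arrive in several different shapes. The accepted formats, the sample values, and the helper names (`parse_date`, `parse_number`, `parse_bool`) are assumptions made up for illustration; they are not taken from the winery data.

```python
from datetime import datetime

# Hypothetical list of date layouts a source might use; real feeds may differ.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

# Hypothetical spellings of yes/no values.
TRUE_VALUES = {"y", "yes", "true", "1"}
FALSE_VALUES = {"n", "no", "false", "0"}


def parse_date(raw):
    """Try each known layout in turn; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_number(raw):
    """Strip a dollar sign and thousands separators, then convert to float."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_bool(raw):
    """Map common yes/no spellings to True/False; return None if unrecognized."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None


# Three spellings of the same date all normalize to one value.
for text in ["2019-03-05", "03/05/2019", "5 Mar 2019"]:
    print(text, "->", parse_date(text))
```

Anything the helpers do not recognize comes back as `None` rather than a guess, so the unparseable values can be reviewed by a person.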
For example, if you have a sales spreadsheet with someone's first name and last name in one field (cell), such as John Smith, that really isn't considered "clean" data, even though you can read both names. Tidy data is data structured in a way that makes it easy to work with. Say you wanted to join your sales data with your newsletter list. You would need an exact pattern match of "John Smith" in