When Tina Turner belted out "It's physical. Only logical," she was right. Love has nothing to do with "IT", but the integration of digitization into your business practices and how you do your work does. Productivity increases are tied to digitization rates, and industries are now being classified as having either high or low digitization integration. BusinessDictionary.com defines digitization as:
Digitized information is important to any business that wants to grow, not just because of efficiency gains, but because of intelligence gains. In retail, the drive to deliver personalized content to "digital natives" or Generation C:
Those days are gone. Now we have Point of Sale (PoS) data, Retailer Depletion Reports, Google Analytics data on our website, Facebook information, Twitter info, and that doesn't even include our membership lists. Our customers are no longer just our neighbors or people from across town; they are spread across the world. Catalogs have been replaced by anonymous users on our website, and email newsletter clicks have replaced our flyers. In effect, the data hasn't changed. There's just more of it, and we don't have much control over the form in which we receive it, as evidenced by this lovely artifact, which shows what the original winery data looked like from one of the data sources.
No Gettin' Around 'Em
When folks get all data-sciencey about building Business Intelligence, they start spouting words which belong in a statistics class. Words like "Bayesian", "A/B Testing", "geocoding", "probabilities", "modeling", and "clusters" -- all of which make "Business Intelligence" sound difficult to develop and acquire. And if you're just beginning to get an inkling of what data analytics can do for your business decision-making, it is easy to feel overwhelmed by what looks like a convoluted process. However, if you step back from all the "techy" / "math-ey" / "code-y" -ness of Business Intelligence development, you will find that it's actually more like a manufacturing supply chain. It's just that in this case, the raw material happens to be information instead of physical goods. These next few blog articles will take you step-by-step through a real-world process.
It is almost as good as the old joke, "Well, I could tell you, but then you'd wish yourself dead." Worse, most definitions, including this unsourced and probably plagiarized one you can find popping up everywhere, consist mostly of even more buzz words. Heck, we could make a Bingo card out of it. Still, for those of us who work with data and technology, it makes perfect sense -- but that's because we build it every single day.
However, for people whose work does not touch technology for more than a Facebook post or a Google search, this definition sounds very expensive, indefinite, and unclear. It is just another set of buzz words, with no concrete experience behind it to give meaning to abstractions such as "information," "informed," or "actionable." Let's face it: until recently, most businesses have operated off of a bookkeeper's basic reports of sales and expenses, if not just a bank statement. Most folks outside the "data" intensive industries might know how to make a pivot table in a spreadsheet program, but evaluating the accuracy of a box-plot, or even knowing what a box-plot consists of and why you'd want to use it -- well, that's another story.
It all begins with data
The intention of this blog is to communicate with people who don't speak "Data". Unfortunately, I'm a data geek, and so I struggle to speak "human." I get a "deer in the headlights" look when I use phrases like "Big Data" or "Open Data", and my friends say, "huh?" Communicating everything that's changed in the world of the internet and the digitization of information is challenging because language develops around experience, and technical language is precise. What's really interesting is being in the midst - literally - of the development of a new global language driven by technical change. However, that makes translating back to non-technical language even more challenging.
Volume, Velocity, Variety
I just received a Microsoft marketing email, "Prepare for the General Data Protection Regulation," and it occurred to me that this might be the first time smaller U.S. businesses or organizations become aware of this new regulation. A distribution list like Microsoft's or Apple's could create more unified awareness of such a regulation in the U.S. than newspapers, blog entries, television, or business journals. My experience is that many small U.S. entities are unaware of this far-reaching regulation and are unprepared for its impact on their business.
So welcome, newcomers, to the world of GDPR. I always open my blog entries on this topic with the caveats that I am not "Certified" in "Data Protection", nor am I a lawyer specializing in data protection or contracts. We at Vizzy Solutions and Dirty Data Girl operate as "data processors" in the GDPR classification system. Caveats aside, Article 28 of the Regulation does make it a processor's responsibility to "assist the controller..." And this is exactly where my concern for smaller U.S. companies comes in. If you're asking yourself "What is a controller?" right now, that should indicate how far behind you are in preparing to respond to this globally affecting regulation, which becomes enforceable May 25, 2018.
This blog entry is not going to have much by me. It is mostly typed quotes from this lecture Hadley Wickham gave at the Chicago Chapter of the ACM on March 7th, 2018. It opens with who Hadley is, but basically he's an important tool builder / developer who has driven a lot of the programmatic breakthroughs which make getting to insights faster and easier today.
One can't say he's a pioneer of the "Data Revolution", but he is one of the most prolific current key contributors to the changes taking place in the data world. This blog is really just a list of quotes I've tried to capture from this video. The opening 8 minutes of this hour-long program do a really good job of explaining what "data science" is, how it's done, and its process for getting to insights.
Data comes in a large variety of "types," but let's keep this simple. You can have text (like someone's name), dates, numbers, and booleans (true/false values). Those are the simple categories. There are more these days, but they are beyond the scope of this post. The big problem for those who are not immersed in the data world is that even within those categories there is variation - and a lot of it. Data really isn't that simple any longer.
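To make "variation within a category" a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the sample values, the list of date formats, and the helper names `parse_date` and `parse_flag` are my assumptions, not anything from a real data source. It also assumes U.S. month-first dates and English month names.

```python
from datetime import datetime

# Hypothetical raw values: the same date and the same yes/no answers,
# written the way different systems or people might write them.
RAW_DATES = ["2018-03-07", "03/07/2018", "March 7, 2018"]
RAW_FLAGS = ["Y", "yes", "TRUE", "1", "n", "False", "0"]

# Assumed formats; a real source may need more (or different) ones.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"]
TRUE_WORDS = {"y", "yes", "true", "1"}
FALSE_WORDS = {"n", "no", "false", "0"}


def parse_date(text):
    """Try each assumed format in turn; fail loudly if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_flag(text):
    """Map the many spellings of yes/no onto a real boolean."""
    word = text.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"unrecognized yes/no value: {text!r}")


dates = [parse_date(d) for d in RAW_DATES]
flags = [parse_flag(f) for f in RAW_FLAGS]

# Three spellings collapse to one date; seven spellings to two booleans.
print(sorted(set(dates)))
print(flags)
```

The point is not the code itself. It is that three "different" dates and seven "different" answers are really one date and two values once someone does the translating.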
For example, if you have a sales spreadsheet and it has someone's first name and last name in one field (cell), that really isn't considered "clean" data, even if you can read the first name and last name, such as John Smith. Tidy data is a structure which makes working with the data easy. For example, say you wanted to join your sales data with your newsletter list. You would have to have an exact pattern match of "John Smith" in