I just received a Microsoft marketing email, "Prepare for the General Data Protection Regulation," and it occurred to me that this might be the first time many smaller U.S. businesses and organizations become aware of this new regulation. A distribution list as large as Microsoft's or Apple's could build awareness of such a regulation across the U.S. more effectively than newspapers, blog entries, television, or business journals. In my experience, many small U.S. entities are unaware of this far-reaching regulation and unprepared for its impact on their business.
So, welcome, newcomers, to the world of GDPR. I always open my blog entries on this topic with the caveats that I am not "Certified" in "Data Protection," nor am I a lawyer specializing in data protection or contracts. We at Vizzy Solutions and Dirty Data Girl operate as "data processors" in the GDPR classification system. Caveats aside, Article 28 of the Regulation does make it a processor's responsibility to "assist the controller..." And this is exactly where my concern for smaller U.S. companies comes in. If you're asking yourself "What is a controller?" right now, that should tell you how far behind you are in preparing for this regulation, which has global reach and becomes enforceable on May 25, 2018.
This blog entry won't contain much of my own writing; it is mostly typed quotes from a lecture Hadley Wickham gave at the Chicago Chapter of the ACM on March 7th, 2018. The video opens with an introduction of who Hadley is, but basically he's an important tool builder and developer who has driven many of the programmatic breakthroughs that make getting to insights faster and easier today.
One can't say he's a pioneer of the "Data Revolution," but he is one of the most prolific key contributors to the changes currently taking place in the data world. This post is really just a list of quotes I've tried to capture from the video, because he does a really good job of explaining what "data science" is and how it's done. The opening 8 minutes of this hour-long program in particular explain what data science is and the process it follows to reach insights.
Data comes in a large variety of "types," but let's keep this simple. You can have text (like someone's name), dates, numbers, and booleans (true/false values). Those are the simple categories; there are more these days, but they are enough for this post. The big problem for those who are not immersed in the data world is that even within those categories there is variation, and a lot of it. Data really isn't that simple any longer.
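To make the point about variation within a single type concrete, here is a minimal Python sketch (my own illustration, not from the lecture). The three date strings are hypothetical values, the same calendar date written three ways, as often happens when records come from different spreadsheets. The list of formats is an assumption about the inputs, and "03/07/2018" is ambiguous, so the sketch assumes the U.S. month/day/year convention.

```python
from datetime import date, datetime

# Hypothetical raw values: one calendar date written three different ways.
raw_dates = ["2018-03-07", "03/07/2018", "March 7, 2018"]

# Formats to try, in order. This list is an assumption, not exhaustive.
# "03/07/2018" is ambiguous; here we assume U.S. month/day/year order.
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"]


def normalize_date(value: str) -> date:
    """Return a date for the first format that parses, else raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # this format didn't match; try the next one
    raise ValueError(f"Unrecognized date format: {value!r}")


normalized = [normalize_date(d) for d in raw_dates]

# As raw text, the three values look like three different things...
print(len(set(raw_dates)))   # 3
# ...but once reconciled, they are the same single date.
print(len(set(normalized)))  # 1
```

The takeaway is that a computer treats "2018-03-07" and "March 7, 2018" as different values until someone decides, and writes down, how each format should be interpreted.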
For example, if your sales spreadsheet stores someone's first name and last name together in one field (cell), such as John Smith, that really isn't considered "clean" data, even though you can read both names. Tidy data is a way of structuring data that makes it easy to work with. For example, say you wanted to join your sales data with your newsletter list. You would have to have an exact pattern match of "John Smith" in