No Gettin' Around 'Em
When folks get all data-sciencey about building Business Intelligence, they start spouting words that belong in a statistics class. Words like "Bayesian", "A/B Testing", "geocoding", "probabilities", "modeling", and "clusters", all of which make "Business Intelligence" sound difficult to develop and acquire. And if you're just beginning to get an inkling of what data analytics can do for your business decision-making, it is easy to feel overwhelmed by what looks like a convoluted process. However, if you step back from all the "techy" / "math-ey" / "code-y"-ness of Business Intelligence development, you will find that it's actually more like a manufacturing supply chain. It's just that in this case, the raw material happens to be information instead of physical materials. These next few blog articles will take you step-by-step through a real-world process.
The Information Supply Chain: IMPORT
The development of the strategic winery reports gives us real-world examples. As part of our process at Vizzy Solutions we quantify our acquisitions, transformations, and record counts. We'll use one example from one of our deployments to explain the data pipeline and development of Business Intelligence.
The first step is - of course - acquiring the data.
At its most basic level, Business Intelligence begins with its "raw materials": the data. The first exercise is to evaluate what data we have, then consider what other data we need. That can include augmenting the proprietary data with data from outside sources, or building new datasets ourselves.
In the case of the strategic reports we built for the winery, we collected five years of sales data from the winery's different POS and accounting systems. Along with that, we gathered Distributor Depletion Reports for just two of their distributors. These reports come from the distributors who sell the wines to retail outlets.
Now, these were just the beginning of the data sets we needed. When we examined the Point of Sale data, we found we had customer addresses. We thought it would be cool to also find addresses for the retail locations buying wines from the distributors. The distributor data didn't have addresses for those retail locations, but government datasets do. So we pulled in publicly available information from two states that included retailers' addresses, and matched those records with the distributors' reports.
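For the curious, here is a rough idea of what that kind of matching can look like in Python. This is a minimal sketch and not the process we actually ran: the field names (`retailer`, `name`, `state`, `address`), the sample rows, and the name-cleaning rule are all assumptions made up for illustration.

```python
import re


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse spaces so small spelling differences still match."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def attach_addresses(depletions, registry):
    """Split depletion rows into those with a registry address and those without."""
    # Key on (cleaned name, state) so same-named retailers in different states stay separate.
    lookup = {(normalize(r["name"]), r["state"]): r["address"] for r in registry}
    matched, unmatched = [], []
    for row in depletions:
        address = lookup.get((normalize(row["retailer"]), row["state"]))
        if address is None:
            unmatched.append(row)  # keep these for manual review
        else:
            matched.append({**row, "address": address})
    return matched, unmatched


# Hypothetical sample rows, invented for this example.
registry = [
    {"name": "JOES WINE SPIRITS", "state": "CA", "address": "100 Main St"},
]
depletions = [
    {"retailer": "Joe's Wine & Spirits", "state": "CA", "cases": 12},
    {"retailer": "Corner Bottle Shop", "state": "OR", "cases": 3},
]

matched, unmatched = attach_addresses(depletions, registry)
```

Keeping the unmatched rows, rather than silently dropping them, is the important design choice here: it tells you how much of the distributor data still has no address attached.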
In addition, we wanted to evaluate the wine club health and collection information. However, that collection data didn't exist as its own table of information, so it had to be built. The same was true of Event Days. And finally, in firming up the wine club health, we found there were problems with the POS data, so we ended up doing more work with multiple pulls of information from the POS database just to get a baseline. As a result, our data inventory looked like this:
Inventory of data imports on the project:
Essentially, there were 28 Excel files, but those actually contained 34 worksheets that had to be worked with. Then there were three databases, one of which required multiple report pulls, and then there were the custom table builds. This is not a lot of data by "Big Data" standards, but there is a lot of variety: different file types, and different formats even where the platforms are the same. This is the beginning, though, of "Big Data" solutions for a Very Small Winery.
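The difference between "files" and "worksheets" is easy to lose track of, so it helps to keep the inventory as data. The sketch below shows one way to tally sources, units (worksheets, report pulls, or tables), and record counts by type. The `ImportRecord` structure, the source names, and the row counts are hypothetical and are not the project's real figures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportRecord:
    source: str  # file or database name
    kind: str    # "excel", "database", or "custom"
    unit: str    # worksheet, report pull, or table name
    rows: int    # records read from this unit


def summarize(records):
    """Count distinct sources, units, and rows for each kind of import."""
    sources, units, rows = Counter(), Counter(), Counter()
    seen = set()
    for rec in records:
        if (rec.kind, rec.source) not in seen:
            seen.add((rec.kind, rec.source))
            sources[rec.kind] += 1  # count each file or database once
        units[rec.kind] += 1        # every worksheet or pull counts
        rows[rec.kind] += rec.rows
    return {
        kind: {"sources": sources[kind], "units": units[kind], "rows": rows[kind]}
        for kind in units
    }


# Hypothetical inventory entries, invented for this example.
inventory = [
    ImportRecord("pos_sales.xlsx", "excel", "2019", 1200),
    ImportRecord("pos_sales.xlsx", "excel", "2020", 1350),
    ImportRecord("club_members.xlsx", "excel", "Sheet1", 400),
    ImportRecord("pos_db", "database", "members_pull_1", 410),
    ImportRecord("pos_db", "database", "members_pull_2", 405),
    ImportRecord("event_days", "custom", "event_days", 60),
]

summary = summarize(inventory)
```

In this made-up inventory, two Excel files turn into three worksheets, which is the same effect as 28 files turning into 34 worksheets on the real project.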
In the next blog we will go into further detail about cleaning the above data.