The intention of this blog is to communicate with people who don't speak "Data." Unfortunately, I'm a data geek, and I struggle to speak "Human." When I use phrases like "Big Data" or "Open Data," I get a deer-in-the-headlights look, and my friends say, "Huh?" Communicating everything that has changed with the internet and the digitization of information is challenging, because language develops around experience, and technical language is precise. What's really interesting is being in the midst, literally, of the development of a new global language driven by technical change. However, that makes translating back into non-technical language even more challenging.
Volume, Velocity, Variety
When I first began collecting data, I had a 1.2 kbps dial-up connection on a 13 lb Compaq laptop. That first connection was literally 1,200 bits per second (1,200 1s and 0s). Today, we talk in terms of (fiber) speeds of 125,000,000 bits per second, and some providers supply 1,000,000,000 bits per second. Heck, I was actually part of a cable broadband beta test with the now-defunct @Home network around 1996 or so. If you remember, back then you had to dial into a service provider to connect to the internet: you dialed a phone number and paid by the minute. @Home's connection was "always on" and wicked fast. Netscape's browser, released in 1994, improved the experience of browsing the internet: instead of looking at file folders and having to open a file to see a picture, you actually had something to view.
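To put those line rates in everyday terms, here is a small back-of-the-envelope sketch. It assumes a hypothetical 1 MB (1,000,000-byte) photo and ignores protocol overhead, so real-world times would be somewhat longer; the function name transfer_seconds is just for illustration.

```python
# Rough time to move one file at the line rates mentioned above.
# Assumption: a hypothetical 1 MB (1,000,000-byte) photo, no protocol overhead.

BITS_PER_BYTE = 8

def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Seconds to send size_bytes over a link running at bits_per_second."""
    return size_bytes * BITS_PER_BYTE / bits_per_second

photo_bytes = 1_000_000  # hypothetical file size
for label, bps in [("1.2 kbps dial-up", 1_200),
                   ("125 Mbps fiber", 125_000_000),
                   ("1 Gbps fiber", 1_000_000_000)]:
    print(f"{label:>18}: {transfer_seconds(photo_bytes, bps):,.3f} s")
```

On the 1.2 kbps line that one photo takes about 6,667 seconds (nearly two hours); at 125 Mbps it takes 0.064 seconds, and at 1 Gbps, 0.008 seconds.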
So not having to listen to the hum of a dial-up modem and wait for it, wait for it, wait for it to connect was awesome! However, only @Home's own site was optimized for that kind of speed. Everyone else in the world had their "websites" (many of which were still just file folders) hooked up to 9.6 or 14.4 kbps (9,600 or 14,400 bits per second) modems, so the speed on my end was useless. I cancelled my subscription and went back to dial-up for a couple more years.
Interestingly enough, because I worked in wireless telecommunications, I was working with the technologies and deployments of not just the voice networks but also the early data networks. There were times when it was faster and easier to go over the air than over the land-line, and that still holds true today. Still, as with anything to do with connection speed, you can be as fast as YOU want, but if the other end can't respond at that speed, your resource is wasted.
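The "as fast as YOU want" point boils down to one rule of thumb: a transfer can go no faster than the slowest link along the way. A minimal sketch, where the 10 Mbps home speed is an assumed, illustrative number (not a measured @Home figure) and effective_bps is a hypothetical helper name:

```python
def effective_bps(*link_speeds_bps: int) -> int:
    """A transfer can go no faster than the slowest link along the path."""
    return min(link_speeds_bps)

# Hypothetical path modeled on the @Home story: a fast home connection
# talking to a website that sits behind a 14.4 kbps modem.
home_bps = 10_000_000  # assumed "wicked fast" home speed, illustration only
site_bps = 14_400      # the far-end modem
print(effective_bps(home_bps, site_bps))  # 14400
```

This simplification ignores latency and congestion, but it captures why a fast connection on one end was wasted when the other end was a 14.4 kbps modem.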
And all during this time, I was pulling down more and more data for my job as a telecommunications engineer. What had started off as pulling alarm data through a 1.2 kbps TTY connection had evolved into pulling configuration and performance data from the 51 switches of the AT&T Wireless network, so that I could quantify, measure, and report on the capacity and performance of the voice switches. To do that, what used to take three commands and log file dumps to get a view of connectivity and availability to the switches grew to over 26 commands and log files for just a single portion of a capacity measure.
Then came additional types of network nodes to integrate, and the variety of data exploded. Luckily, because it was a big company, we had a database organization that took over pulling the data, parsing it (breaking it up into discrete records), and storing it in a database, so engineers like me could develop analyses and report on the health of our voice and data networks.
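For the curious, here is roughly what "pulling, parsing, and storing" looks like in miniature. This is only a sketch: the log format below (KEY=VALUE pairs, one switch per line), the switch names, and the trunk_usage table are all hypothetical, since the real switch outputs were far messier; it uses Python's built-in sqlite3 module as the stand-in database.

```python
import sqlite3

# Hypothetical raw dump; the real switch log formats are not shown in this post.
RAW_DUMP = """\
SWITCH=SEA01 TRUNKS=480 BUSY=312
SWITCH=PDX02 TRUNKS=240 BUSY=95
SWITCH=SFO03 TRUNKS=600 BUSY=588
"""

def parse_records(text: str) -> list[dict[str, str]]:
    """Break a raw dump into one record (dict) per non-blank line of KEY=VALUE pairs."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        records.append(dict(field.split("=", 1) for field in line.split()))
    return records

def load(records: list[dict[str, str]]) -> sqlite3.Connection:
    """Store the parsed records in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trunk_usage (switch TEXT, trunks INTEGER, busy INTEGER)")
    conn.executemany(
        "INSERT INTO trunk_usage VALUES (?, ?, ?)",
        [(r["SWITCH"], int(r["TRUNKS"]), int(r["BUSY"])) for r in records],
    )
    return conn

conn = load(parse_records(RAW_DUMP))
# An engineer's capacity question: which switches are above 90% trunk occupancy?
for switch, pct in conn.execute(
    "SELECT switch, 100.0 * busy / trunks FROM trunk_usage WHERE busy > 0.9 * trunks"
):
    print(switch, round(pct, 1))  # SFO03 98.0
```

Once the data sits in discrete records, the engineer's question becomes a one-line query instead of a trawl through log dumps.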
Then the internet began to grow up. Businesses built websites, and government agencies did too. The Internet of Things was no longer just a cell phone, but a car's computer chip or an assembly line robot. Robots and machines grew in complexity as chips could hold more and more information in their tiny real estate. Imaging systems were no longer x-rays or ink on paper or photographic gels, but 1s and 0s.
For people like me, the mid-2010s were when the "outside world" finally became interesting. There were enough datasets of sufficient complexity to rival those generated by the complex wireless networks I'd spent my career engineering. And that data was open and available to use, not hidden away in a book in a remote library or office that only people nearby could access. That is also what the data revolution is all about: access to information that used to be inaccessible. And that newly accessible information can now be joined with other previously inaccessible information to create new understanding.
So, what is "Big Data"? Simply put, Big Data means there's a lot of it, it comes at you fast, and it varies in type: pictures, sound, text, machine code, whatever. Your business might not have petabytes of information (a petabyte is 1,000,000,000,000,000 bytes, or 8,000,000,000,000,000 1s and 0s, since a byte is 8 bits), but it probably does have a large variety of data. The financial data, the sales data, the customer data, the this data, the that data: it's that variety that bogs down the decision-making process of many small-to-medium-sized businesses today.
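Here is the petabyte arithmetic worked out, plus a sense of scale. This sketch assumes the decimal (SI) petabyte of 10^15 bytes (storage vendors usually mean this; the binary "pebibyte" is 2^50 bytes) and an idealized 1 Gbps line running flat out with no overhead.

```python
BITS_PER_BYTE = 8
PETABYTE_BYTES = 10**15  # decimal (SI) petabyte; a binary pebibyte is 2**50 bytes

petabyte_bits = PETABYTE_BYTES * BITS_PER_BYTE
print(f"{petabyte_bits:,}")  # 8,000,000,000,000,000

# For scale: days to download one petabyte over an idealized 1 Gbps line.
seconds = petabyte_bits / 1_000_000_000
print(round(seconds / 86_400, 1))  # 92.6
```

Even at gigabit speed, a single petabyte takes about three months to move, which is part of why "volume" earns its place among the three Vs.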