Is “big data” itself a fad, just like so many we have seen before, or is it just the term “BIG DATA”?
Definition and Current Uses
Before moving further, let's see what big data is. One common definition describes big data in terms of four V's:
- Volume – the vast amount of data generated
- Velocity – the speed at which new data is generated and the speed at which data moves around
- Variety – the variable sources, processing mechanisms and destinations involved
- Value – our ability to turn the data into value
Sometimes a fifth V is added:
- Veracity – the messiness, noise, abnormality or trustworthiness of the data
Wikipedia defines it as “a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy.”
Many view big data just in terms of “Volume” and maybe “Velocity.” One can argue that handling and using huge amounts of data is not a recent phenomenon: large web companies and government entities (particularly in the U.S.) were using extremely large amounts of structured and unstructured data operationally for years before the term arrived. For them, it is just "data," and in their view “big data” is merely a passing buzzword for something that already existed. But the solutions and technologies surrounding the storage, processing and analysis of “big data” are not a fad, even if the term turns out to be one. Hadoop and HDFS, or NoSQL databases, are extremely valuable and in some cases genuinely revolutionary technologies, even if they seem overhyped right now; they have drawn much attention for exactly this reason.
From the “big data” perspective, each structured and unstructured system represents a possible source of data to mine for information or for a specific answer. Data is not magic that can solve all problems; it is a valuable raw material that can be transformed into specific insights and, eventually, knowledge. And data is growing every day, every hour, every second, for every business. That means:
- Increasing opportunities to better understand and serve customers
- Advantages in marketing, optimization of business processes, and deeper insight into the business
- Improvements in health care and security
- ...and much more.
And now, with the performance of the required technologies increasing and their prices decreasing, many of the organizations that struggle to deal with their medium-sized data today are likely to transition to using larger amounts of data in the coming years. One thing is clear: big data technology has fundamentally changed what is available:
- Massively scalable storage
- Cheap, scalable processing
- Flexible schema on read vs. schema on write
- Easier integration of search, query and analysis
- Variety of interface/interaction languages
- Open source ecosystem driving innovation
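The "schema on read vs. schema on write" point deserves a concrete illustration. The sketch below (a toy in Python; the event records and field names are invented for illustration) stores raw records as-is and imposes structure only at query time, which is the schema-on-read idea: different analyses can project different schemas onto the same stored data.

```python
import json

# Raw event log: with schema-on-write, every record would have to match a
# fixed table layout before being stored. With schema-on-read, we store the
# raw records untouched and impose structure only when querying.
raw_events = [
    '{"user": "alice", "action": "click", "target": "buy-button"}',
    '{"user": "bob", "action": "search", "query": "hadoop", "lang": "en"}',
    '{"user": "carol", "action": "click"}',  # missing fields are fine
]

def read_with_schema(lines, fields):
    """Project each raw record onto the fields the current analysis needs,
    filling gaps with None: the schema lives in the reader, not the store."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}

# Two different "schemas" over the same stored data, chosen at read time.
clicks = list(read_with_schema(raw_events, ["user", "target"]))
searches = list(read_with_schema(raw_events, ["user", "query"]))

print(clicks[0])    # {'user': 'alice', 'target': 'buy-button'}
print(searches[1])  # {'user': 'bob', 'query': 'hadoop'}
```

Note that the third record, which a rigid write-time schema might have rejected, is retained and simply yields `None` for the fields it lacks.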
Prudent data analysis can create value, whether knowledge or monetary, for a lot of organizations. This is why technologies that allow us to store, process and analyse large amounts of data are not a fad. The value of big data is defined by how valuable the information gleaned from it is, compared to the time and resources it took to extract that information. The solutions and technologies surrounding big data help in getting that big value out of big data.
One piece of evidence for the increasing importance, relevance and usefulness of big data technology is the growing realization of the full potential of data science, including machine learning. Machine learning models such as Hidden Markov Models, Support Vector Machines, Neural Nets, Bayesian Networks, KNN and Logistic Regression are based on mathematical and statistical concepts that have been around for decades, but thanks to ever cheaper and more powerful computers and big data technology, they are only now beginning to realize that potential. With enough data, these models can learn and find trends among enormous numbers of variables in complex data. Given sufficient data, they can predict what is likely to happen, but without telling us why.
I worked on a product that we launched into the market as an MVP, just to gauge the market's pulse and decide what direction, roadmap and strategy we should embrace. To start with, we did not know what kind of data to capture, what to analyse, or how to build our strategy, but we also did not want to lose the initial data from users' interactions with the product. So we dumped all interaction data, including all transport headers, into a NoSQL database. A couple of months later, we processed this mass of data using HDFS and Hadoop, and analysed it with machine learning algorithms to gain insights into potential markets, user preferences and sentiment, business intelligence and much more, all of which helped in defining the roadmap, strategy and feature improvements. This indicates how powerful and useful big data technology is and how much value it adds.
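The processing step in that story followed the map/reduce shape that Hadoop applies at scale. The miniature sketch below (toy data; the event fields are invented stand-ins for the raw interaction dump described above) shows the same pattern in a few lines: map each raw event to key/count pairs, then reduce by key to get aggregate usage.

```python
from collections import defaultdict
from itertools import chain

# Toy stand-in for a raw interaction dump: every event stored unfiltered,
# structured only later at processing time.
events = [
    {"user": "u1", "action": "view", "feature": "export"},
    {"user": "u2", "action": "view", "feature": "export"},
    {"user": "u1", "action": "use", "feature": "export"},
    {"user": "u3", "action": "view", "feature": "share"},
]

def map_phase(event):
    """Emit a (key, count) pair per event, as a Hadoop mapper would."""
    yield ((event["feature"], event["action"]), 1)

def reduce_phase(pairs):
    """Sum counts per key, as a Hadoop reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

usage = reduce_phase(chain.from_iterable(map_phase(e) for e in events))
print(usage)  # {('export', 'view'): 2, ('export', 'use'): 1, ('share', 'view'): 1}
```

On a laptop this is trivial; the point of Hadoop and HDFS is that the identical map/reduce shape keeps working when the events number in the billions and live across a cluster.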
Big data can be powerful and disruptive for consumers and citizens. It has also started to impact some sciences; molecular biology, for example, has embraced big data in a big way since the Human Genome Project. Traditionally, something like cancer is tackled with a patchwork of cause-and-effect conceptual models, each one understandable by a normal human being. The emerging alternative is to understand something like cancer by integrating big data: measuring as many variables and features as possible, in as many cancers as can be captured, and using powerful models to capture subtle relationships in the data without making any claims about what causes those relationships.
Internet of Things
The other big and emerging area where big data has a huge impact is the Internet of Things (IoT). Another buzzword or fad, you may say. IoT is a fast-growing constellation of internet-connected sensors attached to a wide variety of "things." Sensors can take a multitude of possible measurements, internet connections can be wired or wireless, and "things" can be literally any object, living or inanimate, to which you can attach or embed a sensor. If you carry a smartphone, for example, you become a multi-sensor IoT "thing," and many of your day-to-day activities can be tracked, analysed and acted upon. Millions, possibly billions, of internet-connected "things" will produce massive amounts of data, and storing, analysing and acting on that data will require big data technology and data science.
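The store/analyse/act loop for a sensor stream can be sketched very simply. The toy monitor below (a minimal illustration; the readings, window size and threshold are invented) keeps a rolling window over one sensor's values and raises an alert when the recent average crosses a limit, which is the smallest version of "acting upon" IoT data.

```python
from collections import deque

class RollingMonitor:
    """Keep a rolling window over one sensor's stream and flag anomalies:
    a tiny sketch of the store/analyse/act loop for IoT data."""

    def __init__(self, window=5, threshold=30.0):
        self.readings = deque(maxlen=window)  # only recent values are kept
        self.threshold = threshold

    def ingest(self, value):
        """Store one reading, then analyse the window and decide on an alert."""
        self.readings.append(value)
        avg = sum(self.readings) / len(self.readings)
        return {"avg": round(avg, 2), "alert": avg > self.threshold}

monitor = RollingMonitor(window=3, threshold=30.0)
stream = [21.0, 22.5, 24.0, 33.0, 38.5]  # e.g. temperature readings
for value in stream:
    status = monitor.ingest(value)
print(status)  # {'avg': 31.83, 'alert': True}
```

Multiply this by millions of sensors and the window no longer fits on one machine, which is exactly where the big data storage and processing technologies discussed above come in.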
The term may be the fad, not the concepts it covers. People may debate how large data has to be before it counts as “big,” but in reality it is not only about petabytes or zettabytes. The “volume” part is a moving target: what is big today may not remain big tomorrow, as companies and technologies become better and better at handling large amounts of data. The term may fade, but not what it refers to. When the hype surrounding big data dies down, it will most likely be because massive data has become the new normal, not because it has disappeared.