Changing Our Views on Using and Analyzing Big Data with Hadoop
Depending on how you look at it, the concept of Big Data has not been around that long. According to Winshuttle’s interactive timeline of Big Data history, the term first came about in 1997 to describe the problem of contemporary computer systems being unable to keep up with an expanding world of data. With the amount of information in the world increasing so rapidly, new computer systems had to be developed to store, analyze, and utilize the data being created.
In 2006, Hadoop emerged as a predominant solution in the world of Big Data, and it remains a major player in Big Data processing today. But as the needs of Big Data analysis expand and evolve, some analysts and developers consider Hadoop unable to perform to their standards.
In an interview with Information Management, Tony Cosentino, Ventana Research’s vice president and research director, expounded upon his vision for the future of Big Data. He describes predictive analytics as companies’ primary need, but finds that descriptive analytics are currently used more often and more easily. He attributes this gap to limitations in the frameworks most often used to analyze Big Data, like Hadoop. He explains:
Such statistics are likely a [result] of big data technologies such as Hadoop, and their associated distributions, having prioritized the ability to run descriptive statistics through standard SQL, which is the most common method for implementing analysis on Hadoop. It is likely a walk-before-you-run situation, and what we will see going forward is more predictive capabilities put on top of big data.
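To make the distinction concrete, here is a minimal Python sketch of the two kinds of analysis. It is only an illustration under stated assumptions: the `SALES` figures and the function names are hypothetical, not taken from Cosentino or from any Hadoop distribution, and a simple least-squares trend line stands in for "predictive analytics" in general. The descriptive step is the kind of aggregate a SQL `GROUP BY` would produce; the predictive step fits a model and extrapolates beyond the data.

```python
from collections import defaultdict

# Hypothetical sample data: (month, region, units sold). Not from the article.
SALES = [
    (1, "east", 100), (1, "west", 80),
    (2, "east", 110), (2, "west", 90),
    (3, "east", 120), (3, "west", 100),
]


def average_by_region(rows):
    """Descriptive: roughly SELECT region, AVG(units) ... GROUP BY region."""
    totals = defaultdict(lambda: [0, 0])  # region -> [sum of units, row count]
    for _month, region, units in rows:
        totals[region][0] += units
        totals[region][1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_total(rows, month):
    """Predictive: fit a trend to monthly totals and extrapolate to `month`."""
    monthly = defaultdict(int)
    for m, _region, units in rows:
        monthly[m] += units
    xs = sorted(monthly)
    ys = [monthly[m] for m in xs]
    slope, intercept = fit_line(xs, ys)
    return slope * month + intercept


if __name__ == "__main__":
    print(average_by_region(SALES))   # summary of what already happened
    print(forecast_total(SALES, 4))   # estimate for a month not yet observed
```

The first function only summarizes existing records, which is why it maps so easily onto SQL. The second has to learn parameters from the data before it can say anything about the future, which is the kind of workload the quote suggests will be layered on top of big data platforms later.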
Marilyn Matz, CEO of Paradigm4, has more to say about Hadoop’s role in the future of Big Data analysis. Her take is that:
Hadoop is well suited for simple parallel problems but it comes up short for large-scale complex analytics. A growing number of complex analytics use cases are proving to be unworkable in Hadoop. Some examples include recommendation engines based on millions of customers and products, running massive correlations across giant arrays of genetic sequencing data and applying powerful noise reduction algorithms to finding actionable information in sensor and image data.
Matz explains that, for now, data scientists tend toward more powerful languages like R and Python. But as needs change, new frameworks may emerge that can handle analytics on a larger, more complex scale. Already, Matz says, analysts are exploring new directions in Big Data analysis and looking for ways to leave Hadoop behind.