Over the past two years, we’ve watched a significant chunk of the world fall in love with Hadoop. It has been remarkable to watch Big Data startups grab the spotlight and every major vendor look to warm themselves by the Hadoop fire. From the beginning, we’ve said that Hadoop was the first to offer a way to harness the power of enormous data sets but warned that it lacks the wherewithal to enable real-time decision making. We recognized that any technology that doesn’t change the here and now has limited value.
The growing chorus
Today GigaOM ran the story “5 reasons why the future of Hadoop is real-time (relatively speaking),” and the title alone speaks volumes about where Big Data’s true value is found. The essence of the article is found here:
The work being done by companies like Cloudera and Hortonworks at the distribution level is great and important, as is MapReduce as a processing framework for certain types of batch workloads. But not every company can afford to be concerned about managing Hadoop on a day-to-day basis. And not every analytic job pairs well with MapReduce.
The world is waking up from a Big Data bender to discover that speed matters enormously, and more so every day. So much so that SQL on Hadoop is the new conversation, as that ‘old school’ query language becomes the perfect way to pair massive data with in-the-moment queries. Look no further than the rise of SQL layers over HBase as proof that not everything is headed for NoSQL.
Just a warm-up
All of this is the preamble for the need to manage Big Data as it comes across the threshold as fast-moving streams. We’re just now seeing the conversation turn to how to manage the fast-approaching Internet of Things and its projected 50 billion sensors (25X the current 2 billion ‘human sensors’). Managing stream data is going to put an enormous burden on the infrastructure of anyone who wants to stay relevant. If Hadoop isn’t in this game, it dies.