The Evolution of MapReduce and Hadoop
Recently I authored a section of the DZone Guide for Big Data 2014. I wrote about MapReduce and the evolution of Hadoop. There is a ton of buzz in the market about speed to insight and the push toward alternative DAG engines like Spark. I see these new technologies as exciting and awesome, and I welcome innovation and creativity in the Hadoop market. I also temper this excitement with a bit of reality.

The reality is that technology has a maturity curve. It comes out of the huge brain incubators like Google to the rest of us. Much like MapReduce, HDFS, and HBase before them, these technologies have a long road ahead until they grow into a vibrant open source community with dedicated developers. Then comes a stage where the project may become its own incubator project within Apache or another open source foundation. Finally, you may see the tech evolve to the point that it's included in a supported Hadoop distribution.

Now, in their efforts to compete, you will see vendors clamber over one another to be the first to include hot tech brand X in their offering. I see this competition as healthy, but it could lead to the inclusion of technology that is never formally removed from a distro and instead just fades away. Much like Red Hat's Linux distribution many years ago, Hadoop distributions are still relatively new, and their selection of projects is still small. Eventually, hopefully, we will see the number of projects grow to the point where vendors are actually refusing or removing projects that are no longer relevant. I'm not sure we are there yet.

All that said, do I think MapReduce is going to dry up and blow away? No, I don't. Do I think Spark and alternative engines will gain steam as they prove their efficiencies? Yes, I do. I think most folks need to hear more than one group validate scalability. I'm sure Google has tooling that works for them. Are they sharing it or, better yet, has it matured to the point that we mere mortals can use it? I think that remains to be seen.
MapReduce, and furthermore Hive, which runs on MapReduce, has a HUGE ecosystem of tools that depend on Hive. Also take a look at Stinger.next to see some exciting new developments in Hive. Take a walk down the rows of vendor tables at the next Hadoop Summit or Strata and ask them in detail which Hadoop tool their product depends on. Most likely you will eventually get to an answer that is basically Hive (or HCatalog). Hive and MapReduce will be here for years to come, if for no other reason than market penetration. Anyway, take a look at the new DZone Guide. It has many good topics that may interest you.
Published at DZone with permission of , DZone MVB. See the original article here.