The Director of Education at Cloudera Offers an Insider View of Hadoop
DZone: How do you think that Hadoop (and Big Data processing / analytics) will impact the overall developer space over the next few years?
Sarah Sproehnle: We're seeing a tremendous investment in developers moving from traditional back-end database development to the Hadoop space. Processes that used to be coded in PL/SQL, or that relied on large in-memory state, are now being written using Hadoop for data processing and HBase for real-time applications. A lot of applications that were built on top of databases, where developers struggled to fit non-relational paradigms into relational stores, are now being built more quickly and with access to data at any scale.
DZone: You covered some common implementations of Hadoop in your presentation - which of these do you think is the most innovative or interesting?
Sarah Sproehnle: A lot of people use Hadoop to do complex data processing such as billing mediation and transaction reconciliation. Similarly, Hadoop is a popular tool for recommendation engines and predictive modeling. At the forefront, though, is people building real-time interactive applications on top of HBase. These are used both for data serving (such as user profiles or POIs) and as the basis for incremental analytics, where businesses can monitor how their systems are behaving in real time.
DZone: What are some cool tools (or uses) of Hadoop in development or coming up that Java developers should be aware of?
Sarah Sproehnle: We're seeing a lot of interest in higher-level libraries that make Hadoop much more accessible to Java developers. For example, Crunch is a FlumeJava-inspired library. We've been hearing some very positive feedback from Java developers who want a lot of the mechanics of MapReduce taken care of but don't want to write in Hive or Pig.
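To illustrate the pattern these libraries abstract: the following is a toy, in-memory sketch of the map/group/reduce shape of a classic word count, written in plain Java. It is not Crunch's actual API (Crunch builds pipelines from `PCollection` objects over real Hadoop clusters); the `WordCount` class and its `count` method here are purely illustrative, showing the "map each record, group by key, aggregate each group" structure that MapReduce and FlumeJava-style libraries generalize to distributed data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {

    // Map phase: split each line into lowercase words.
    // Shuffle/reduce phase: group identical words and count each group.
    public static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                count(List.of("hadoop scales", "hadoop stores data"));
        System.out.println(counts);
    }
}
```

In a library like Crunch, each stage of this chain would become a distributed operation over HDFS data rather than an in-memory stream, but the developer-facing shape of the computation is similar, which is exactly the appeal for Java developers who would rather not hand-write MapReduce jobs.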