Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Beyond MapReduce and Apache Hadoop 2.X

DZone's Guide to

Beyond MapReduce and Apache Hadoop 2.X

· Big Data Zone
Free Resource

Find your next Big Data job at DZone Jobs. See jobs focused on Big Data or create your profile and have the employers come to you!

Episode 20 of the podcast was with Bikas Saha and Arun Murthy.

When I spoke with Arun a year or so a go YARN was NextGen Hadoop and there have been a lot of updates, work done and production experience since.

Besides Yahoo! other multi thousand node clusters have been and are running in production with YARN. These clusters have shown 2x capacity throughput which resulted in reduced cost for hardware (and in some cases being able to shut down co-los) while still gaining performance improvements overall to previous clusters of Hadoop 1.X.

I got to hear about some of what is in 2.4 and coming in 2.5 of Hadoop:

  • Application timeline server repository and api for application specific metrics (Tez, Spark, Whatever).
    • web service API to put and get with some aggregation.
    • plugable nosql store (hbase, accumulo) to scale it.
  • Preemption capacity scheduler.
  • Multiple resource support (CPU, RAM and Disk).
  • Labels tag nodes with labels can be labeled however so some windows and some linux and ask for resources with only those labels with ACLS.
  • Hypervisor support as a key part of the topology.
  • Hoya generalize for YARN (game changer) and now proposed as Slider to the Apache incubator.

We talked about Tez which provides complex DAGs of queries to translate what you want to-do on Hadoop without the work arounds for making it have to run in MapReduce.  MapReduce was not designed to be re-workable out side of the parts of the Job it gave you for Map, Split, Shuffle, Combine, Reduce, Etc and Tez is more expressible exposing a DAG API.

PigHiveQueryOnMR

Now becomes with Tez:

PigHiveQueryOnTez

There were also some updates on Hive v13 coming out with sub queries, low latency queries (through Tez), high precision decimal points and more!

Subscribe to the podcast and listen to all of what Bikas and Arun had to say.

Find your next Big Data job at DZone Jobs. See jobs focused on Big Data or create your profile and have the employers come to you!

Topics:

Published at DZone with permission of Joe Stein, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}