
How Partitioning, Collecting, and Spilling Work in MapReduce


The figure below shows the steps that the Hadoop MapReduce framework takes after your map function emits a key/value output record. Note that the figure reflects Hadoop versions 1.x and earlier; Hadoop 2.x changed some of this behavior, which will be discussed in a future blog post.
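To make the first of those steps concrete, here is a minimal sketch of how a record can be assigned to a partition once the map function emits it. It uses the same formula as Hadoop's default `HashPartitioner` (mask off the sign bit of the key's hash code, then take the modulus by the number of reducers), but it is plain Java with no Hadoop dependency. The class name `PartitionSketch`, the method `partitionFor`, and the sample keys are hypothetical names chosen for illustration.

```java
class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clearing the sign bit keeps the result non-negative even when
    // hashCode() is negative, and the modulus maps it to a reducer index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // hypothetical job setting
        String[] keys = {"apple", "banana", "cherry"}; // illustrative map output keys

        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Every record with the same key lands in the same partition, which is what lets a single reducer see all of the values for that key.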

Chapter 6 of my book, Hadoop in Practice (Manning Publications), discusses how some of the configuration values in the figure should be tuned once you start working with mid-size to large Hadoop clusters.
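As a rough illustration of why those values matter, the sketch below works out when the in-memory map output buffer would start spilling to disk. It rests on several assumptions that are not spelled out in this post: that the relevant Hadoop 1.x properties are `io.sort.mb`, `io.sort.record.percent`, and `io.sort.spill.percent`, that their defaults are 100, 0.05, and 0.80, and that each record costs 16 bytes of accounting metadata. The class and method names are hypothetical, and the arithmetic is a simplification of what the framework does, not a copy of it.

```java
class SpillThresholdSketch {

    // Assumed per-record accounting overhead in Hadoop 1.x (not stated in the article).
    static final int RECORD_ACCOUNTING_BYTES = 16;

    // Bytes of serialized map output the data portion of the buffer
    // can hold before a background spill would be triggered.
    static long dataSpillThresholdBytes(int sortMb, double recordPercent, double spillPercent) {
        long total = (long) sortMb << 20;                   // megabytes to bytes
        long accounting = (long) (total * recordPercent);   // slice reserved for record metadata
        accounting -= accounting % RECORD_ACCOUNTING_BYTES; // align to whole records
        return (long) ((total - accounting) * spillPercent);
    }

    // Number of records the accounting portion can track before a spill would be triggered.
    static long recordSpillThreshold(int sortMb, double recordPercent, double spillPercent) {
        long total = (long) sortMb << 20;
        long accounting = (long) (total * recordPercent);
        long recordCapacity = accounting / RECORD_ACCOUNTING_BYTES;
        return (long) (recordCapacity * spillPercent);
    }

    public static void main(String[] args) {
        int sortMb = 100;             // assumed default for io.sort.mb
        double recordPercent = 0.05;  // assumed default for io.sort.record.percent
        double spillPercent = 0.80;   // assumed default for io.sort.spill.percent

        System.out.println("Spill after ~"
                + dataSpillThresholdBytes(sortMb, recordPercent, spillPercent)
                + " bytes of data, or ~"
                + recordSpillThreshold(sortMb, recordPercent, spillPercent)
                + " records, whichever comes first");
    }
}
```

Under these assumptions, a job that emits many small records hits the record limit long before the data limit, which is one reason the record percentage is worth revisiting when tuning.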

[Figure: partitioning, collecting, and spilling in MapReduce]


Published at DZone with permission of Alex Holmes, DZone MVB.

