
How Partitioning, Collecting, and Spilling Work in MapReduce


The figure below shows the steps that the Hadoop MapReduce framework takes after your map function emits a key/value output record. Note that the figure describes Hadoop 1.x and earlier; Hadoop 2.x changed some of these steps, and a future blog post will cover those changes.
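To make the partitioning step concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. It mirrors the arithmetic used by Hadoop's default hash partitioner: mask off the sign bit of the key's hash code, then take the remainder by the number of reduce tasks. The class name `PartitionSketch` and the helper `partitionFor` are hypothetical names made up for this illustration; in a real job the framework calls your `Partitioner` for you.

```java
// Hypothetical illustration only: not part of the Hadoop API.
final class PartitionSketch {

    // Picks a reducer index in [0, numReduceTasks) for a map output key.
    // Masking with Integer.MAX_VALUE clears the sign bit so that keys with
    // a negative hashCode() still map to a non-negative partition number.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed value for the example
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            // Equal keys always land in the same partition, which is what
            // lets a single reducer see every value for a given key.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

The partition number is what the framework stores alongside each record when it collects the record into the in-memory buffer, so that spilled data can later be grouped by reducer.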

Chapter 6 of my book, Hadoop in Practice (Manning Publications), discusses how to tune some of the configuration values shown in the figure once you start working with mid- to large-size Hadoop clusters.
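As a rough illustration of why these values matter, the sketch below estimates when a Hadoop 1.x map task would start spilling its in-memory buffer to disk. It assumes the settings involved are `io.sort.mb`, `io.sort.record.percent`, and `io.sort.spill.percent`, that the buffer is split into a record-accounting region and a data region, and that each record costs 16 bytes of accounting space. The class `SpillThresholdSketch` and its method are hypothetical, and the arithmetic is a simplification of the framework's behavior, not its actual code.

```java
// Hypothetical illustration only: a simplified model of the map-side sort buffer.
final class SpillThresholdSketch {

    // Assumed accounting overhead per record in Hadoop 1.x.
    static final int RECORD_ACCOUNTING_BYTES = 16;

    // Returns {dataBytesBeforeSpill, recordsBeforeSpill}. A spill is assumed
    // to start when either limit is reached, whichever comes first.
    static long[] thresholds(int ioSortMb, double recordPercent, double spillPercent) {
        long totalBytes = (long) ioSortMb << 20; // megabytes to bytes
        long accountingBytes = (long) (totalBytes * recordPercent);
        accountingBytes -= accountingBytes % RECORD_ACCOUNTING_BYTES; // whole records only
        long dataBytes = totalBytes - accountingBytes;
        long maxRecords = accountingBytes / RECORD_ACCOUNTING_BYTES;
        return new long[] {
            (long) (dataBytes * spillPercent),
            (long) (maxRecords * spillPercent)
        };
    }

    public static void main(String[] args) {
        // Assumed Hadoop 1.x defaults: io.sort.mb=100,
        // io.sort.record.percent=0.05, io.sort.spill.percent=0.80.
        long[] t = thresholds(100, 0.05, 0.80);
        System.out.println("spill after " + t[0] + " bytes of data or " + t[1] + " records");
    }
}
```

Under those assumed defaults, the model says a spill starts after about 76 MB of serialized data or 262,144 records. Jobs that emit many small records tend to hit the record limit first, which is one reason the record percentage is worth revisiting on larger clusters.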



Published at DZone with permission of Alex Holmes, DZone MVB.

