MapReduce and YARN: Hadoop Processing Unit Part 1

By Dheeraj Gupta · Dec. 26, 19 · Tutorial

In my previous article, HDFS Architecture and Functionality, I described Hadoop's filesystem. Today, we will look at its processing layer. There are mainly two mechanisms by which processing takes place in a Hadoop cluster: MapReduce and YARN. In a traditional system, the major focus is on bringing data to the storage unit; in Hadoop, the focus shifts to bringing the processing power to the data, which enables parallel processing. So, here, we will go through MapReduce and, in part two, YARN.

MapReduce

As the name suggests, processing takes place in two steps: mapping and reducing. A single master (the JobTracker) controls job execution on multiple slaves (the TaskTrackers). The JobTracker accepts MapReduce jobs submitted by the client, pushes map and reduce tasks out to the TaskTrackers, and monitors their status. The TaskTrackers' major function is to run the map and reduce tasks; they also manage and store the intermediate output of those tasks.

[Figure: MapReduce phase diagram]

You may also like: Word Count Program With MapReduce and Java.

Mapper Phase

The mapper is a relatively small program with a simple task. It is responsible for processing a portion of the input file's data (typically one block of one file), interpreting, filtering, and transforming it in order to produce a stream of key-value pairs. A node is chosen to process each pair on the basis of its key, and MapReduce orchestrates all of this data movement transparently.
Java

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Emit a (word, 1) pair for every token in the line
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}



Reducer Phase

The reducer is a program that aggregates all the values for the keys it is responsible for. Each reducer typically writes its output to its own file.

Java

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all the counts emitted for this key
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}



Driver Class

Java

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "My Word Count Program");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Configure the I/O paths from the filesystem to the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        Path outputPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outputPath);
        // Delete the output path from HDFS so the job can recreate it
        outputPath.getFileSystem(conf).delete(outputPath, true);
        // Exit with the job's success status
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
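Since the snippets above only run inside a Hadoop cluster, it can help to see the same map → shuffle → reduce flow simulated with plain Java collections. The sketch below is my own illustration (the class and method names are not part of Hadoop) of what the framework does behind the scenes: the map phase emits (word, 1) pairs, the shuffle phase groups them by key, and the reduce phase sums each group.

```java
import java.util.*;

public class LocalWordCount {

    // Simulates map -> shuffle -> reduce for word count on a single machine.
    public static Map<String, Integer> countWords(List<String> lines) {
        // Map phase: emit a (word, 1) pair per token, exactly like the mapper above
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(line);
            while (tok.hasMoreTokens()) {
                pairs.add(new AbstractMap.SimpleEntry<>(tok.nextToken(), 1));
            }
        }
        // Shuffle phase: group all emitted values by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        // Reduce phase: sum the values for each key, like the reducer above
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("to be or", "not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```

On a real cluster, the equivalent of this run would be packaging the classes into a jar and launching it with the hadoop jar command, pointing at HDFS input and output paths, with Hadoop performing the shuffle across nodes.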



Further Reading

  • Stream Processing With Apache Flink.
  • Apache Spark on YARN – Performance and Bottlenecks.
  • The Complete Apache Spark Collection [Tutorials and Articles].

Opinions expressed by DZone contributors are their own.
