
Getting Started Quickly with Hadoop and MapReduce


So here’s the problem: You’ve finally found a block of time to sit down and get your head around Hadoop and MapReduce. You do a quick Google search for a tutorial to get you started, and immediately your problems are two-fold:

  1. You are a 23-step process and a cloud deployment away from having your first Hadoop cluster spun up.
  2. The most interesting thing you will be able to do once you get your cluster up and running is to count all the words in the complete works of Shakespeare. Ho…hum.

Well, if this is your situation, you’ll be pleased to find that the first problem goes away immediately upon downloading Hadoop. Doug Cutting, in his infinite wisdom, understood that it was intimidating to spin up an entire cluster just to start learning the platform, so he built in a little feature that lets you get started immediately. As an example, let’s say you have a giant 137-core cluster in the cloud and you’ve stored the complete and unabridged works of all the classic authors on HDFS in the books directory. You can run your WordCount MapReduce job on the corpus and send the results to the words directory with the following command:

${HADOOP_HOME}/bin/hadoop jar WordCount.jar org.myorg.WordCount books words

On the other hand, if you have no such cluster, but you have Macbeth and Romeo and Juliet stored in the books directory on your local machine, then you can still run your WordCount MapReduce job on your measly, wimpy corpus and send the results to the words directory (again, on your local machine) by issuing the exact same command:

${HADOOP_HOME}/bin/hadoop jar WordCount.jar org.myorg.WordCount books words

Pretty easy way to get started, eh?
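Once the job finishes, each reducer writes its results to a part file under the output directory. Assuming the default single-reducer layout (the part-r-00000 file name is Hadoop's convention, and the words directory comes from the command above), you might peek at the top counts like this:

```shell
# Each line of the output is tab-separated: word<TAB>count.
# Sort by the count column, largest first, and show the top entries.
cat words/part-r-00000 2>/dev/null | sort -k2,2 -rn | head
```

In local mode this is just an ordinary directory on your filesystem, so all the usual command-line tools work on it.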

Issue number 2 is a bit more nefarious. Why? Because word counting is easy to understand, and it really is probably the most straightforward application of MapReduce.
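To see why it’s so straightforward, here is a minimal in-memory sketch of the map, shuffle, and reduce steps for word counting in plain Java. This is not Hadoop’s actual API (no Mapper or Reducer classes, no Writables); the class and method names are purely illustrative:

```java
import java.util.*;

public class WordCountSketch {
    // Map: emit a (word, 1) pair for every word in a line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Reduce: sum all the counts that were grouped under one word.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        String[] corpus = { "to be or not to be" };

        // Shuffle: group the emitted pairs by key, as the framework would
        // between the map and reduce phases.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : corpus) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        grouped.forEach((word, counts) ->
                System.out.println(word + "\t" + reduce(word, counts)));
    }
}
```

The whole algorithm fits in two tiny functions, with the framework doing the grouping in between. That simplicity is exactly why every tutorial reaches for it, and exactly why it gets boring fast.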

However, I got bored of the old WordCount Hello World, and being a fairly mathy person, I decided to make my own Hello World with a mathematical twist! Take a look!



Published at DZone with permission of

