
Refcard Expansion Pack: Getting Started with Apache Hadoop


This week, DZone released its latest Refcard:

Getting Started with Apache Hadoop

By Piotr Krewski and Adam Kawa

Apache Hadoop enables distributed storage and processing of large datasets using simple, high-level programming models. This Refcard covers the most important Hadoop concepts, describes its architecture, and explains how to start using it as well as how to write and execute various applications on Hadoop.


If you're interested in learning more about Hadoop or sharpening your skills, we've dug into the DZone archives to find some of our most popular posts on the topic:

1. Hadoop Basics - Creating a MapReduce Program

  • The MapReduce framework processes data in two main phases: the Map phase and the Reduce phase. To illustrate this, let's create a sample Hadoop application.
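To give a feel for those two phases before diving into the post, here's a minimal sketch in plain Java (no Hadoop dependencies) that simulates the classic word count: the map step emits (word, 1) pairs and the reduce step groups the pairs by key and sums the counts. The class and method names are illustrative, not from the Hadoop API.

```java
import java.util.*;
import java.util.stream.*;

public class MapReduceSketch {
    // Map phase: for each input line, emit a (word, 1) pair per word.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Reduce phase: group the emitted pairs by key and sum their values.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello hadoop", "hello world");
        Map<String, Integer> counts =
                reduce(lines.stream().flatMap(MapReduceSketch::map));
        System.out.println(counts.get("hello")); // 2
    }
}
```

In real Hadoop, the framework also shuffles and sorts the pairs between the two phases so that all values for a given key arrive at the same reducer; the grouping step above stands in for that.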

2. Running Hadoop MapReduce Application from Eclipse Kepler

  • This tutorial will show you how to set up Eclipse and run your MapReduce project and job right from the IDE.

3. Hadoop in Practice

  • Working with simple data formats such as log files is straightforward and supported in MapReduce. In this article, based on Chapter 3 of Hadoop in Practice, author Alex Holmes shows you how to work with ubiquitous data serialization formats such as XML and JSON.
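As a taste of what handling XML in a mapper involves, here's a small sketch using only the JDK's built-in XML parser. It extracts one field from a single XML record, as a mapper might do after a custom record reader has handed it a complete element; the record and field names are placeholders, not examples from the book.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class XmlRecordSketch {
    // Pull the <name> field out of one self-contained XML record,
    // the way a mapper might after receiving a whole element as its value.
    static String extractName(String xmlRecord) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xmlRecord.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String record = "<property><name>fs.defaultFS</name>"
                      + "<value>hdfs://master:9000</value></property>";
        System.out.println(extractName(record)); // fs.defaultFS
    }
}
```

The hard part in practice, which the article covers, is splitting an XML file into such records in the first place, since MapReduce's default input formats split on lines, not on element boundaries.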

4. How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Pt. 1

  • After spending some time playing around with a single-node pseudo-distributed cluster, it's time to get into real-world Hadoop. It's important to note that there are multiple ways to achieve this; here I cover how to set up a multi-node Hadoop cluster on Amazon EC2. We are going to set up a four-node cluster, as described below.
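To give a sense of the kind of configuration a multi-node setup involves, here's a minimal core-site.xml sketch that points every node at the master's HDFS NameNode. The hostname and port are placeholder values, not ones taken from the post.

```xml
<!-- core-site.xml: every node in the cluster points at the master's
     NameNode. "master" and port 9000 are placeholder values. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```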

5. Writing a Hadoop MapReduce Task in Java

  • Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post, I show how to create a MapReduce job in Java, based on a Maven project like any other Java project.
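Since the post builds the job as a Maven project, the key prerequisite is a dependency on the Hadoop client libraries. A minimal pom.xml fragment might look like the following; the version number is illustrative, not the one used in the post.

```xml
<!-- pom.xml fragment: pulls in the Hadoop client libraries needed to
     compile Mapper/Reducer classes. The version shown is an example. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.3.6</version>
</dependency>
```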

And don't forget to download the Getting Started with Apache Hadoop Refcard itself!



