
Refcard Expansion Pack: Getting Started with Apache Hadoop



This week, DZone released its latest Refcard:

Getting Started with Apache Hadoop

By Piotr Krewski and Adam Kawa

Apache Hadoop enables distributed storage and processing of large datasets using simple, high-level programming models. This card covers the most important concepts of Hadoop, describes its architecture, and explains how to start using it, as well as how to write and execute various applications on it.


If you're interested in learning more about Hadoop or sharpening your skills, we decided to dig into the DZone archives and find some of the most popular posts we've had on the topic:

1. Hadoop Basics - Creating a MapReduce Program

  • The MapReduce framework processes data in two main phases: the Map phase and the Reduce phase. To explain this, let's create a sample Hadoop application.
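To make the two phases concrete, here is a minimal sketch in plain Java that mimics the Map and Reduce steps of a word count. Note this deliberately avoids the real Hadoop API (no `Mapper`/`Reducer` classes, no job configuration) so it can run standalone; the class and method names are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the two MapReduce phases in plain Java;
// a real Hadoop job would subclass Mapper and Reducer instead.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line of input.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Reduce phase: group the pairs by key and sum the counts per word.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data", "hadoop processes data");
        Map<String, Integer> counts =
                reduce(input.stream().flatMap(WordCountSketch::map));
        System.out.println(counts.get("hadoop")); // 2
        System.out.println(counts.get("data"));   // 2
    }
}
```

In a real cluster, the framework handles the shuffle between the two phases and runs many map and reduce tasks in parallel across nodes; the grouping-by-key step done by `Collectors.toMap` here is what the shuffle performs at scale.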

2. Running Hadoop MapReduce Application from Eclipse Kepler

  • This tutorial shows you how to set up Eclipse and run your MapReduce project and job right from your IDE.

3. Hadoop in Practice

  • Working with simple data formats such as log files is straightforward and supported in MapReduce. In this article based on Chapter 3 of Hadoop in Practice, author Alex Holmes shows you how to work with ubiquitous data serialization formats such as XML and JSON. 

4. How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Pt. 1

  • After spending some time playing around on a single-node, pseudo-distributed cluster, it's time to get into real-world Hadoop. It's important to note that there are multiple ways to achieve this; this post covers how to set up a multi-node Hadoop cluster on Amazon EC2. We are going to set up a four-node cluster as described below.

5. Writing a Hadoop MapReduce Task in Java

  • Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. This post shows how to create a MapReduce job in Java, based on a Maven project like any other Java project.
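As a rough illustration of the Maven setup such a post describes, the project's `pom.xml` typically only needs the Hadoop client dependency; the version below is an assumption and should match your cluster:

```xml
<!-- Minimal Maven dependency for writing MapReduce jobs in Java.
     The version shown is illustrative; match it to your cluster. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.3</version>
</dependency>
```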

And don't forget to download the Getting Started with Apache Hadoop Refcard itself!


