This week, DZone released its latest Refcard:
If you're interested in learning more about Hadoop or sharpening your skills, we dug into the DZone archives to find some of our most popular posts on the topic:
- The MapReduce framework processes data in two main phases: the Map phase and the Reduce phase. To illustrate this, let's create a sample Hadoop application.
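The two phases can be sketched in plain Java. This is a conceptual illustration of word counting, not actual Hadoop API code — in a real job you would extend Hadoop's `Mapper` and `Reducer` classes instead:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {
    // Map phase: each input line is turned into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: all values sharing the same key are summed.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = reduce(map("to be or not to be"));
        System.out.println(counts.get("to")); // 2
        System.out.println(counts.get("be")); // 2
    }
}
```

In a real cluster, the framework shuffles the map output so that all pairs with the same key arrive at the same reducer; the merge step above stands in for that grouping.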
- This tutorial will show you how to set up Eclipse and run your MapReduce job right from your IDE.
- Working with simple data formats such as log files is straightforward and supported in MapReduce. In this article, based on Chapter 3 of Hadoop in Practice, author Alex Holmes shows you how to work with ubiquitous data serialization formats such as XML and JSON.
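As a taste of the per-record work a map function does with such formats, here is a minimal sketch of parsing one XML record with the JDK's built-in DOM parser. The `<property>`/`<name>`/`<value>` record layout is just an illustrative assumption, not taken from the article:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlRecordSketch {
    // Extract a key and value from one XML record, as a mapper might.
    // The element names used here are assumptions for illustration.
    static String[] parseRecord(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        String name = root.getElementsByTagName("name").item(0).getTextContent();
        String value = root.getElementsByTagName("value").item(0).getTextContent();
        return new String[] { name, value };
    }

    public static void main(String[] args) throws Exception {
        String record = "<property><name>io.sort.mb</name><value>100</value></property>";
        String[] kv = parseRecord(record);
        System.out.println(kv[0] + " = " + kv[1]); // io.sort.mb = 100
    }
}
```

The hard part in practice, which the article addresses, is splitting a large XML or JSON file into such records in the first place, since record boundaries rarely align with HDFS block boundaries.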
- After spending some time playing around on a single-node pseudo-distributed cluster, it's time to get into real-world Hadoop. It's important to note that there are multiple ways to achieve this; I am going to cover how to set up a multi-node Hadoop cluster on Amazon EC2. We are going to set up a four-node Hadoop cluster as described below.
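The essence of going multi-node is that every node's core-site.xml must point at the master's NameNode rather than at localhost. A minimal sketch, assuming a placeholder hostname (`ec2-master` stands in for whatever your EC2 master instance resolves to):

```xml
<!-- core-site.xml on every node; "ec2-master" is a placeholder hostname -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-master:9000</value>
  </property>
</configuration>
```

On EC2, using the instances' private DNS names here (and opening the relevant ports in the security group) is what lets the worker nodes find the master.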
- Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java, based on a Maven project like any other Java project.
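The Maven side of such a setup largely amounts to declaring the Hadoop dependency in your pom.xml. A sketch, with an example version — use whichever release matches your cluster:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- example version; match the Hadoop release on your cluster -->
  <version>2.7.3</version>
</dependency>
```

With that in place, `mvn package` produces a jar you can submit with `hadoop jar`, just as you would build and run any other Java project.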
And don't forget to download the Getting Started with Apache Hadoop Refcard itself!