- Running WordCount using Amazon Elastic MapReduce (EMR)
- Saving money using Amazon EC2 Spot Instances to execute EMR Job Flows
- Executing a Hive script using EMR
- Creating an Amazon EMR job flow using the Command Line Interface
- Using Apache Whirr to deploy an Apache Hadoop cluster in an EC2 cloud environment
Computing clouds provide on-demand, horizontally scalable computing resources with no upfront capital investment, making them an ideal environment for occasional large-scale Hadoop computations. In this chapter, we explore several mechanisms to deploy and execute Hadoop MapReduce and Hadoop-related computations on cloud environments.
This chapter discusses how to use Amazon Elastic MapReduce (EMR), a hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Hive computations on the Amazon EC2 cloud infrastructure. We will also use Apache Whirr, a cloud-neutral library for deploying services on cloud environments, to provision an Apache Hadoop HBase cluster on cloud environments.
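As a reminder of what a traditional MapReduce computation such as WordCount does, here is a minimal local sketch in plain Python. It is not the chapter's Hadoop or EMR code; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and chosen only for illustration, and the whole thing runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = chain.from_iterable(map_words(line) for line in lines)
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop on EMR", "Hive on EMR"]))
```

On a real cluster, the map and reduce phases run in parallel across many nodes and the framework performs the shuffle; the sketch only mirrors the data flow.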
You can download the full chapter (in .doc format) here.