Cloud Deployments: Using Hadoop on Clouds

Packt Publishing has provided Chapter 10 of their forthcoming Hadoop MapReduce Cookbook for DZone readers, covering Hadoop and Amazon Elastic MapReduce. The chapter explores:

  1. Running WordCount using Amazon Elastic MapReduce (EMR)
  2. Saving money by using Amazon EC2 Spot Instances to execute EMR job flows
  3. Executing a Hive script using EMR
  4. Creating an Amazon EMR job flow using the command-line interface
  5. Using Apache Whirr to deploy an Apache Hadoop cluster in the Amazon EC2 cloud environment

Computing clouds provide on-demand, horizontally scalable computing resources with no upfront capital investment, making them an ideal environment for occasional large-scale Hadoop computations. In this chapter, we explore several mechanisms to deploy and execute Hadoop MapReduce and related computations in cloud environments.

This chapter discusses how to use Amazon Elastic MapReduce (EMR), a hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Hive computations on the Amazon EC2 cloud infrastructure. We will also use Apache Whirr, a cloud-neutral library for deploying services in cloud environments, to provision an Apache Hadoop/HBase cluster in a cloud environment.
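For context on the Whirr approach, a cluster is described declaratively in a properties "recipe" file and launched from the command line. The fragment below is a minimal sketch of such a recipe; the cluster name, node counts, and the use of environment variables for credentials are illustrative choices, not taken from the chapter:

```properties
# Hypothetical Whirr recipe for a small Hadoop cluster on EC2.
# Cluster name and instance counts are illustrative only.
whirr.cluster-name=hadoop-demo

# One master node running the NameNode and JobTracker,
# plus two worker nodes running DataNode and TaskTracker.
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker

# Target cloud provider.
whirr.provider=aws-ec2

# Read AWS credentials from the environment rather than hard-coding them.
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
```

A recipe like this is typically launched with `whirr launch-cluster --config hadoop-demo.properties` and torn down with `whirr destroy-cluster --config hadoop-demo.properties`; the chapter walks through the full procedure.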

You can download the full chapter (in .doc format) here.
