
Performance Tips for Hadoop on Amazon S3


This post from the Hadoop-as-a-Service (HaaS) platform Mortar lists six tips to help you speed up a Hadoop instance running against the Amazon Simple Storage Service (S3). If you follow the world of Hadoop, you probably know that Netflix runs Hadoop on Amazon S3 alongside Genie, its own Hadoop PaaS.

The top six performance tips Mortar has to offer for your Hadoop instance on S3 are as follows:

  1. Organize your S3 bucket for speed
  2. Store fewer, larger files instead of many smaller ones
  3. Know when (and when not) to compress your data
  4. Avoid underscores in bucket names
  5. Stream data directly into S3 with Elastic MapReduce (EMR)
  6. Use partition-aware S3 keys
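Tips 4 and 6 can be sketched together. The following is a minimal, hypothetical example (bucket name, key layout, and file counts are illustrative assumptions, not from Mortar's article) of laying out S3 keys by date partition, so a Hadoop job reading a date range only lists the partitions it needs instead of scanning the whole bucket:

```python
# Hypothetical sketch: partition-aware S3 keys (tip 6).
# The bucket name deliberately avoids underscores (tip 4).
from datetime import date, timedelta

BUCKET = "my-analytics-bucket"  # illustrative name, no underscores

def partition_key(day, part):
    """Build a date-partitioned key, e.g. logs/dt=2013-05-01/part-00000.gz"""
    return "logs/dt=%s/part-%05d.gz" % (day.isoformat(), part)

def keys_for_range(start, num_days, parts_per_day):
    """Enumerate only the keys a job needs for a date range, keeping
    S3 listing cheap compared to walking an unpartitioned bucket."""
    keys = []
    for d in range(num_days):
        day = start + timedelta(days=d)
        for p in range(parts_per_day):
            keys.append("s3://%s/%s" % (BUCKET, partition_key(day, p)))
    return keys

print(keys_for_range(date(2013, 5, 1), 2, 2))
```

Note that `parts_per_day` also bears on tip 2: fewer, larger part files per partition mean fewer S3 requests and less per-file open/close overhead for the job.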

You can find more detailed information about these six performance tips in the original article.

To learn more about Genie, Netflix's Hadoop PaaS, and how Netflix built a performant, petabyte-scale data infrastructure in the cloud, read their blog post or watch their presentation from Hadoop Summit 2013.


 
