
Performance Tips for Hadoop on Amazon S3



This post from the Hadoop-as-a-Service platform Mortar lists six tips to help you speed up a Hadoop deployment that reads and writes data on the Amazon Simple Storage Service (S3). If you follow the world of Hadoop, you probably know that Netflix runs Hadoop on Amazon S3 with their own Hadoop PaaS, Genie.

The top six performance tips Mortar has to offer for your Hadoop instance on S3 are as follows:

  1. Organize your S3 bucket for speed
  2. Store fewer, larger files instead of many smaller ones
  3. Know when (and when not) to compress your data
  4. Avoid underscores in bucket names
  5. Stream data directly into S3 with Elastic MapReduce (EMR)
  6. Use partition-aware S3 keys
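Tips 1, 4, and 6 all come down to how you name your S3 keys. As a minimal sketch (the helper name and key layout here are hypothetical, not from Mortar's article), the idea can be combined in a single key-builder: vary the leading characters of each key so S3 can spread requests across its internal index partitions, embed a Hive-style partition segment such as `dt=YYYY-MM-DD` so partition-aware tools can prune reads, and use hyphens rather than underscores:

```python
import hashlib

def make_s3_key(dataset, date, part):
    """Build a partition-aware S3 key (tip 6) whose leading characters
    vary (tip 1), so S3 can distribute requests across key partitions.
    Hyphens, not underscores, keep the name DNS-friendly (tip 4)."""
    # Hive-style partition segment so query tools can prune by date.
    logical = "{d}/dt={dt}/part-{p:05d}".format(d=dataset, dt=date, p=part)
    # A short hash prefix randomizes the start of the key name, which is
    # what S3's internal partitioning historically keyed on.
    prefix = hashlib.md5(logical.encode("utf-8")).hexdigest()[:4]
    return "{0}-{1}".format(prefix, logical)

print(make_s3_key("clickstream", "2013-07-01", 7))
```

The trade-off of the hash prefix is that keys no longer sort lexicographically by date, so listing a whole day requires knowing (or enumerating) the prefixes; whether that trade is worth it depends on your request rates.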

You can find more detailed information about each of these six performance tips in the original article.

To learn more about Genie, Netflix's Hadoop PaaS, and how Netflix built a performant, petabyte-scale data center in the cloud, read about it on their blog, or watch their presentation from Hadoop Summit 2013.




