
Performance Tips for Hadoop on Amazon S3


This post from the Hadoop-as-a-Service (HaaS?) platform Mortar lists six tips to help you speed up Hadoop when it runs against Amazon Simple Storage Service (S3). If you follow the world of Hadoop, you probably know that Netflix runs Hadoop on Amazon S3 with its own Hadoop PaaS, Genie.

The top six performance tips Mortar has to offer for your Hadoop instance on S3 are as follows:

  1. Organize your S3 bucket for speed
  2. Store fewer, larger files instead of many smaller ones
  3. Know when to compress your data and when not to
  4. Avoid underscores in bucket names
  5. Stream data directly into S3 with Elastic MapReduce (EMR)
  6. Use partition-aware S3 keys
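
As a rough illustration of tips 4 and 6, here is a minimal Python sketch. The post only names the tips, so the details are assumptions: the Hive-style `year=/month=/day=` key layout is one common reading of "partition-aware" keys, and the helper names `check_bucket_name`, `partitioned_key`, and `s3_uri`, along with the bucket and dataset names, are hypothetical.

```python
from datetime import date


def check_bucket_name(bucket: str) -> str:
    """Return the bucket name, rejecting names that contain underscores (tip 4)."""
    if "_" in bucket:
        raise ValueError(f"bucket name {bucket!r} contains an underscore")
    return bucket


def partitioned_key(dataset: str, day: date, part: int) -> str:
    """Build a date-partitioned key (an assumed layout for tip 6).

    Grouping objects under year=/month=/day= prefixes lets a job read
    only the prefixes it needs instead of listing the whole bucket.
    """
    return (
        f"{dataset}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/part-{part:05d}"
    )


def s3_uri(bucket: str, key: str) -> str:
    """Join a validated bucket name and a key into an s3:// URI."""
    return f"s3://{check_bucket_name(bucket)}/{key}"


# Hypothetical bucket and dataset names, for illustration only.
print(s3_uri("my-logs", partitioned_key("clicks", date(2013, 7, 4), 3)))
# prints s3://my-logs/clicks/year=2013/month=07/day=04/part-00003
```

With a layout like this, a job that only needs one day of data can point at a single `day=` prefix.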

You can find more detail on each of these six performance tips in the original article.

To learn more about Genie, Netflix's Hadoop PaaS, and how Netflix built a performant, petabyte-scale data center in the cloud, read about it on their blog or watch their presentation from Hadoop Summit 2013.




