
High Performance MongoDB Clusters on Amazon EC2


Performance is an important consideration when deploying MongoDB on the EC2 platform. From a hardware perspective, MongoDB performance on EC2 is gated primarily by two factors: RAM and disk speed. Typically (there are always exceptions), CPU should not be an issue. Memory is no longer an issue either: there are plenty of instance types (R3, I2, C3/C4) offering a large amount of RAM. For more details on choosing the right instance type, see my other blog post, "How to choose the right EC2 Instance type".

Historically, disk speed and latency have been a constant problem on Amazon EBS. To their credit, Amazon now offers a couple of options to help with disk performance:

  1. Provisioned IOPS disks: In the provisioned IOPS model, you specify at disk creation time the number of IOPS you want the disk to support. The more IOPS you provision, the more throughput the disk can handle. You can go all the way up to 4000 IOPS per disk. However, IOPS can get expensive at $0.065 per IOPS-month. For example, if you provision 4000 IOPS for a disk, it will cost you $260/month (4000 × $0.065) for the IOPS alone. If you have multiple servers, this adds up fairly quickly.
  2. Local SSD: This is the best option for disk performance on Amazon AWS. Local SSDs provide the best throughput and latency of all the disk options on AWS. However, they are called "local" for a reason. If your VM is stopped for any reason, the allocated local storage is released, so the burden of data reliability is squarely on the user. Could you deploy two local SSD datastores in two different AZs and call it solved? Not quite. If AWS has a region-wide outage like the one in US-East a few years ago, you should expect to lose your local SSDs in all of your AZs. For these reasons, local SSD instances should not be relied on as the sole store for your data.
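The provisioned IOPS arithmetic above is easy to extend to a whole fleet. Below is a minimal sketch that uses the $0.065 per IOPS-month rate quoted in this post. AWS pricing changes over time, so treat the constant as an assumption. The sketch counts only the IOPS charge, not storage. The function name `piops_monthly_cost` is hypothetical.

```python
# Rate quoted in this post; an assumption, since AWS pricing changes over time.
PIOPS_RATE_PER_IOPS_MONTH = 0.065


def piops_monthly_cost(iops_per_disk: int, disks: int = 1) -> float:
    """Monthly provisioned-IOPS charge in dollars (storage not included)."""
    if iops_per_disk < 0 or disks < 0:
        raise ValueError("iops_per_disk and disks must be non-negative")
    return round(iops_per_disk * disks * PIOPS_RATE_PER_IOPS_MONTH, 2)


if __name__ == "__main__":
    print(piops_monthly_cost(4000))           # one disk at 4000 IOPS -> 260.0
    print(piops_monthly_cost(4000, disks=6))  # six such servers -> 1560.0
```

The cost scales linearly with both IOPS and disk count. That linear growth is why it "adds up fairly quickly" once several servers each carry a 4000-IOPS disk.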

With these issues in mind, we are introducing our high performance configuration on AWS. The high performance clusters use a hybrid of local SSD and EBS provisioned IOPS disks to achieve both high performance and high reliability. A typical configuration is deployed as a three-node replica set:

  • The Primary and the Secondary 1 use local SSD disks
  • Secondary 2 uses EBS provisioned IOPS
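To make the layout concrete, here is a sketch of a replica set config document for it. It uses the standard `_id`, `members`, `host`, `priority` and `tags` fields. Several details are hypothetical rather than taken from the post: the hostnames, the `disk` tag values, and the priorities, which are chosen so that elections favour the SSD-backed members.

```python
import json


def hybrid_replica_set_config(set_name, ssd_hosts, ebs_host):
    """Return a replica set config document for the hybrid SSD/EBS layout.

    Hypothetical choices (not from the post): the "disk" tag values and the
    priorities, which make the two SSD members preferred for primary.
    """
    if len(ssd_hosts) != 2:
        raise ValueError("expected exactly two SSD-backed hosts")
    members = [
        # Primary and Secondary 1: local SSD, higher election priority.
        {"_id": i, "host": host, "priority": 2, "tags": {"disk": "ssd"}}
        for i, host in enumerate(ssd_hosts)
    ]
    # Secondary 2: EBS provisioned IOPS, lower priority so elections
    # favour the SSD members.
    members.append(
        {"_id": 2, "host": ebs_host, "priority": 1, "tags": {"disk": "ebs"}}
    )
    return {"_id": set_name, "members": members}


if __name__ == "__main__":
    config = hybrid_replica_set_config(
        "rs0",
        ["ssd-a.example.internal:27017", "ssd-b.example.internal:27017"],
        "ebs-c.example.internal:27017",
    )
    print(json.dumps(config, indent=2))
```

With PyMongo connected to one of the members, a document like this could be passed to `client.admin.command("replSetInitiate", config)`. Setting the EBS member's priority to 0 instead would prevent it from ever becoming primary.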

Figure: High performance MongoDB replica set

What does this mean? Since the primary and Secondary 1 run on local SSD, you get the best possible disk performance from your AWS machines: no network-attached EBS, just blazing fast local SSD. Reads and writes to your primary, and even reads from Secondary 1, run at SSD speed. Secondary 2 uses EBS provisioned IOPS for its data disk, and you choose how many IOPS to provision for your cluster. Because Secondary 2 keeps a full copy of your data on durable EBS storage, this configuration keeps your data safe even if the local SSD disks are lost. We currently offer four sizes: Large, XLarge, X2XLarge, and X4XLarge. For more details, refer to our pricing page.
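Reads only reach Secondary 1 if the client asks for secondary reads. The sketch below builds a standard MongoDB connection string that does this. It assumes the members are tagged with a hypothetical `disk: ssd` tag, and the hostnames are placeholders. The string combines `readPreference=secondaryPreferred` with `readPreferenceTags=disk:ssd`. Secondary reads then go to the SSD-tagged secondary. If no such secondary is available, the driver falls back to the primary (also on SSD) rather than the EBS node.

```python
def ssd_read_uri(hosts, replica_set):
    """Build a connection string that routes secondary reads to SSD members.

    Assumes members carry the hypothetical tag disk:ssd. With
    secondaryPreferred, reads fall back to the primary when no secondary
    matches the tag set.
    """
    host_list = ",".join(hosts)
    options = "&".join([
        f"replicaSet={replica_set}",
        "readPreference=secondaryPreferred",
        "readPreferenceTags=disk:ssd",
    ])
    return f"mongodb://{host_list}/?{options}"


if __name__ == "__main__":
    print(ssd_read_uri(
        ["ssd-a.example.internal:27017",
         "ssd-b.example.internal:27017",
         "ebs-c.example.internal:27017"],
        "rs0",
    ))
```

Keep in mind that reads from any secondary can be slightly stale, so this only suits queries that tolerate replication delay.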

If you have a very high write workload, your EBS-backed secondary might not be able to keep up with your SSD instances. In this scenario there are a few options available, and our support team can walk you through them. All of our existing functionality (backups, restore, clone, scale, compact, etc.) continues to work as usual.
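One way to spot an EBS secondary falling behind is to compare each secondary's last applied operation time with the primary's. The sketch below assumes a status document shaped like the output of the `replSetGetStatus` command, using the `members[].name`, `stateStr` and `optimeDate` fields. The sample hostnames, timestamps, and the 10-second threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: lag in seconds} for every SECONDARY member."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY member in status document")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def lagging_members(status, threshold_seconds=10.0):
    """Sorted names of secondaries whose lag exceeds the (hypothetical) threshold."""
    return sorted(
        name for name, lag in replication_lag(status).items()
        if lag > threshold_seconds
    )


if __name__ == "__main__":
    now = datetime(2015, 1, 1, 12, 0, 0)  # made-up sample timestamps
    sample = {
        "members": [
            {"name": "ssd-a:27017", "stateStr": "PRIMARY", "optimeDate": now},
            {"name": "ssd-b:27017", "stateStr": "SECONDARY",
             "optimeDate": now - timedelta(seconds=1)},
            {"name": "ebs-c:27017", "stateStr": "SECONDARY",
             "optimeDate": now - timedelta(seconds=45)},
        ]
    }
    print(replication_lag(sample))  # {'ssd-b:27017': 1.0, 'ebs-c:27017': 45.0}
    print(lagging_members(sample))  # ['ebs-c:27017']
```

With PyMongo, `client.admin.command("replSetGetStatus")` returns a document of this shape with `optimeDate` as a `datetime`, so the same functions could be fed live data.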


