The following are some best practices that should be considered for your MongoDB production deployments on AWS.
MongoDB recommends the XFS or EXT4 filesystem for better performance. With the WiredTiger storage engine, XFS is strongly recommended. Refer to the MongoDB production notes for details.
AWS EBS Configuration
It is advisable to host a MongoDB database on EBS-optimized instances. An EBS-optimized instance has dedicated throughput between EC2 and EBS, so storage traffic does not compete with other network traffic such as application traffic. If the replica set is configured with ephemeral (instance store) storage, at least one of the secondaries should use EBS volumes to ensure data persistence.
Separate EBS Volumes
Store the data, logs, and journal on separate EBS volumes. Using separate storage devices avoids I/O contention among the three and increases the overall throughput of the disk subsystem.
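As a quick sanity check of the layout described above, the following minimal Python sketch compares the device IDs reported by the operating system for each directory. The mount points /data, /journal, and /log are hypothetical; substitute the paths from your own configuration. Note that distinct device IDs only show that the paths sit on different mounted filesystems, which is a necessary but not sufficient sign that they are backed by different EBS volumes.

```python
import os


def on_distinct_devices(paths):
    """Return True when no two paths share the same device ID (st_dev)."""
    ids = [os.stat(path).st_dev for path in paths]
    return len(ids) == len(set(ids))


if __name__ == "__main__":
    # Hypothetical mount points; adjust them to match your deployment.
    candidates = ["/data", "/journal", "/log"]
    if all(os.path.exists(p) for p in candidates):
        print("separate devices:", on_distinct_devices(candidates))
    else:
        print("Example paths not found; edit the list for your host.")
```

If the function returns False, at least two of the directories share a filesystem and will compete for the same I/O budget.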
Provisioned IOPS (PIOPS)
Use provisioned IOPS to achieve consistent EBS performance. The EBS volumes on the secondaries should be provisioned to match the write load of the primary; otherwise, the secondaries may fall behind in replication.
Read Ahead Limit
Check the disk readahead setting on your EC2 instances, as the default may not be optimized for MongoDB. Set readahead to 0 regardless of storage media type (spinning disk, SSD, etc.). A higher readahead benefits sequential I/O operations. However, since MongoDB disk access patterns are generally random, a higher readahead value provides limited benefit and can even degrade performance. As such, for most workloads, a readahead of 0 provides optimal MongoDB performance. For further details, read the page MongoDB 3.4 Production Notes. That said, a higher readahead value, such as 32 sectors of 512 bytes (16 KB), can also be tested to validate whether it yields a measurable, repeatable, and reliable benefit.
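The arithmetic and the commands involved can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive procedure: the device name xvdf is hypothetical, the current value is read from the Linux sysfs file read_ahead_kb, and the blockdev command (which counts readahead in 512-byte sectors) is only built here, not executed.

```python
from pathlib import Path

SECTOR_BYTES = 512  # blockdev --getra/--setra count 512-byte sectors


def sectors_to_kib(sectors):
    """Convert a readahead value in 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024


def read_ahead_kib(device):
    """Read the current readahead (in KiB) for a block device from sysfs."""
    path = Path("/sys/block") / device / "queue" / "read_ahead_kb"
    return int(path.read_text().strip())


def setra_command(device, sectors=0):
    """Build (but do not run) the blockdev command that sets readahead."""
    return ["blockdev", "--setra", str(sectors), f"/dev/{device}"]


if __name__ == "__main__":
    device = "xvdf"  # hypothetical EBS device name
    print("32 sectors =", sectors_to_kib(32), "KiB")
    print("to apply as root:", " ".join(setra_command(device)))
    if (Path("/sys/block") / device).exists():
        print("current readahead:", read_ahead_kib(device), "KiB")
```

A value set with blockdev does not survive a reboot, so it typically needs to be reapplied at boot time by whatever mechanism your distribution provides.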
The ulimit mechanism is one of the ways in which Unix-like operating systems such as Linux prevent a single user from using too many system resources, such as files, threads, and network connections. By default, the ulimit values for open files (nofile) and processes/threads (nproc) are set low. Low limits can cause issues in the course of normal MongoDB operations, as mongod and mongos use threads and file descriptors to track connections and manage internal operations. There are different ways in which the ulimit values can be set. Related details can be found on this page: Unix ulimit settings. The recommended values are 64000 (soft limit) and 64000 (hard limit) for both nofile and nproc.
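The recommended values above can be compared against a process's actual limits with Python's standard resource module. This is a minimal sketch: it inspects the limits of the Python process itself, so it should be run as the same user (and from the same kind of session) that starts mongod. For an already running mongod, /proc/<pid>/limits is the more direct source.

```python
import resource

RECOMMENDED = 64000  # value recommended in the text for nofile and nproc


def meets_recommendation(soft, hard, recommended=RECOMMENDED):
    """Return True when both limits are unlimited or at least `recommended`."""
    def ok(value):
        return value == resource.RLIM_INFINITY or value >= recommended
    return ok(soft) and ok(hard)


def report():
    """Print the current nofile and nproc limits and whether they suffice."""
    limits = (("nofile", resource.RLIMIT_NOFILE),
              ("nproc", resource.RLIMIT_NPROC))
    for name, limit in limits:
        soft, hard = resource.getrlimit(limit)
        status = "ok" if meets_recommendation(soft, hard) else "too low"
        print(f"{name}: soft={soft} hard={hard} -> {status}")


if __name__ == "__main__":
    report()
```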
TCP KeepAlive: At times, socket-related errors between members of replica sets or sharded clusters can be attributed to a non-optimal TCP keepalive value. A common default keepalive value is 7200 seconds (2 hours); MongoDB recommends a shorter value of 120 seconds (2 minutes). On Linux, values greater than 300 seconds (5 minutes) are overridden on mongod and mongos sockets with a maximum of 300 seconds. Related details can be found on the page MongoDB Diagnostics FAQs.
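The interaction between the system-wide setting and the 300-second cap can be illustrated with a short Python sketch. It assumes a Linux host where the setting is exposed at /proc/sys/net/ipv4/tcp_keepalive_time; the cap logic simply restates the behavior described above and is not taken from MongoDB's source code.

```python
from pathlib import Path

RECOMMENDED_SECONDS = 120      # value recommended in the text
MONGO_LINUX_CAP_SECONDS = 300  # cap on mongod/mongos sockets, per the text
KEEPALIVE_FILE = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def effective_keepalive(system_seconds, cap=MONGO_LINUX_CAP_SECONDS):
    """Keepalive that mongod/mongos sockets end up with under the cap."""
    return min(system_seconds, cap)


def system_keepalive(path=KEEPALIVE_FILE):
    """Read the system-wide keepalive time in seconds (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    print("default 7200 s is capped to", effective_keepalive(7200), "s")
    if KEEPALIVE_FILE.exists():
        current = system_keepalive()
        print("system value:", current, "s")
        if current > RECOMMENDED_SECONDS:
            print("consider lowering it to", RECOMMENDED_SECONDS, "s")
```

On Linux, the value can be changed at runtime with sysctl -w net.ipv4.tcp_keepalive_time=120 and made persistent through the sysctl configuration files.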
Transparent Huge Pages
It is recommended to disable Transparent Huge Pages to ensure the best performance with MongoDB. Instructions on how to do so can be found on the page Disable THP. Huge pages are a mechanism for managing large amounts of memory by using pages (blocks of memory) of sizes such as 2 MB or 1 GB instead of the default 4 KB. With 4 KB pages, the number of pages that the CPU's memory management unit (MMU) has to track becomes very large.
Note that these pages are referenced using page table entries. With 4 KB pages, 1 GB of memory requires 262,144 (roughly 256,000) entries in the page table. Larger memory sizes need even larger page tables. However, the hardware memory management unit in a modern processor can cache only hundreds or thousands of page table entries at a time.
Additionally, hardware and memory management algorithms that work well with thousands of pages (megabytes of memory) may have difficulty performing well with millions (or even billions) of pages. This is where huge pages come into the picture. Transparent Huge Pages (THP) is an abstraction layer that automates the creation, management, and use of huge pages. In other words, THP is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages. However, database workloads often perform poorly with THP enabled, because they tend to have sparse rather than contiguous memory access patterns.
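The page table arithmetic above, and a check of the current THP mode, can be sketched in Python as follows. The sysfs path /sys/kernel/mm/transparent_hugepage/enabled is the usual location on Linux; some distributions use a different path, so treat it as an assumption. The kernel marks the active mode with square brackets, for example "always madvise [never]".

```python
import re
from pathlib import Path

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
THP_FILE = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def page_table_entries(memory_bytes, page_bytes):
    """Number of pages (and page table entries) needed to map the memory."""
    return memory_bytes // page_bytes


def active_thp_mode(sysfs_text):
    """Extract the bracketed (active) mode, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("1 GiB in 4 KiB pages:", page_table_entries(GIB, 4 * KIB))
    print("1 GiB in 2 MiB pages:", page_table_entries(GIB, 2 * MIB))
    if THP_FILE.exists():
        print("THP mode:", active_thp_mode(THP_FILE.read_text()))
```

With 2 MB pages, the same 1 GB needs only 512 entries instead of 262,144, which is the saving huge pages are designed to provide. For MongoDB, the reported THP mode should be never.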
Access Time Settings
Most filesystems update a file's last access time whenever the file is accessed. Because MongoDB accesses its data files frequently, each access triggers an extra metadata write, which results in unnecessary overhead and performance degradation. Thus, MongoDB recommends disabling access time updates. This can be done by adding the noatime mount option for the relevant volume in the fstab file.
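Whether a volume is actually mounted without access time updates can be verified from the mount table. The sketch below parses text in the /proc/mounts format (device, mount point, filesystem type, options, and two numeric fields) and looks for the noatime option. The mount point /data and the sample line are hypothetical.

```python
from pathlib import Path


def mount_options(mounts_text, mount_point):
    """Return the option list for a mount point, or None if not mounted."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later entry overrides earlier
    return options


def has_noatime(mounts_text, mount_point):
    """Return True when the mount point is mounted with noatime."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "noatime" in options


if __name__ == "__main__":
    # Hypothetical sample line in /proc/mounts format.
    sample = "/dev/xvdf /data xfs rw,noatime,attr2,inode64 0 0\n"
    print("sample /data noatime:", has_noatime(sample, "/data"))
    mounts = Path("/proc/mounts")
    if mounts.exists():
        print("live /data noatime:", has_noatime(mounts.read_text(), "/data"))
```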
A log rotation mechanism should be put in place. Related details can be found on this page: Rotate Log Files.
MongoDB recommends using RAID-10 for production deployments. However, combining RAID-10 with PIOPS on AWS can be expensive, so weigh the cost against the benefit before adopting RAID-10.
Indexes on Separate Storage Devices
MongoDB recommends using separate storage devices for indexes when WiredTiger is the storage engine. Further details can be found on this page: storage.wiredTiger.engineConfig.directoryForIndexes.