In our previous post, I discussed MongoDB’s pluggable storage engine architecture and characteristics of each storage engine. In this post, I will talk about how to select which storage engine to use, as well as mixing and matching storage engines in a replica set.
How To Select Which Storage Engine To Use
WiredTiger will be the storage engine of choice for most workloads. WiredTiger’s concurrency and excellent read and write throughput are well-suited for applications requiring high performance:
- IoT applications: sensor data ingestion and analysis.
- Customer data management and social apps: updating all user interactions and engagement from multiple activity streams.
- Product catalogs, content management, and real-time analytics.
For most workloads, it is recommended to use WiredTiger. The rest of this article will discuss situations where other storage engines may be applicable.
The Encrypted storage engine is ideally suited to be used in regulated industries such as finance, retail, healthcare, education, and government. Enterprises that need to build compliant applications with PCI DSS, HIPAA, NIST, FISMA, STIG, or other regulatory initiatives can use the Encrypted storage engine with native MongoDB security features such as authorization, access controls, authentication, and auditing to achieve compliance.
Before MongoDB 3.2, the primary methods to provide encryption-at-rest were to use third-party applications that encrypt files at the application, file system, or disk level. These methods work well with MongoDB but tend to add extra cost, complexity, and overhead.
The Encrypted Storage engine adds ~15% overhead compared to WiredTiger, as available CPU cycles are allocated to the encryption/decryption process — though the actual impact will be dependent on your data set and workload. This is still significantly less compared to third-party disk and file system encryption, where customers have noticed 25% overhead or more.
The Encrypted Storage engine, combined with MongoDB native security features such as authentication, authorization, and auditing provides an end to end security solution to safeguard data with minimal performance impact.
The advantages of in-memory computing are well-understood. Data can be accessed in RAM nearly 100,000 times faster than retrieving it from disk, delivering orders-of-magnitude higher performance for the most demanding applications. With RAM prices continuing to tumble and new technologies such as 3D non-volatile memory on the horizon, the performance gains can now be realized with better and improving economics than ever before.
Not only is fast access important, but predictable access, or latency, is essential for certain modern day applications. For example, financial trading applications need to respond quickly to fluctuating market conditions as data flows through trading systems. Unpredictable latency outliers can mean the difference between making or losing millions of dollars.
While WiredTiger will be more than capable for most use cases, applications requiring predictable latency will benefit the most from the In-Memory storage engine.
Enterprises can harness the power of MongoDB core capabilities (expressive query language, primary and secondary indexes, scalability, high availability) with the benefits of predictable latency from the In-Memory storage engine.
Examples of when to use the In-Memory engine are:
- Algorithmic trading applications that are highly sensitive to predictable latency; such as when latency spikes from high traffic volumes can overwhelm a trading system and cause transactions to be lost or require re-transmission.
- Real-time monitoring systems that detect anomalies such as fraud.
- Applications that require predictable latency for processing of trade orders, credit card authorizations, and other high-volume transactions.
- Sensor data management and analytics applications interested in spatially and temporally correlated events that need to be contextualized with real-time sources (weather, social networking, traffic, etc.).
- Security threat detection.
- Session data of customer profiles during a purchase.
- Product search cache.
- Personalized recommendations in real time.
- First person shooter games.
- Caching of player data.
- Real-time processing and caching of customer information and data plans.
- Tracking network usage for millions of users and performing real-time actions such as billing.
- Managing user sessions in real time to create personalized experiences on any mobile device.
Though WiredTiger is better suited for most application workloads, there are certain situations where users may want to remain on MMAPv1:
- Legacy Workloads: Enterprises that are upgrading to the latest MongoDB releases (3.0 and 3.2) and don’t want to re-qualify their applications with a new storage engine may prefer to remain with MMAPv1.
- Version Downgrade: The upgrade process from MMAP/MMAPv1 to WiredTiger is a simple binary compatible “drop in” upgrade, but once upgraded to MongoDB 3.0 or 3.2 users cannot downgrade to a version lower than 2.6.8. This should be kept in mind for users that want to stay on version 2.6. There have been many added features included in MongoDB since version 2.6, thus it is highly recommended to upgrade to version 3.2.
Mixed Storage Engine Use Cases
MongoDB’s flexible storage architecture provides a powerful option to optimize your database. Storage engines can be mixed and matched within a single MongoDB cluster to meet diverse application needs for data. Users can evaluate different storage engines without impacting deployments and can also easily migrate and upgrade to a new storage engine following the rolling upgrade process. To simplify this process even further, users can utilize Ops or Cloud manager to upgrade their cluster’s version of MongoDB through a click of a button.
Though there are many possible mixed storage configurations, here are a few examples of mixed storage engine configurations with the In-Memory and WiredTiger engines.
Figure 10: eCommerce application with mixed storage engines.
Since the In-Memory storage engine does not persist data as a standalone node, it can be used with another storage engine to persist data in a mixed storage engine configuration. The eCommerce application in Figure 10 uses two sharded clusters with three nodes (one primary, two secondaries) in each cluster. The replica set with the In-Memory engine as the primary node provides low latency access and high throughput to transient user data such as session information, shopping cart items, and recommendations. The application’s product catalog is stored in the sharded cluster with WiredTiger as the primary node. Product searches can utilize the WiredTiger in-memory cache for low latency access. If the product catalog’s data storage requirements exceed server memory capacity, data can be stored and retrieved from disk. This tiered approach enables “hot” data to be accessed and modified quickly in real time, while persisting “cold” data to disk.
The configuration in Figure 11 demonstrates how to preserve low latency capabilities in a cluster after failover. Setting priority=1 in the secondary In-Memory node will result in automatic failover to that secondary, and eliminate the need to fully repopulate the failed primary when it comes back online. Additionally, if the transient data needs to be persisted then a secondary WiredTiger node can be configured to act as a replica, providing high availability and disk durability.Figure 11: Mixed storage engines, with hidden WiredTiger secondary
To provide even higher availability and durability a five node replica set with two In-Memory and three WiredTiger nodes can be used. In Figure 12, the In-Memory engine is the primary node, with four secondary nodes. If a failure to the primary occurs, the secondary In-Memory node will automatically failover as the primary and there will be no need to repopulate the cache. If the new primary In-Memory node also fails, then the replica set will elect a WiredTiger node as primary. This mitigates any disruption in operation as clients will still be able to write uninterrupted to the new WiredTiger primary.Figure 12: Mixed storage engines with five node replica set
Additionally, a mixed storage engine approach is ideally suited for a microservices architecture. In a microservice architecture, a shared database between services can affect multiple services and slow down development. By decoupling the database and selecting the right storage engines for specific workloads, enterprises can improve performance and quickly develop features for individual services.
MongoDB is the next-generation database used by the world’s most sophisticated organizations, from cutting-edge startups to the largest companies, to create applications never possible at the fraction of the cost of legacy databases. With pluggable storage engine APIs, MongoDB continues to innovate and provide users the ability to choose the most optimal storage engines for their workloads. Now, enterprises have an even richer ecosystem of storage options to solve new classes of use cases with a single database framework.