Configuring a MongoDB Replica Set for Analytics
See how to prepare your MongoDB replica sets for analytics without compromising the high availability of your deployments.
MongoDB replica sets make it easy for developers to ensure high availability for their database deployments.
A common replica set configuration is composed of three member nodes: two data-bearing nodes and one arbiter node. With two electable, data-bearing nodes, users are protected from scenarios that cause downtime for single-node deployments, such as maintenance events and hardware failures.
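As a rough illustration of the three-member layout described above, the following sketch builds a replica set configuration document as a plain Python dictionary. The set name `rs0` and the `example.com` hostnames are placeholders, not values from any real deployment; the document has the general shape accepted by `rs.initiate()` in the mongo shell.

```python
def three_member_config(set_name, data_hosts, arbiter_host):
    """Build a config with two data-bearing members and one arbiter."""
    if len(data_hosts) != 2:
        raise ValueError("this sketch expects exactly two data-bearing hosts")
    # Data-bearing members are electable by default.
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter votes in elections but stores no data.
    members.append(
        {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
    )
    return {"_id": set_name, "members": members}


# Hypothetical hostnames, for illustration only.
config = three_member_config(
    "rs0",
    ["db-a.example.com:27017", "db-b.example.com:27017"],
    "arbiter.example.com:27017",
)
```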
However, it may be tempting to read from the redundant, secondary server to scale reads and/or run queries for the purpose of analytics. We strongly advise against secondary reads when there are only two electable, data-bearing nodes in the replica set.
The main reason for this recommendation is that relying on secondary reads can compromise the high availability replica sets are meant to provide. While occasional use of the secondary for non-critical ad-hoc queries is fine, if your app requires both the primary and the secondary to shoulder the database load of your application, your system is no longer in a position to handle this load if one of the nodes in the cluster goes down or becomes unavailable.
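The capacity argument can be made concrete with a small back-of-the-envelope check. This is only a sketch: the load and capacity figures are invented, and it assumes each data-bearing node can serve the same number of operations per second.

```python
def survives_one_failure(total_load, per_node_capacity, data_nodes):
    """Return True if the remaining nodes can carry the load after one node fails."""
    remaining_nodes = data_nodes - 1
    return total_load <= per_node_capacity * remaining_nodes


# Load fits on one node: losing either node is tolerable.
print(survives_one_failure(800, 1000, 2))   # True

# Load only fits when spread over both nodes: one failure overloads the survivor.
print(survives_one_failure(1500, 1000, 2))  # False
```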
This is discussed in more depth in the following resources: Can I use more replica nodes to scale? and reasons to not use secondary reads to provide extra read capacity.
Run Analytics Queries Against Hidden, Analytics Nodes Instead
If you would like to run more than the occasional ad-hoc or analytics query, we highly recommend that you configure your replica set specifically to handle analytics queries. In particular, we recommend adding a node designated for analytics as a hidden, non-electable member of the replica set.
Hidden members have properties that make them great for analytics. A hidden replica set member:
Maintains a copy of the primary’s data set – Querying on a hidden member will be nearly identical to querying the primary node (minus some replication delay).
Cannot become primary and is invisible to your application – It’s important to isolate analytics traffic from production application traffic. If the analytics node were to become the replica set primary, it might be unable to handle the combined analytics and production application traffic.
Can be useful for disaster recovery as well if a slaveDelay is configured – See advanced configuration considerations below.
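The properties listed above correspond to two fields on a member's configuration document: `hidden: true` and `priority: 0` (MongoDB requires hidden members to have priority 0). The sketch below appends such a member to an existing configuration dictionary. The hostnames are placeholders, and whether the new node should vote depends on your topology; this sketch leaves `votes` at its default.

```python
import copy


def hidden_analytics_member(member_id, host):
    """Member document for a hidden, non-electable node."""
    return {"_id": member_id, "host": host, "priority": 0, "hidden": True}


def add_member(config, member):
    """Return a new config with the member appended and the version bumped."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    new_config["members"].append(member)
    # A reconfiguration is expected to carry a higher version than the current one.
    new_config["version"] = config.get("version", 1) + 1
    return new_config


# Hypothetical existing configuration, for illustration only.
current = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "db-a.example.com:27017"},
        {"_id": 1, "host": "db-b.example.com:27017"},
        {"_id": 2, "host": "arbiter.example.com:27017", "arbiterOnly": True},
    ],
}
updated = add_member(
    current, hidden_analytics_member(3, "analytics.example.com:27017")
)
```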
If you’re interested in adding an analytics node to your mLab deployment, email us to request it. mLab will add the node seamlessly into your replica set as a hidden member and provide you with its address. You can then create single-node connections using that address for your analytics queries.
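Because a hidden member is not advertised to drivers during replica set discovery, analytics clients connect to it directly by address. A minimal sketch of building such a connection string follows; the hostname, port, and database name are placeholders, and credentials are omitted. The `directConnection=true` option is part of the standard MongoDB connection string format in recent drivers.

```python
def direct_connection_uri(host, port, database):
    """Build a connection string that targets one node instead of the whole set."""
    # directConnection=true tells the driver not to discover other members.
    return f"mongodb://{host}:{port}/{database}?directConnection=true"


# Hypothetical analytics node address, for illustration only.
uri = direct_connection_uri("analytics.example.com", 27017, "reports")
print(uri)
```

The resulting string can be handed to a driver's client constructor, for example `pymongo.MongoClient(uri)` if PyMongo is installed.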
Advanced Configuration Considerations
Enabling slaveDelay on the Analytics Node for Replica Set Disaster Recovery
MongoDB’s slaveDelay option allows you to configure a replication delay on a hidden replica set member. Configuring a delay is helpful for recovering from disaster scenarios such as accidentally dropping a collection or database.
For example, imagine that you configure a one-hour delay on an analytics node. If a developer accidentally drops or deletes data from the primary node, the change will be applied to the analytics node an hour later (as opposed to immediately). This gives you up to an hour to query the analytics node and retrieve the deleted data.
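The one-hour example could be expressed in a member document roughly as follows. This is a sketch with a placeholder hostname. Note that the field is called `slaveDelay` in the MongoDB versions this article describes and was renamed `secondaryDelaySecs` in MongoDB 5.0; the `legacy_field` parameter is a hypothetical switch added here to cover both names.

```python
def delayed_hidden_member(member_id, host, delay_seconds, legacy_field=True):
    """Member document for a hidden node that applies changes after a delay."""
    delay_key = "slaveDelay" if legacy_field else "secondaryDelaySecs"
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,   # delayed members must not be electable
        "hidden": True,  # keep application reads away from stale data
        delay_key: delay_seconds,
    }


ONE_HOUR = 60 * 60  # the delay is expressed in seconds

# Hypothetical analytics node address, for illustration only.
member = delayed_hidden_member(3, "analytics.example.com:27017", ONE_HOUR)
```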
Having Multiple Analytics Nodes for High Availability and/or to Scale Reads
If you would like your analytics queries to be able to withstand one node failure and/or to have more read capacity, it could make sense to have multiple analytics nodes.
In that case, you can use Read Preference with Tag Sets to ensure that analytics queries are directed at analytics nodes only, and that non-analytics queries are directed at electable nodes only. If you want to go this route, contact us, and we’ll work with you on all the details.
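One way tag-based routing might look is sketched below. The tag name `use` and its values `analytics` and `production` are assumptions chosen for illustration, as are the hostnames. The sketch also assumes the analytics members are non-electable (`priority: 0`) but not hidden, since drivers do not discover hidden members and so cannot route tagged reads to them.

```python
def tag_members(members, analytics_hosts):
    """Return copies of the member documents with a hypothetical `use` tag."""
    tagged = []
    for member in members:
        member = dict(member)  # shallow copy so the input is not mutated
        if member.get("arbiterOnly"):
            pass  # arbiters hold no data and serve no reads, so they get no tag
        elif member["host"] in analytics_hosts:
            member["priority"] = 0  # analytics nodes must never become primary
            member["tags"] = {"use": "analytics"}
        else:
            member["tags"] = {"use": "production"}
        tagged.append(member)
    return tagged


def analytics_uri(hosts, set_name, database):
    """Connection string that reads only from secondaries tagged use:analytics."""
    return (
        f"mongodb://{','.join(hosts)}/{database}"
        f"?replicaSet={set_name}"
        "&readPreference=secondary"
        "&readPreferenceTags=use:analytics"
    )


# Hypothetical members, for illustration only.
members = [
    {"_id": 0, "host": "db-a.example.com:27017"},
    {"_id": 1, "host": "db-b.example.com:27017"},
    {"_id": 2, "host": "arbiter.example.com:27017", "arbiterOnly": True},
    {"_id": 3, "host": "analytics-1.example.com:27017"},
    {"_id": 4, "host": "analytics-2.example.com:27017"},
]
analytics_hosts = {"analytics-1.example.com:27017", "analytics-2.example.com:27017"}
tagged = tag_members(members, analytics_hosts)
uri = analytics_uri(
    ["db-a.example.com:27017", "db-b.example.com:27017"], "rs0", "reports"
)
```

Production clients that read from the primary need no tag set; clients that read from secondaries would use the `use:production` tag in the same way.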
Reading From Secondaries in a Sharded Cluster
If you are running a sharded deployment and would like to read from the secondary members of your shards, there are important considerations you should be aware of. We will be publishing a blog post on this advanced topic in the future.