How to Configure Your MongoDB Replica Set for Analytics

Chris Chang lays out a strategy for running analytics against MongoDB using analytics-only replicas of your data, which keeps analytics queries fast and isolates them from the nodes serving your application.

MongoDB replica sets make it easy for developers to ensure high availability for their database deployments.

A common replica set configuration is composed of three member nodes: two data-bearing nodes and one arbiter node. With two electable, data-bearing nodes, users are protected from scenarios that cause downtime for single-node deployments, such as maintenance events and hardware failures.

It may be tempting to read from the redundant secondary server to scale reads or to run analytics queries. However, we strongly advise against secondary reads when there are only two electable, data-bearing nodes in the replica set.

The main reason for this recommendation is that relying on secondary reads can compromise the high availability that replica sets are meant to provide. Occasional use of the secondary for non-critical, ad-hoc queries is fine. But if your application requires both the primary and the secondary to shoulder its database load, your system can no longer handle that load when one of the nodes goes down or becomes unavailable.

This is discussed in more depth in the following resources:

Run Analytics Queries Against Hidden, Analytics Nodes Instead

If you would like to run more than the occasional ad-hoc or analytics query, we highly recommend that you configure your replica set to handle analytics queries. In particular, we recommend adding a node designated for analytics as a hidden, non-electable member of the replica set.

Hidden members have properties that make them great for analytics. A hidden replica set member:

Maintains a copy of the primary’s data set – Querying on a hidden member will be nearly identical to querying the primary node (minus some replication delay).

Cannot become primary and is invisible to your application – It’s important to isolate analytics traffic from production application traffic. If the analytics node became the replica set primary, it might be unable to handle the combined analytics and production application traffic.

Can be useful for disaster recovery as well if a slaveDelay is configured – See advanced configuration considerations below.
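As a rough illustration of what a hidden, non-electable member looks like in a replica set configuration document, here is a minimal Python sketch. The hostnames, the set name `rs0`, and the helper `add_hidden_analytics_member` are hypothetical; the `priority` and `hidden` fields are the standard member settings. The sketch only builds the document. Applying it (for example through a reconfiguration command) is left out, and on a hosted deployment this step is normally performed by the provider.

```python
import copy


def add_hidden_analytics_member(config, host):
    """Return a copy of `config` with `host` appended as a hidden member.

    `config` is assumed to be shaped like a replica set configuration
    document: {"_id": ..., "version": ..., "members": [...]}.
    """
    new_config = copy.deepcopy(config)  # leave the caller's document untouched
    next_id = max(m["_id"] for m in new_config["members"]) + 1
    new_config["members"].append({
        "_id": next_id,
        "host": host,
        "priority": 0,   # non-electable: this member can never become primary
        "hidden": True,  # not advertised to application drivers
    })
    new_config["version"] += 1  # a new configuration carries a higher version
    return new_config


# Hypothetical three-member set: two data-bearing nodes and one arbiter.
current = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "db-a.example.com:27017"},
        {"_id": 1, "host": "db-b.example.com:27017"},
        {"_id": 2, "host": "arbiter.example.com:27017", "arbiterOnly": True},
    ],
}

updated = add_hidden_analytics_member(current, "analytics.example.com:27017")
```

Setting `priority` to 0 is what makes the node non-electable, and `hidden` is what keeps application traffic away from it.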

If you’re interested in adding an analytics node to your mLab deployment:

  1. Email us at support@mlab.com to request that the node be added.
  2. mLab will add the node seamlessly into your replica set as a hidden member and provide you with its address.
  3. You will then be able to start to create single-node connections using that address for your analytics queries.
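A single-node connection in step 3 could look something like the following sketch, which only assembles a connection string. The host, database name, and credentials are placeholders, and `analytics_uri` is a hypothetical helper. The `directConnection=true` option is understood by recent drivers; older drivers typically connect directly when given a single host and no `replicaSet` option, so treat that parameter as an assumption about your driver version.

```python
from urllib.parse import quote_plus


def analytics_uri(host, database, user, password):
    """Build a connection string that targets one node directly.

    Credentials are percent-encoded so that characters such as '@' or '/'
    do not break the URI.
    """
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?directConnection=true"
    )


# Placeholder values; substitute the address provided for your analytics node.
uri = analytics_uri("analytics.example.com:27017", "reports", "analyst", "p@ss/word")
print(uri)
```

The resulting string can be passed to your driver's client constructor. Because it names only the analytics node, queries on that connection never touch the primary or the electable secondary.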

Advanced Configuration Considerations

Enabling slaveDelay on the Analytics Node for Replica Set Disaster Recovery

MongoDB’s slaveDelay option allows you to configure a replication delay on a hidden replica set member. Configuring a delay is helpful for recovering from disaster scenarios such as accidentally dropping a collection or database.

For example, imagine that you configure a one-hour delay on an analytics node. If a developer accidentally drops or deletes data on the primary node, the change is applied to the analytics node an hour later rather than immediately. This gives you a one-hour window in which to query the analytics node and retrieve the deleted data.
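The one-hour example can be sketched as follows. The member document, the helper names, and the timestamps are illustrative assumptions. `slaveDelay` is expressed in seconds and requires a priority of 0; note that MongoDB 5.0 and later name this field `secondaryDelaySecs`, so check which name your server version expects.

```python
import copy
from datetime import datetime, timedelta

ONE_HOUR_SECONDS = 60 * 60


def with_replication_delay(member, delay_seconds):
    """Return a copy of a member document configured as a delayed member."""
    delayed = copy.deepcopy(member)
    delayed["priority"] = 0               # delayed members must be non-electable
    delayed["hidden"] = True              # keep application reads off stale data
    delayed["slaveDelay"] = delay_seconds  # delay in seconds
    return delayed


def recovery_deadline(dropped_at, delay_seconds):
    """Time at which the delayed member applies an operation run on the primary."""
    return dropped_at + timedelta(seconds=delay_seconds)


# Hypothetical analytics member and a hypothetical accidental drop at 14:05.
analytics = with_replication_delay(
    {"_id": 3, "host": "analytics.example.com:27017"}, ONE_HOUR_SECONDS
)
deadline = recovery_deadline(datetime(2016, 1, 1, 14, 5), ONE_HOUR_SECONDS)
print(analytics["slaveDelay"], deadline)
```

In this scenario the dropped data remains readable on the analytics node until 15:05, after which the drop is replicated there too.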

Having Multiple Analytics Nodes for High Availability and/or to Scale Reads

If you would like your analytics queries to withstand the failure of one node, or you need more read capacity, it could make sense to have multiple analytics nodes.

In this case, consider a Read Preference with Tag Sets to ensure that analytics queries are directed at analytics nodes only, and that non-analytics queries are directed at electable nodes only.
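A tag-based setup might be sketched like this. The tag name `use`, its value `analytics`, the hostnames, and the helper `tagged_secondary_uri` are all hypothetical, while `readPreference` and `readPreferenceTags` are standard connection string options. One assumption worth stating loudly: a replica set connection does not discover hidden members, so this sketch assumes the analytics nodes are non-electable (priority 0) but not hidden, which lets the driver route tagged reads to them.

```python
def tagged_secondary_uri(hosts, replica_set, tags):
    """Build a replica set URI whose reads go to secondaries matching `tags`."""
    tag_set = ",".join(f"{key}:{value}" for key, value in tags.items())
    return (
        f"mongodb://{','.join(hosts)}/"
        f"?replicaSet={replica_set}"
        "&readPreference=secondary"
        f"&readPreferenceTags={tag_set}"
    )


# Hypothetical member documents for two analytics nodes carrying the tag.
analytics_members = [
    {"_id": 3, "host": "analytics-1.example.com:27017",
     "priority": 0, "tags": {"use": "analytics"}},
    {"_id": 4, "host": "analytics-2.example.com:27017",
     "priority": 0, "tags": {"use": "analytics"}},
]

# Seed list of electable nodes; the driver discovers the visible members.
seeds = ["db-a.example.com:27017", "db-b.example.com:27017"]
analytics_read_uri = tagged_secondary_uri(seeds, "rs0", {"use": "analytics"})
print(analytics_read_uri)
```

Application connections that keep the default primary read preference never read from the tagged nodes. If the application also reads from secondaries, the electable nodes would need their own tag, used in the same way.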

Reading From Secondaries in a Sharded Cluster

If you are running a sharded deployment and would like to read from the secondary members of your shards, there are important considerations you should be aware of. We will be publishing a blog post on this advanced topic in the future.

Published at DZone with permission of Chris Chang, DZone MVB. See the original article here.
