Backup and Recovery of MongoDB With RecoverX (Part 2)
Datos IO's Shalabh Goyal covers some concerns MongoDB users have and how Datos IO's RecoverX can lend a hand.
This blog is the second in a two-part series on backup and recovery for MongoDB. In an earlier blog, I covered why companies require both replication and backup for enterprise-grade data protection, and in the first part of this series, I discussed the existing solutions for backup and recovery of MongoDB and their drawbacks. In this blog, I will discuss the key requirements for protecting data that resides on MongoDB (whether deployed on-premises, in a private cloud with an as-a-service model, or in a public cloud such as Amazon Web Services or Google Cloud Platform) and the innovative features of Datos IO RecoverX, cloud-scale data protection software, that address them.
Requirement #1: Online Cluster-Consistent Backups
One of the key requirements of next-generation applications deployed on MongoDB is that they are always on. Quiescing the database to take backups is therefore not feasible, and the backup operation should not impact application performance. As the application scales, the underlying MongoDB deployment also needs to scale out to multiple shards. In that case, a backup solution must produce a backup copy that is consistent across shards without disrupting database and application performance during backup operations.
Scalable Versioning: The Datos IO RecoverX versioning operation creates a true point-in-time backup copy of MongoDB databases that is consistent across shards. During the initial full copy, RecoverX transfers data from the primary node of each shard. For all subsequent backups, the oplogs from all nodes (primary and secondary) are tracked and transferred to secondary storage. RecoverX uses the oplogs to create a version of the MongoDB database that is consistent across all shards, without quiescing the database. Because oplogs are captured from secondary nodes as well as primaries, customers also keep a highly durable backup if a primary node fails.
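To make the idea of a cross-shard consistent version more concrete, here is a minimal Python sketch of one simple way to pick a common cut point from per-shard oplog captures. It is not RecoverX's algorithm, which this post does not describe. It assumes oplog timestamps are modeled as (seconds, increment) tuples that are directly comparable across shards, which is a simplification, and that every shard has captured at least one entry; the names `consistent_cut` and `version_at_cut` are hypothetical.

```python
from typing import Dict, List, Tuple

# An oplog timestamp modeled as (seconds, increment). Tuples compare
# lexicographically, matching the ordering of oplog entries.
Timestamp = Tuple[int, int]
OplogEntry = Tuple[Timestamp, str]  # (timestamp, description of the operation)


def consistent_cut(shard_oplogs: Dict[str, List[OplogEntry]]) -> Timestamp:
    """Return the latest timestamp that every shard's captured oplog has reached.

    Assumes each shard has at least one captured entry.
    """
    # Each shard can only vouch for changes up to its newest captured entry,
    # so the cluster-wide safe point is the minimum of those per-shard maxima.
    return min(max(ts for ts, _ in entries) for entries in shard_oplogs.values())


def version_at_cut(shard_oplogs: Dict[str, List[OplogEntry]],
                   cut: Timestamp) -> Dict[str, List[str]]:
    """Keep, per shard, only the operations at or before the common cut."""
    return {
        shard: [op for ts, op in entries if ts <= cut]
        for shard, entries in shard_oplogs.items()
    }
```

For example, if shard0 has captured entries at (100, 1) and (105, 2) while shard1 has only reached (103, 1), the cut is (103, 1): the version includes shard0's first operation and both of shard1's, and shard0's later change waits for the next backup.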
Requirement #2: Flexible Backup Options
Depending on the application, data may have different change rates and patterns. For example, in a product catalog, certain items may be refreshed every day (fast-selling goods), while others may have a longer shelf life (premium items). Based on application requirements, some collections may need to be backed up every hour, while others need only daily backups. Customers using MongoDB have also told us they need the flexibility to schedule backups at any interval and at collection-level granularity. More importantly, these backups should always be stored on secondary storage in native formats to avoid vendor lock-in.
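A per-collection schedule like the product-catalog example can be sketched as a small policy table. The `CollectionPolicy` record and `due_for_backup` function below are hypothetical names for illustration, not a RecoverX or MongoDB API; the sketch only decides which collections' backup intervals have elapsed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class CollectionPolicy:
    # Hypothetical policy record: a namespace and how often to back it up.
    namespace: str       # e.g. "catalog.fast_selling"
    interval: timedelta  # backup frequency chosen for this collection


def due_for_backup(policies: List[CollectionPolicy],
                   last_backup: Dict[str, datetime],
                   now: datetime) -> List[str]:
    """Return the namespaces whose interval has elapsed since their last backup."""
    due = []
    for policy in policies:
        last = last_backup.get(policy.namespace)
        # A collection that has never been backed up is always due.
        if last is None or now - last >= policy.interval:
            due.append(policy.namespace)
    return due
```

With an hourly policy for fast-selling items and a daily policy for premium items, a check at 10:30 after backups at 09:00 and 00:00 selects only the fast-selling collection.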
Native Storage Format and Semantic Deduplication: RecoverX provides collection-level backup granularity, so admins can set each collection's backup interval according to its recovery point objective (RPO). Further, backup data is stored in the native BSON format, which avoids vendor lock-in. More importantly, native formats enable advanced data management services such as queryable versions, search across backup data sets, and the ability to run applications directly on secondary data. Finally, RecoverX brings semantic deduplication, an industry-first feature that helps reduce the cost of storing backups. Every member of a MongoDB replica set holds a copy of the same data, so a naive backup of all nodes would store that data several times; as part of versioning, RecoverX ensures that the backup copy contains no replicas of the primary data set, deduplicating the source data while keeping native formats. Moreover, after the first full version, all subsequent versions are incremental: only the changes are transferred. This results in savings of more than 70% on secondary storage costs.
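The two storage savings described above, dropping replica copies and transferring only deltas, can be illustrated with a short Python sketch. It is not how RecoverX implements semantic deduplication. It assumes documents are plain dicts keyed by `_id`, that copies read from different replica set members are identical, and it hashes canonical JSON as a stand-in for the BSON bytes a real tool would work with. All function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Tuple

Doc = Dict[str, object]


def fingerprint(doc: Doc) -> str:
    # Hash canonical JSON (sorted keys) as a stand-in for hashing BSON bytes.
    return hashlib.sha256(json.dumps(doc, sort_keys=True).encode()).hexdigest()


def dedupe_replicas(copies: Iterable[Doc]) -> Dict[object, Doc]:
    """Collapse copies of the same document read from several replica set members."""
    unique: Dict[object, Doc] = {}
    for doc in copies:
        # Members share the same _id for the same document; keep the first copy.
        unique.setdefault(doc["_id"], doc)
    return unique


def incremental_delta(previous: Dict[object, str],
                      current: Dict[object, Doc]) -> Tuple[List[Doc], List[object]]:
    """Compare with the prior version's fingerprints.

    Returns (documents that are new or changed, _ids that were deleted).
    """
    changed = [doc for _id, doc in current.items()
               if previous.get(_id) != fingerprint(doc)]
    deleted = [_id for _id in previous if _id not in current]
    return changed, deleted
```

Only the `changed` documents and the `deleted` ids would need to be written for the new version; unchanged documents are represented by the previous version.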
Requirement #3: Scalable Recovery
During its lifecycle, data passes through multiple stages, such as development, test, pre-production, and production, and may also reside in multiple clouds (private and public). The topology of the MongoDB clusters differs at each stage. For example, production could run on a sharded MongoDB cluster on-premises, while the test team might only have access to unsharded MongoDB clusters on Amazon Web Services (AWS). Hence, the backup solution should support multiple restore operations across such cloud configurations, such as sharded-to-sharded (for example, from a 5×3 cluster, meaning five shards with three replica set members each, to a 2×3 sharded cluster) or sharded-to-unsharded (for example, from a 5×3 cluster to a 1×3 unsharded cluster, that is, a single three-member replica set).
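Restoring a 5×3 backup onto a 2×3 or 1×3 cluster means redistributing documents across a different number of shards. The sketch below shows that re-partitioning step using a simple hash-modulo rule on a shard key. This rule is an illustrative assumption: MongoDB itself places documents according to chunk ranges managed by the cluster, so a real restore would typically load data through `mongos` and let the cluster route it. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List

Doc = Dict[str, object]


def target_shard(shard_key_value: object, num_shards: int) -> int:
    """Map a shard key value to one of num_shards targets with a stable hash.

    Illustration only: this is not how MongoDB assigns documents to shards.
    """
    digest = hashlib.md5(repr(shard_key_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def remap_for_restore(source_shards: List[List[Doc]], shard_key: str,
                      num_target_shards: int) -> List[List[Doc]]:
    """Re-partition documents backed up from N shards onto M target shards.

    M may be 1, which models restoring into an unsharded replica set.
    """
    targets: List[List[Doc]] = [[] for _ in range(num_target_shards)]
    for shard in source_shards:
        for doc in shard:
            targets[target_shard(doc[shard_key], num_target_shards)].append(doc)
    return targets
```

Restoring five source shards with `num_target_shards=2` spreads the documents over two shards, while `num_target_shards=1` places every document in a single target, the sharded-to-unsharded case.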
Reliable Recovery: Datos IO RecoverX is multi-cloud ready and provides single-click, point-in-time recovery of data directly into the production (source) cluster or into alternate (test/dev) clusters. RecoverX supports all combinations of recovery, from sharded clusters to sharded or unsharded clusters, and from on-premises environments to the cloud and vice versa. This enables admins to refresh test/dev clusters in continuous integration (CI) and continuous delivery (CD) environments.
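Point-in-time recovery from a full copy plus oplogs generally means replaying logged operations on top of the snapshot and stopping at the requested timestamp. The following is a minimal sketch of that idea under simplifying assumptions: each insert or update entry carries the full document (real MongoDB update entries typically record the change rather than the whole document), timestamps are (seconds, increment) tuples, and `restore_to_point_in_time` is a hypothetical name. The op codes "i", "u", and "d" mirror the insert, update, and delete codes in MongoDB's oplog.

```python
import copy
from typing import Dict, List, Tuple

Timestamp = Tuple[int, int]
# Simplified oplog entry: (timestamp, op code, _id, full document or None).
Entry = Tuple[Timestamp, str, object, object]


def restore_to_point_in_time(snapshot: Dict[object, dict], oplog: List[Entry],
                             target: Timestamp) -> Dict[object, dict]:
    """Apply oplog entries up to and including `target` on top of a full snapshot."""
    state = copy.deepcopy(snapshot)  # never mutate the stored backup copy
    for ts, op, _id, doc in sorted(oplog, key=lambda e: e[0]):
        if ts > target:
            break  # entries are sorted, so no later entry can qualify
        if op in ("i", "u"):
            # Simplification: updates carry the full post-image of the document.
            state[_id] = doc
        elif op == "d":
            state.pop(_id, None)
    return state
```

Choosing an earlier or later `target` against the same snapshot and oplog yields different recovery points, which is what lets a single backup refresh a test/dev cluster to any captured moment.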
Requirement #4: Handling Failure
Failures are the norm in the distributed database world, so the backup solution should be resilient to database process failures, node failures, network failures, and even logical corruption of data during backup and recovery operations. It should also be able to handle failures of the MongoDB config servers that store metadata for sharded clusters.
Operational Resiliency: Datos IO RecoverX is resilient to primary or secondary node failures because it captures data (via oplogs) from multiple nodes. RecoverX also tracks config server status so that it can automatically fail over to a new config server if one fails. This is one of the biggest differentiators, and it helps guarantee consistent backups regardless of failures: even if multiple nodes fail, versioning and recovery operations continue. RecoverX also computes multiple checksums during the backup process to ensure data fidelity and error-free backups.
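The node-failure and checksum behavior described above can be sketched as follows: try each replica set member in turn, skip members that are unreachable or whose data fails checksum verification, and stop at the first good copy. This is an illustrative sketch, not RecoverX's implementation. The `Fetcher` callables are hypothetical stand-ins for whatever actually reads an oplog batch from a node, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a fetcher returns (payload bytes, checksum the source
# reported for them), standing in for reading an oplog batch from one member.
Fetcher = Callable[[], Tuple[bytes, str]]


class AllMembersFailed(Exception):
    """Raised when no member returned a verified copy of the data."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_verified(members: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, bytes]:
    """Try members in order; return (member name, payload) from the first one
    whose payload matches the checksum it reported."""
    errors: List[str] = []
    for name, fetch in members:
        try:
            payload, reported = fetch()
        except ConnectionError as exc:
            # Node or network failure: move on to the next member.
            errors.append(f"{name}: {exc}")
            continue
        if sha256_hex(payload) != reported:
            # Corrupted transfer: reject this copy instead of storing it.
            errors.append(f"{name}: checksum mismatch")
            continue
        return name, payload
    raise AllMembersFailed("; ".join(errors))
```

If the primary is unreachable and the first secondary returns corrupted bytes, the call still succeeds from the next healthy secondary; only when every member fails does it raise, with each member's reason collected in the error message.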
Finally, customers are deploying MongoDB in a variety of models: on physical servers, in private clouds and microservices-like frameworks, and in public clouds. Backup and recovery should be seamless across all of these deployment models, and the ease of deploying backup and recovery is a big concern for MongoDB customers. I will cover this in the next blog, so stay tuned! At Datos IO, we are working to provide enterprise-grade backup and recovery solutions that let you onboard and scale your enterprise applications on MongoDB with confidence.
If you would like to learn more and chat with our team, we will be at MongoDB World 2016 and are sponsoring a free happy hour on June 28 at the Hilton. I look forward to connecting with you all. For more information, please visit here.
Published at DZone with permission of Shalabh Goyal, DZone MVB. See the original article here.