Backup and Recovery of MongoDB With RecoverX (Part 2)

Datos IO's Shalabh Goyal covers some concerns MongoDB users have and how Datos IO's RecoverX can lend a hand.

By Shalabh Goyal · Jun. 27, 16 · Opinion

This blog is the second in a two-part series on backup and recovery for MongoDB. In one of my previous blogs, I covered why companies require both replication and backup for enterprise-grade data protection, and in the first part of this series, I discussed the existing solutions for backup and recovery of MongoDB and their drawbacks. In this blog, I will discuss the key requirements for protecting data that resides in MongoDB (deployed on-premises, on a private cloud with an as-a-service model, or in a public cloud such as Amazon AWS or Google Cloud Platform) and the innovative features of Datos IO RecoverX, cloud-scale data protection software.

Requirement #1: Online Cluster-Consistent Backups

One of the key requirements of next-generation applications deployed on MongoDB is their always-on nature. This means that quiescing the database to take backups is not feasible; moreover, the backup operation should not impact the performance of the application. As the application scales, the underlying MongoDB deployment also needs to scale out to multiple shards. In this case, a backup solution must provide a backup copy that is consistent across shards without disrupting database and application performance during backup operations.

Scalable Versioning: Datos IO RecoverX's versioning operation creates a true point-in-time backup copy of MongoDB databases that is consistent across shards. During the initial full copy, RecoverX transfers data from the primary node of each shard. For all subsequent backups, the oplogs from all nodes (primary and secondary) are tracked and transferred to secondary storage. RecoverX uses the oplogs to create a consistent version of the MongoDB database across all shards without quiescing the database. This also gives customers a highly durable backup in the event that a primary node fails.
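
To make the oplog-based approach concrete, here is a minimal PyMongo sketch that tails a replica set's oplog (the local.oplog.rs capped collection) from a saved position. It illustrates the general change-capture technique that such backups rely on, not RecoverX's actual implementation; the connection string and starting timestamp are placeholder assumptions.

```python
from pymongo import MongoClient, CursorType

# Placeholder connection string; oplog tailing requires a replica set.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
oplog = client.local.oplog.rs

# In practice the timestamp of the previous backup would be persisted and
# used as the starting point; here we simply start from the newest entry.
last = oplog.find().sort("$natural", -1).limit(1).next()
start_ts = last["ts"]

# A tailable cursor keeps returning new oplog entries as they are written.
cursor = oplog.find(
    {"ts": {"$gt": start_ts}},
    cursor_type=CursorType.TAILABLE_AWAIT,
)
while cursor.alive:
    for entry in cursor:
        # Each entry records one operation: 'i' insert, 'u' update, 'd' delete.
        print(entry["ts"], entry["op"], entry.get("ns"))
```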

Requirement #2: Flexible Backup Options

Depending on the application, data may have different change rates and patterns. For example, in a product catalog, certain items may be refreshed every day (fast-selling goods), while others may have a longer shelf life (premium items). Based on the application's requirements, some collections may need to be backed up every hour while others may be backed up daily. Providing this flexibility to schedule backups at any interval and at collection-level granularity is another requirement that we have heard from customers who are using MongoDB. More importantly, these backups should always be stored on secondary storage in native formats to avoid vendor lock-in.

Native Storage Format and Semantic Deduplication: RecoverX provides collection-level backup granularity, so admins may define the backup interval based on their RPO requirements. Further, the backup data is stored in the native BSON format, so vendor lock-in is avoided. More importantly, native formats allow advanced data management services such as queryable versions, search across backup data sets, and the ability to run applications directly on secondary data. Finally, RecoverX brings semantic de-duplication, an industry-first feature that helps reduce the cost of storing backups. As part of versioning, Datos IO RecoverX makes sure that the backup copy has no replicas of the primary data set, thus de-duplicating the source data, all without losing native formats. Moreover, after the first full version, all subsequent versions are incremental in nature: only the delta changes are transferred. This results in roughly 70% savings on secondary storage costs.
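
As a rough illustration of what the native format means in practice, the following sketch writes a collection out as concatenated raw BSON documents, the same on-disk layout mongodump produces. The database and collection names are hypothetical, and the content-hash check is only a toy stand-in for semantic de-duplication, included to make the idea concrete rather than to describe RecoverX internals.

```python
import hashlib

import bson  # ships with PyMongo
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["catalog"]["products"]  # hypothetical database/collection

seen = set()
with open("products.bson", "wb") as out:
    for doc in collection.find():
        raw = bson.encode(doc)
        digest = hashlib.sha256(raw).digest()
        if digest in seen:  # skip byte-identical documents (toy de-duplication)
            continue
        seen.add(digest)
        out.write(raw)  # concatenated BSON documents, readable by standard tools
```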

Requirement #3: Scalable Recovery

During its lifecycle, data resides in multiple stages, such as development, test, pre-production, and production, and may also reside in multiple clouds (private and public). The topology of the MongoDB cluster at each stage is different. For production, the application could be deployed on a sharded MongoDB cluster on-premises, but the test team might only have access to unsharded MongoDB clusters in Amazon AWS (public cloud). Hence, the backup solution should allow multiple restore operations, such as sharded-to-sharded (for example, from a 5×3 cluster, meaning five shards of three nodes each, to a 2×3 sharded cluster) or sharded-to-unsharded (for example, from a 5×3 cluster to a 1×3 unsharded cluster), across such cloud configurations.


Reliable Recovery: Datos IO RecoverX is multi-cloud ready and provides single-click, point-in-time recovery of data directly into the production cluster (source) or into alternate (test/dev) clusters. RecoverX supports all combinations of recovery, from sharded to sharded or non-sharded clusters, and from on-premises environments to the cloud and vice versa. Thus, it enables admins to refresh test/dev clusters in continuous integration (CI) and continuous delivery (CD) environments.
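
For illustration, a sharded-to-unsharded restore can be thought of as replaying per-shard backup files into a single target cluster, as in the PyMongo sketch below. The file names and target connection string are assumptions, and a production tool such as RecoverX additionally handles topology mapping, ordering, and conflict handling.

```python
import bson  # ships with PyMongo
from pymongo import MongoClient

# Hypothetical unsharded target cluster and per-shard BSON backup files.
target = MongoClient("mongodb://test-cluster:27017")["catalog"]["products"]
shard_files = ["shard0.bson", "shard1.bson", "shard2.bson"]

for path in shard_files:
    with open(path, "rb") as f:
        batch = []
        for doc in bson.decode_file_iter(f):  # stream documents from the dump
            batch.append(doc)
            if len(batch) == 1000:  # insert in batches for throughput
                target.insert_many(batch)
                batch = []
        if batch:
            target.insert_many(batch)
```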

Requirement #4: Handling Failure

Failures are the norm in the distributed database world. The backup solution should therefore be resilient to database process failures, node failures, network failures, and even logical corruption of data during backup and recovery operations. Finally, the backup solution should be able to handle failures of the MongoDB config servers that store metadata for sharded clusters.

Operational Resiliency: Datos IO RecoverX is resilient to a primary or secondary node failure because it captures data (via oplogs) from multiple nodes. RecoverX also tracks config server status and automatically fails over to a new config server if one fails. This is one of its biggest differentiators and helps guarantee consistent backups regardless of failures. Even if multiple nodes fail, versioning and recovery operations continue. Multiple checksums are also taken during the backup process to ensure data fidelity and error-free backups.
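
As a simple illustration of checksum-based verification, the sketch below recomputes a SHA-256 digest for each backup file and compares it against the value recorded at backup time. The manifest format, paths, and digests are assumptions for illustration, not RecoverX internals.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large backups need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical manifest mapping each backup file to its recorded checksum.
manifest = {"shard0.bson": "9f2c...", "shard1.bson": "a1b4..."}
for path, expected in manifest.items():
    status = "OK" if sha256_of(path) == expected else "CORRUPT"
    print(f"{path}: {status}")
```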

Finally, customers are deploying MongoDB in a variety of models, such as on physical servers, on private clouds, in microservices-like frameworks, and in public clouds. Backup and recovery should be seamless across these deployment models, and ease of deploying backup and recovery is a big concern for MongoDB customers. I will cover this in the next blog, so stay tuned! At Datos IO, we are working to provide enterprise-grade backup and recovery solutions that enable you to onboard and scale your enterprise applications on MongoDB with confidence.

If you would like to learn more and chat with our team, we will be at MongoDB World 2016 and are sponsoring a free happy hour on June 28 at the Hilton. I look forward to connecting with you all. For more information, please visit here.


Published at DZone with permission of Shalabh Goyal, DZone MVB. See the original article here.
