This blog is the first in a two-part series on MongoDB backup and recovery. In this first part, I will discuss the motivations for protecting data that resides on MongoDB and the existing backup and recovery solutions for MongoDB. In the second part, I will discuss how Datos IO is solving this problem with its industry-first product.
In the era of big data, enterprise applications create a large volume of data that may be structured, semi-structured or unstructured in nature. In addition, application development cycles are much shorter and application availability is a critical requirement. Given these application requirements, enterprises are forced to look beyond traditional relational databases to onboard the next-generation Platform 3 applications (on IaaS or cloud-based PaaS). NoSQL databases such as MongoDB are now being adopted and evaluated by enterprises for these next-generation applications (eCommerce, content management, etc.). MongoDB provides dynamic schema, easy scaling through auto-sharding, tunable consistency for reads, and built-in replication.
MongoDB has a native replication capability that satisfies availability requirements. However, data protection requirements for scalable point-in-time backup and recovery still need to be addressed. For robust data protection, yes, enterprises need both backup and replication! Replication copies an accidental delete or a corrupted write to every replica just as faithfully as a valid one, so without point-in-time backups, organizations are at substantial risk of losing data due to human error, logical corruption and other operational failures. Traditional backup solutions were built to address the requirements of structured Platform 2 applications on relational databases that used shared storage and had the ACID transaction model. Unfortunately, they fall short of addressing the point-in-time backup requirements of Platform 3 applications and distributed databases (local storage, eventual consistency, and the elastic nature of infrastructure). There are a few alternative script-based solutions (e.g. Strata) that enterprises are using to fill the data protection gap, but these solutions are suboptimal at best.
1. Manual Scripted Solutions
These solutions combine MongoDB's native utilities (such as mongodump) with custom scripts that transfer data to secondary storage. The scripts are customized for each MongoDB cluster and require significant operational effort to scale or to adapt to topology changes (such as adding or removing nodes in your MongoDB database). Further, these scripts are not resilient to failure scenarios such as the failure of a node (primary or secondary) or intermittent network issues. Finally, recovery (the paramount value of “backup”) is a manual and therefore time-consuming process that results in long application downtime and carries a risk of data loss from bugs in the scripts. Overall, these solutions work when the MongoDB environment is small and the application can tolerate some data loss. Some of the key issues that these solutions face are:
- Lack of enterprise backup solution for sharded configurations
- Database needs to be offline when the snapshots are taken
- Both backup and recovery fail under node failure and other infrastructure failures
- Recovery process is manual and requires verifications, which increases the recovery time
- Collection-level recovery is a manual, time-consuming process
- Recovery to unlike topologies (sharded → unsharded) for test/dev refresh is not available
Most enterprises use these scripted methods as a temporary quick-fix solution. It is like driving your car with flat tires: it can keep you going, but you can neither travel at the speed you want nor are you free from the risk of disaster.
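To make the scripted approach concrete, here is a minimal sketch of what such a script often looks like. It is an illustration under stated assumptions, not a recommended implementation: the host names, the backup root, and the function names are hypothetical, and the script only prints the mongodump commands rather than running them. The `--oplog` flag assumes the target is a replica-set member.

```python
import shlex
from datetime import datetime, timezone

# Hypothetical, hard-coded node list. Every topology change (a node added
# or removed) means editing this by hand, which is the scaling problem
# described above.
NODES = ["mongo-a.example.internal:27017", "mongo-b.example.internal:27017"]


def build_mongodump_command(node, backup_root, timestamp):
    """Return the mongodump argument list for one node."""
    host, port = node.rsplit(":", 1)
    out_dir = f"{backup_root}/{host}/{timestamp}"
    return [
        "mongodump",
        "--host", host,
        "--port", port,
        "--oplog",   # capture oplog entries written during the dump
        "--gzip",    # compress the dump files
        "--out", out_dir,
    ]


def plan_backup(nodes, backup_root, now=None):
    """Build one mongodump command per node, all sharing one timestamp."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y%m%dT%H%M%SZ")
    return [build_mongodump_command(n, backup_root, timestamp) for n in nodes]


if __name__ == "__main__":
    # A real script would pass each command to subprocess.run(cmd, check=True).
    # Note that nothing here retries on node failure or network errors.
    for cmd in plan_backup(NODES, "/backups/mongodb"):
        print(shlex.join(cmd))
```

Even in this small form, the weaknesses listed above are visible: the topology is fixed in the script, there is no failure handling, and restoring from the resulting dump directories is left entirely to the operator.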
2. MongoDB Paid Backup and Recovery (aka “MMS”)
MongoDB (the company) itself provides a couple of ways to back up MongoDB databases. Enterprises may choose a managed backup offering (MMS) that runs in the public cloud or, if they are paid MongoDB customers, they may deploy the backup service on-premise. In addition to being exorbitantly costly, the managed backup service stores customers’ data in the public cloud. Backup data transfer over WAN may not work for customers who deploy MongoDB on-premise or for customers who need to keep their sensitive data in-house. Further, the service imposes significant per-shard data limits.
Using the MongoDB on-premise backup service is possible but is overly complex to deploy and operationalize (the deployment diagram speaks for itself!). Enterprises need to deploy 8 servers, additional databases (with additional licensing) and roughly 6-9x the storage capacity of the database being backed up to enable on-premise backups. Overall, the on-premise backup service is a theoretical solution that brings with it significant CAPEX and OPEX investments:
- Complexity of deploying multiple databases
- Cost of additional infrastructure (servers and storage)
- Cost of licensing additional MongoDB nodes
- Risk of failed backups when the secondary node from which the backup is taken fails
- Siloed backup infrastructure that serves only MongoDB
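To put the storage overhead in concrete terms, here is a back-of-the-envelope sketch. The 6x and 9x multipliers come from the estimate above; the function name and the 500 GB example database are hypothetical.

```python
def backup_storage_range_gb(database_size_gb, low_factor=6, high_factor=9):
    """Return the (low, high) backup storage estimate in GB."""
    return database_size_gb * low_factor, database_size_gb * high_factor


low, high = backup_storage_range_gb(500)
print(f"A 500 GB database needs roughly {low}-{high} GB of backup storage")
```

Under these assumptions, a 500 GB database would call for about 3 to 4.5 TB of backup storage, on top of the servers and licenses listed above.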
Recognizing the data protection requirements of enterprise customers, the emerging era of next-gen distributed databases (key-value, graph, document repository, etc.), and the limitations of the solutions described above, Datos IO has built an industry-first scale-out data protection software product for Platform 3 applications deployed on distributed and cloud databases such as MongoDB and Apache Cassandra (DataStax). The Datos IO solution is built from the ground up for next-generation applications, caters to the needs of application owners and DevOps, and takes away the operational hassles of deploying and managing protection infrastructure. Most importantly, it remains reliable and scalable even when nodes fail, which helps minimize recovery time (RTO). I will discuss the Datos IO solution in more detail in the second part of this blog.