MongoDB Approach to Availability
MongoDB Approach to Availability
Join the DZone community and get the full member experience.Join For Free
Databases are better when they can run themselves. CockroachDB is a SQL database that automates scaling and recovery. Check it out here.
Another thing I find interesting about MongoDB is its approach to Durability, Data Consistency and Availability. It is very relaxed and will not work for some applications but for others it can be usable in current form. Let me explain some concepts and compare it to technologies in MySQL space.
First I think MongoDB is best compared no to MySQL Server but MySQL Cluster, especially in newer versions which implement “sharding”. Same as commit to NDB Storage engine does not normally mean commit to disk, but rather commit to network it does not mean commit to disk with MongoDB, furthermore MongoDB uses Asynchronous replication, meaning it may take some time before data will be at more than one node. You can also use getLastError() to ensure data is propagated to the slave. So you can see it as a hybrid between MySQL Cluster and innodb_flush_log_at_trx_commit=2 mode. The second difference of course the fact MongoDB is not crash safe – similar to MyISAM database will need to be repaired if it crashes. Still I find behavior somewhat similar – you’re not expected to run MySQL Cluster without replication, MongoDB is practically the same.
Second – if we look at Replication Sets we find them very similar to MySQL Cluster though designed to work with Wide area network and so Async replication. There is voting required to pick the master node in case of node failure and at least 3 servers is recommended, where you can have some voting servers only cast their votes and hold no data. The other different is there is only one master rather than multiple. This is because doing master with asynchronous replication requires conflict resolution which can be tricky in general sense and MongoDB wants simplicity of operation for developers and administration.
Third if we look at how failover happens – same with NDB (native API) it is handled on driver level. When you connect to replication set you connect to set of server not one of them and if one server fails driver fails over to different master. Things are again tuned to deal with Asynchronous Replication. Consistency is maintained but at expense of certain changes may be thrown away/ “rolled back” in case of fail over.
This approach is not as clean as best possible “no committed data loss with almost instant fail over” but It makes sense for large number of applications. In fact using MySQL Replication for failover we’re operating with kind of similar situation, just with a lot less automation.
The good question of course is how robust these features are in MongoDB – many of them are new and Replication Sets are in development still. It may take a time for them to stabilize as well as later develop tools around them. How to check if 2 MongoDB nodes are indeed in sync ? How to do Hot Backups with point in time recovery ? These and many similar questions need to be answered and bugs worked out. One good example of early stage of MongoDB replication could be a bug mentioned during presentation today with replication breaking if time on master server is changed (MongoDB uses timestamps to identify events in replication log). It was just fixed last month I understood.
At the same time many things, including replication are a lot more simply with MongoDB and there is a lot less of old baggage so I hope it will be able to stabilize and mature quickly.
Published at DZone with permission of Peter Zaitsev , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.