Learning CouchDB showed me that it is different than other NoSQL DBs I had studied so far with respect to sharding and replication:
- It does not have sharding, but rather each server has all the data.
- It has multiple-master (or active-active) configuration, where you can write to any server
- Revision numbers: first, revision numbers start with the change number to the record. Example: 2-de0ea16f8621cbac506d23a0fbbde08a. Therefore, latest changes to any record win (change 2-de0ea16f8621cbac506d23a0fbbde08a wins over 1-7c971bb974251ae8541b8fe045964219).
- MD5: after the change number, there is change MD5. In case concurrent changes to the record occur across two different servers, there's an ASCII comparison to determine which one wins (change 2-de0ea16f8621cbac506d23a0fbbde08a wins over 2-7c971bb974251ae8541b8fe045964219). Of course there's no concept of time here to determine which change happened first, but at least each server is able to pick a winning version deterministically. Using a MD5 hash of the document can be quite helpful to avoid replicating the same change (if the same change was made to the different CouchDB databases).
- Conflict List: besides picking automatically a winning list, any client can know when there were conflicts and do something about - either merging them or picking a different version. That can be done, for instance, using a view that outputs conflict and subscribing to it through the Change API. I understand that this is optional and doesn't stop the system in case of conflicts.
For systems where concurrent updates are not common, this is definitely a valid and good approach. Also, there is conflict avoidance through ETAGs (just like Windows Azure Storage), which goes a long way to avoid conflicts.
My concern with CouchDB and active-active is a prolonged split-brain situation, where 2 or more databases are taking writes to the records (potentially the same). Clients can see inconsistent results if they are hitting these different deployments.
Going a bit further on the Split-Brain scenario, if these databases are used by other downstream components in your system, the same conflict resolution problem may occur on their side, as they could be consuming records from these different databases (that are not resolving conflicts for some time). Reason about all these scenarios can be challenging.
The interesting thing about this is that, if indeed there's a split-brain but clients can talk to all deployments, CouchDB could take write and return an error in case replication is currently not working, so clients could have an embedded logic to try to connect to other CouchDB deployments and do some of this replication, which will keep all the deployment in a sort of consistent state.