DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. CouchDB and Conflict Resolution

CouchDB and Conflict Resolution

Rodrigo De Castro user avatar by
Rodrigo De Castro
·
Sep. 28, 12 · Interview
Like (0)
Save
Tweet
Share
7.42K Views

Join the DZone community and get the full member experience.

Join For Free

Learning CouchDB showed me that it is different than other NoSQL DBs I had studied so far with respect to sharding and replication:

  • It does not have sharding, but rather each server has all the data.
  • It has multiple-master (or active-active) configuration, where you can write to any server
The interesting part about the active-active is how conflicts are resolved. Rather than just showing that there's a conflict, all CouchDB servers have a consistent view of the world, even after conflicts. How does it do it?
  • Revision numbers: first, revision numbers start with the change number to the record. Example: 2-de0ea16f8621cbac506d23a0fbbde08a. Therefore, latest changes to any record win (change 2-de0ea16f8621cbac506d23a0fbbde08a wins over 1-7c971bb974251ae8541b8fe045964219).
  • MD5: after the change number, there is change MD5. In case concurrent changes to the record occur across two different servers, there's an ASCII comparison to determine which one wins (change 2-de0ea16f8621cbac506d23a0fbbde08a wins over 2-7c971bb974251ae8541b8fe045964219). Of course there's no concept of time here to determine which change happened first, but at least each server is able to pick a winning version deterministically. Using a MD5 hash of the document can be quite helpful to avoid replicating the same change (if the same change was made to the different CouchDB databases).
  • Conflict List: besides picking automatically a winning list, any client can know when there were conflicts and do something about - either merging them or picking a different version. That can be done, for instance, using a view that outputs conflict and subscribing to it through the Change API. I understand that this is optional and doesn't stop the system in case of conflicts.

    Conflict Resolution on Couch

For systems where concurrent updates are not common, this is definitely a valid and good approach. Also, there is conflict avoidance through ETAGs (just like Windows Azure Storage), which goes a long way to avoid conflicts.

My concern with CouchDB and active-active is a prolonged split-brain situation, where 2 or more databases are taking writes to the records (potentially the same). Clients can see inconsistent results if they are hitting these different deployments.

Going a bit further on the Split-Brain scenario, if these databases are used by other downstream components in your system, the same conflict resolution problem may occur on their side, as they could be consuming records from these different databases (that are not resolving conflicts for some time). Reason about all these scenarios can be challenging.

The interesting thing about this is that, if indeed there's a split-brain but clients can talk to all deployments, CouchDB could take write and return an error in case replication is currently not working, so clients could have an embedded logic to try to connect to other CouchDB deployments and do some of this replication, which will keep all the deployment in a sort of consistent state.

Some references:
http://wiki.apache.org/couchdb/Replication_and_conflicts
http://guide.couchdb.org/draft/conflicts.html
http://www.quora.com/CouchDB/How-does-CouchDB-solve-conflicts-in-a-deterministic-way

 

 

Database

Published at DZone with permission of Rodrigo De Castro, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • ChatGPT: The Unexpected API Test Automation Help
  • Fraud Detection With Apache Kafka, KSQL, and Apache Flink
  • Top Three Docker Alternatives To Consider
  • Select ChatGPT From SQL? You Bet!

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: