DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. CouchDB Cluster

CouchDB Cluster

Ricky Ho user avatar by
Ricky Ho
·
Apr. 06, 11 · News
Like (0)
Save
Tweet
Share
11.76K Views

Join the DZone community and get the full member experience.

Join For Free
Lets look at how one can layer a cluster on top of CouchDB.

Couch Cluster

A “Couch Cluster” is composed of multiple “partitions”. Each partition is composed of multiple replicated DB instances. We call each replica a “virtual node”, which is basically a DB instance hosted inside a "physical node", which is a CouchDB process running in a machine. “Virtual node” can migrate across machines (which we also call “physical node”) for various reasons, such as …
  • when physical node crashes
  • when more physical nodes are provisioned
  • when the workload of physical nodes are unbalanced
  • when there is a need to reduce latency by migrating closed to the client

Proxy


The "Couch Cluster" is frontend by a "Proxy", which intercept all the client's call and forward it to the corresponding "virtual node". In doing so, the proxy has a "configuration DB" which store the topology and knows how the virtual nodes are distributed across physical nodes. The configuration DB will be updated at more DBs are created or destroyed. Changes of the configuration DB will be replicated among the proxies so each of them will eventually share the same picture of the cluster topology.


In this diagram, it shows a single DB, which is split into 2 partitions (the blue and orange partitions). Each partition has 3 replicas, where one of them is the primary and the other two are secondaries.

Create DB
  1. Client call Proxy with URL=http://proxy/dbname; HTTP_Command = PUT /dbname
  2. Proxy need to determine number of partitions and number of replications is needed, lets say we have 2 partitions and each partition has 3 copies. So there will be 6 virtual nodes. v1-1, v1-2, v1-3, v2-1, v2-2, v2-3.
  3. Proxy also need to determine which virtual node is the primary of its partition. Lets say v1-1, v2-1 are primary and the rest are secondaries.
  4. And then Proxy need to determine which physical node is hosting these virtual nodes. say M1 (v1-1, v2-2), M2 (v1-2, v2-3), M3 (v1-3, v2-1).
  5. Proxy record its decision to the configuration DB
  6. Proxy call M1 with URL=http://M1/dbname_p1; HTTP_Command = PUT /dbname_p1. And then call M1 again with URL=http://M1/dbname_p2; HTTP_Command = PUT /dbname_p2.
  7. Proxy repeat step 6 to M2, M3

List all DBs
  1. Client call Proxy with URL=http://proxy/_all_dbs; HTTP_Command = GET /_all_dbs
  2. Proxy lookup the configuration DB to determine all the DBs
  3. Proxy return to client

Get DB info
  1. Client call Proxy with URL=http://proxy/dbname; HTTP_Command = GET /dbname
  2. Proxy will lookup the configuration DB for all its partitions. For each partition, it locates the virtual node that host the primary copy (v1-1, v2-1). It also identifies the physical node that host these virtual nodes (M1, M3).
  3. For each physical node, say M1, the proxy call it with URL=http://M1/dbname_p1; HTTP_Command = GET /dbname
  4. Proxy do the same to M3
  5. Proxy combine the results of M1, and M3 and then forward to the client

Delete DB
  1. Client call Proxy with URL=http://proxy/dbname; HTTP_Command = DELETE /dbname
  2. Proxy lookup which machines is hosting the clustered DB and find M1, M2, M3.
  3. Proxy call M1 with URL=http://M1/dbname_p1; HTTP_Command = DELETE /dbname_p1. Then Proxy call M1 again with URL=http://M1/dbname_p2; HTTP_Command = DELETE /dbname_p2.
  4. Proxy do the same to M2, M3

Get all documents of a DB
  1. Client call Proxy with URL=http://proxy/dbname/_all_docs; HTTP_Command = GET /dbname/_all_docs
  2. Proxy will lookup the configuration DB for all its partitions. For each partition, it randomly locates the virtual node that host a copy (v1-2, v2-2). It also identifies the physical node that host these virtual nodes (M1, M2).
  3. Proxy call M1 with URL=http://M1/dbname_p1/_all_docs; HTTP_Command = GET /dbname_p1/_all_docs.
  4. Proxy do the same to M2
  5. Proxy combine the results of M1, and M3 and then forward to the client

Create / Update a document
  1. Client call Proxy with URL=http://proxy/dbname/docid; HTTP_Command = PUT /dbname/docid
  2. Proxy will invoke "select_partition(docid)" to determine the partition, and then lookup the primary copy of that partition (e.g. v1-1). It also identifies the physical node (e.g. M1) that host this virtual node.
  3. The proxy call M1 with URL=http://M1/dbname_p1/docid; HTTP_Command = PUT /dbname_p1/docid

Get a document
  1. Client call Proxy with URL=http://proxy/dbname/docid; HTTP_Command = GET /dbname/docid
  2. Proxy will invoke "select_partition(docid)" to determine the partition, and then randomly get a copy of that partition (e.g. v1-3). It also identifies the physical node (e.g. M3) that host this virtual node.
  3. The proxy call M3 with URL=http://M3/dbname_p1/docid; HTTP_Command = GET /dbname_p1/docid

Delete a document
  1. Client call Proxy with URL=http://proxy/dbname/docid?rev=1234; HTTP_Command = DELETE /dbname/docid?rev=1234
  2. Proxy will invoke "select_partition(docid)" to determine the partition, and then lookup the primary copy of that partition (e.g. v1-1). It also identifies the physical node (e.g. M1) that host this virtual node.
  3. The proxy call M1 with URL=http://M1/dbname_p1/docid?rev=1234; HTTP_Command = DELETE /dbname_p1/docid?rev=1234

Create a View design doc
  1. Client call Proxy with URL=http://proxy/dbname/_design/viewid; HTTP_Command = PUT /dbname/_design/viewid
  2. Proxy will determine all the virtual nodes of this DB, and identifies all the physical nodes (e.g. M1, M2, M3) that host these virtual nodes.
  3. The proxy call M1 with URL=http://M1/dbname_p1/_design/viewid; HTTP_Command = PUT /dbname_p1/_design/viewid. Then proxy call M1 again with URL=http://M1/dbname_p2/_design/viewid; HTTP_Command = PUT /dbname_p2/_design/viewid.
  4. Proxy do the same to M2, M3

Query a View
  1. Client call Proxy with URL=http://proxy/dbname/_view/viewid/attrid; HTTP_Command = GET /dbname/_view/viewid/attrid
  2. Proxy will determine all the partitions of "dbname", and for each partition, it randomly get a copy of that partition (e.g. v1-3, v2-2). It also identifies the physical node (e.g. M1, M3) that host these virtual nodes.
  3. The proxy call M1 with URL=http://M1/dbname_p1/_view/viewid/attrid; HTTP_Command = GET /dbname_p1/_view/viewid/attrid
  4. The proxy do the same to M3
  5. The proxy combines the result from M1, M3. If the "attrid" is a map only function, the proxy will just concatenate all the results together. But if the "attrid" has a reduce function defined, then the proxy will invoke the view engine's reduce() function with rereduce = true. Then the proxy return the combined result to the client.

Replication within the Cluster
  1. Periodically, Proxy will replicate the changes of ConfigurationDB among themselves. This will ensure all the proxies are having the same picture of the topology.
  2. Periodically, Proxy will pick a DB, pick one of its partition, and replicate the changes from the primary to all the secondaries. This will make sure all the copies of each partition of DB are in sync.

Client data sync

Lets say the client also has a local DB, which is replicated from the cluster. This is important for occasionally connected scenario, where the client may disconnect with the cluster for a time period and work with the local DB for a while. Later on when the client connects back to the cluster again, the data between the local DB and the cluster need to be synchronized.

To replicate changes from the local DB to the cluster ...
  1. Client start a replicator, and send the POST /_replicate with {source : "http://localhost/localdb, target: "http://proxy/dbname"}
  2. The replicator, which has remembered the last seq_num of the source in the previous replication, and extract all the changes of the localDB since then.
  3. The replicator push these changes to the proxy.
  4. The proxy will examine the list of changes. For each changed document, it will call "select_partition(docid)" to determine the partition, and then lookup the primary copy of that partition and then the physical node that host this virtual node.
  5. The proxy will push this changed document to the physical node. In other words, the primary copy of the cluster will first receive the changes from the localDB. These changes will be replicated to the secondary copies at a latter time.
  6. When complete, the replicator will update the seq_num for the next replication.
To replicate the changes from the cluster to the localDB
  1. Client starts the replicator, which has remembered the last "seq_num" array of the cluster. The seq_num array contains all the seq_num of each virtual node of the cluster. This seq_num array is a opaque data structure which the replicator doesn't care.
  2. The replicator send a request to the proxy to extract the latest changs, along with the seq_num array
  3. The proxy first lookup who is the primary of each partition, and then it extract changes from them using the appropriate seq_num from the seq_num array.
  4. The proxy consolidate all changes from each primary copy of each partition, and send them back to the replicator, along with the updated array of seq_num.
  5. The replicator apply these changes to the localDB, and then update the seq_num array for the next replication.
cluster Partition (database) Data structure Host (Unix)

Published at DZone with permission of Ricky Ho, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Cloud-Based Transportation Management System
  • Handling Automatic ID Generation in PostgreSQL With Node.js and Sequelize
  • How to Cut the Release Inspection Time From 4 Days to 4 Hours
  • Getting Started With JMS-ActiveMQ: Explained in a Simple Way

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: