RavenDB 3.5 Whirlwind Tour: A Large Cluster Goes Into a Bar and Orders N^2 Drinks
Or, every node in the cluster orders a drink for each of the other nodes, loudly, at the same time.
Join the DZone community and get the full member experience.Join For Free
Imagine that you have a two nodes cluster set up as master-master replication, and then you write a document to one of them. The node you wrote the document to now contacts the 2nd node to let it know about the new document. The data is replicated, and everything is happy in this world.
But now, let's imagine it with three nodes. We write to node 1, which will then replicate to nodes 2 and 3. But node 2 is also configured to replicate to node 3, and given that we have a new document in, it will do just that. Node 3 will detect that it already has this document and turn that into a no-op. But, at the same time that node 3 is getting the document from node 2, it is also sending the document it got from node 1 to node 2.
This works, and it evens out eventually because replicating a document that was already replicated is safe to do. And on high load systems, replication is batched, so you typically don't see a problem until you get to bigger cluster sizes.
Let us take the following six-way cluster. In this graph, we are going to have 15 round trips across the network on each replication batch.*
* Nitpicker corner, yes, I know that the number of connections is ( N * (N-1) ) / 2, but N^2 looks better in the title.
The typical answer we have for this is to change the topology, instead of having a fully connected graph, with each node talking to all other nodes, we use something like this:
Real topologies typically have more than a single path, and it isn't a hierarchy, but this is to illustrate a point.
This work, but it requires the operations team to plan ahead when they deploy, and if you didn't allow for breakage, a single node going down can disconnect large portion of your cluster. That is not ideal.
So in RavenDB 3.5 we have taken steps to avoid it... nodes are now much smarter in the way they go about talking about their changes. Instead of getting all fired up and starting to send replication messages all over the place—potentially putting some serious pressure on the network—the nodes will be smarter about it, and wait a bit to see if their siblings already got the documents from the same source. In which case, we now only need to ping them periodically to ensure that they are still connected, and we saved a whole bunch of bandwidth.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.