It’s not a horde of zombies (a network partition) that I fear the most. It’s not even a zombie hidden behind a closed door. It’s the thought of someone in my group becoming a zombie (an unresponsive node) that I fear the most. The truth is, distributed systems such as NoSQL databases are terrified of unresponsive nodes.
If someone in my group becomes unresponsive, should I leave them behind or should I wait for them to become responsive again? What if I leave them behind? It will affect the remaining people in my group. What if I wait? The unresponsive person may or may not become responsive again. They might become a zombie.
I would wait for them to become responsive again, and so should a NoSQL database. I believe that NoSQL databases should operate with the assumption that unresponsive nodes will become responsive again. If a node cannot become responsive, it can be restarted.
The problem with assuming that an unresponsive node is lost is that the assumption affects the remaining nodes.
The remaining nodes will become responsible for read and write requests for the data owned by the unresponsive node. If I leave someone behind, then one or more people in my group will have to carry their things. This will be a problem if my group is running away from a horde of zombies.
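To put a number on "carrying their things", here is a minimal sketch of a generic hash-partitioned cluster. It is not how Couchbase Server actually assigns data; the names (`initial_map`, `fail_over`, `load`, `NUM_PARTITIONS`) and the partition count of 1024 are hypothetical choices for illustration. Data is split into a fixed number of partitions, each owned by one node, and when a node is declared lost its partitions are handed to the survivors.

```python
from collections import Counter

NUM_PARTITIONS = 1024  # hypothetical: a fixed partition count chosen for the sketch


def initial_map(nodes, num_partitions=NUM_PARTITIONS):
    """Assign partitions round-robin: partition p belongs to nodes[p % len(nodes)]."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def fail_over(partition_map, lost_node):
    """Hand every partition owned by lost_node to the survivors, round-robin."""
    survivors = sorted(set(partition_map.values()) - {lost_node})
    new_map = dict(partition_map)
    orphaned = [p for p, n in partition_map.items() if n == lost_node]
    for i, p in enumerate(orphaned):
        new_map[p] = survivors[i % len(survivors)]
    return new_map


def load(partition_map):
    """Number of partitions (and so, roughly, requests) each node is responsible for."""
    return Counter(partition_map.values())


nodes = ["node-a", "node-b", "node-c", "node-d"]
before = initial_map(nodes)
after = fail_over(before, "node-d")
print(dict(load(before)))  # 256 partitions per node
print(dict(load(after)))   # 342, 341 and 341 partitions on the survivors
```

With four nodes, each survivor goes from 256 partitions to 341 or 342, roughly a third more work, with no extra hardware to absorb it.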
What happens if someone gets injured from carrying more things and I leave them behind? It increases the likelihood that another person will get injured from carrying even more things. Have you ever watched a distributed system disintegrate right before your eyes? I have. It’s not a pleasant experience.
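The "injured from carrying even more things" scenario is a cascading failure. The toy model below is a sketch under loudly simplified assumptions: the `cascade` function is hypothetical, load is an abstract number per node, a failed node's load is spread evenly over the survivors, and any node pushed past a fixed capacity is assumed to fail too. Real systems have replicas, timeouts and back-pressure, so treat this as an illustration of the feedback loop, not a prediction.

```python
def cascade(loads, capacity, first_failure):
    """Simulate a cascading failure in a toy cluster.

    loads: dict of node -> current load (hypothetical units, e.g. requests/sec)
    capacity: load above which a node is assumed to fail
    first_failure: the node that becomes unresponsive first
    Returns the nodes that failed, in the order they failed.
    """
    loads = dict(loads)
    failed = []
    pending = [first_failure]
    while pending:
        node = pending.pop(0)
        if node not in loads:
            continue
        orphaned = loads.pop(node)
        failed.append(node)
        if not loads:
            break  # nobody left to carry anything
        # The survivors split the failed node's load evenly.
        share = orphaned / len(loads)
        for other in loads:
            loads[other] += share
        # Any survivor pushed past capacity becomes the next casualty.
        pending.extend(n for n, current in loads.items()
                       if current > capacity and n not in pending)
    return failed


print(cascade({"a": 70, "b": 70, "c": 70, "d": 70}, 100, "a"))  # ['a']
print(cascade({"a": 80, "b": 80, "c": 80, "d": 80}, 100, "a"))  # ['a', 'b', 'c', 'd']
```

At 70% utilization the survivors absorb one failure. At 80%, the same single failure takes down the whole cluster.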
Rehash / Rebalance
The remaining nodes will have to rehash and rebalance the data. If I leave someone behind, then my group has to figure out who is going to carry what now that there is one fewer person. This can include people in the group swapping things with each other. This takes time, and that will be a problem if my group is running away from a horde of zombies.
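The swapping is where the time goes, because every partition that changes owner is data that has to be copied over the network. The sketch below compares two hypothetical strategies on a generic cluster; neither is claimed to be Couchbase Server's rebalance algorithm, and the names (`modulo_map`, `minimal_rebalance`, `moved`) and the 1024-partition count are assumptions for illustration. A naive rehash recomputes `p % N` with one node fewer, while a careful rebalance moves only the lost node's partitions.

```python
NUM_PARTITIONS = 1024  # hypothetical partition count for the sketch


def modulo_map(nodes, num_partitions=NUM_PARTITIONS):
    """Naive rehash: partition p belongs to nodes[p % len(nodes)]."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def minimal_rebalance(before, lost_node):
    """Move only the lost node's partitions, spreading them round-robin."""
    survivors = sorted(set(before.values()) - {lost_node})
    after = dict(before)
    orphaned = [p for p, n in before.items() if n == lost_node]
    for i, p in enumerate(orphaned):
        after[p] = survivors[i % len(survivors)]
    return after


def moved(before, after):
    """Count partitions whose owner changed, i.e. data that must be copied."""
    return sum(1 for p in before if before[p] != after[p])


nodes = ["node-a", "node-b", "node-c", "node-d"]
before = modulo_map(nodes)
naive = modulo_map(nodes[:3])  # recompute every owner with one node fewer
careful = minimal_rebalance(before, "node-d")
print(moved(before, naive))    # 766: survivors also swap partitions among themselves
print(moved(before, careful))  # 256: only node-d's partitions move
```

Even the careful strategy copies a quarter of the data, and the naive one copies about three quarters of it, all while the group is still being chased.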
What if it becomes a habit? What happens if I continue to leave people behind? They may become responsive again and form their own group. What happens if the two groups meet? I watch The Walking Dead. It never turns out well. Who is going to be the leader? Who gets to keep their things? In a distributed system, this is split brain: two groups of nodes, each believing it owns the same data.
I’m like Rick. I would not leave anyone behind unless I had to. If you are like me, Couchbase Server is the NoSQL database for you. With Couchbase Server, both automatic failover and automatic rehashing / rebalancing are disabled by default. If there is a network partition, it will not suffer from split-brain syndrome. If there are unresponsive nodes, it will not stop to rehash and rebalance the data.
However, if you plan to go all Shane on your group, then Couchbase Server is the NoSQL database for you too. You can enable automatic failover, and you can rehash and rebalance the data.
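The choice between Rick and Shane boils down to a small policy. The sketch below is hypothetical: `FailoverPolicy`, its fields and the 60-second grace period are illustrative names and values, not Couchbase Server settings. With automatic failover off, an unresponsive node is always waited for. With it on, the node is given up on only after it has been unresponsive longer than the timeout.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    # Hypothetical settings; not Couchbase Server's actual configuration names.
    auto_failover: bool = False    # off: never leave anyone behind automatically
    timeout_seconds: float = 60.0  # arbitrary grace period before giving up

    def decide(self, seconds_unresponsive: float) -> str:
        """Return "healthy", "wait" or "fail over" for a node."""
        if seconds_unresponsive <= 0:
            return "healthy"
        if not self.auto_failover:
            # Rick: keep the node's data where it is; an operator can restart it.
            return "wait"
        if seconds_unresponsive < self.timeout_seconds:
            return "wait"
        # Shane: declare the node lost so the survivors take over its data.
        return "fail over"


rick = FailoverPolicy()
shane = FailoverPolicy(auto_failover=True)
print(rick.decide(300))   # wait
print(shane.decide(30))   # wait
print(shane.decide(300))  # fail over
```

The timeout is the knob that matters: too short and a node that was merely slow gets left behind, too long and the survivors serve errors for that node's data while they wait.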
You should have a choice.