This is the second topic in this series. In this post, we will discuss the building blocks/algorithms on which most NoSQL databases are based, and how they differ in terms of scalability, fault tolerance, and performance.
Sharding or Horizontal Scaling — Bucketing of Big Data
Sharding is a way to bucket data when it grows beyond what one machine can handle. The data is partitioned horizontally across multiple database servers, so that "scaling out" becomes possible. For example, when a request comes from the website to show a US-based customer's details, the data is retrieved from a different server than when the request comes from a non-US customer. This reduces the load on any single server in the NoSQL space. In traditional database servers (RDBMS, in general), sharding or partitioning the data is not an easy task (you may need to do a lot of juggling with setup and configuration), whereas most NoSQL servers are designed this way from the start.
As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. - MongoDB
The other point is that sharding doesn't come for free; it has implications of its own. It is therefore very important that replication and load balancing happen in the right way, so that if one node/server fails, the data is picked up from another node that maintains the same data. MongoDB partitions data based on a shard key. Redis supports different partitioning schemes, one of which is consistent hashing, applied by some Redis clients.
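As a minimal sketch of the idea, the simplest sharding scheme routes a record to one of several servers by hashing its shard key. The function name and the customer IDs below are illustrative, not any database's actual API:

```python
import hashlib

def shard_for(key, num_shards):
    """Map a shard key to one of num_shards servers via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Route (hypothetical) customer IDs across 4 database servers.
shards = {i: [] for i in range(4)}
for customer_id in ["us-1001", "uk-2002", "in-3003", "us-1004"]:
    shards[shard_for(customer_id, 4)].append(customer_id)
```

Because the hash is stable, the same key always routes to the same shard, which is what lets reads find the data that an earlier write placed.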
Quorum — Data Replication
A quorum is the minimum number of votes a distributed transaction must obtain before it is allowed to perform an operation in a distributed environment.
A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. A quorum-based technique is implemented to enforce consistent operation in a distributed system. - Wikipedia
It is used for distributed commit and replication.
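The classic quorum arithmetic fits in one line: with N replicas, a read is guaranteed to overlap the latest successful write exactly when the read quorum R and write quorum W intersect, i.e. R + W > N. A sketch (the function name is mine, not from any particular database):

```python
def read_sees_latest_write(n, w, r):
    """With N replicas, every read overlaps the latest successful write
    exactly when the read and write quorums intersect: R + W > N."""
    return r + w > n

# A common majority configuration for N = 3 replicas:
majority = read_sees_latest_write(3, 2, 2)   # overlapping quorums
eventual = read_sees_latest_write(3, 1, 1)   # no overlap guaranteed
```

Systems like Cassandra and Riak expose exactly these N/R/W knobs, letting the application trade latency for consistency per operation.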
Gossip Protocol — Co-ordination, Consistency, and Detection of Failures
It's like people gossiping: finding out who your peers and neighbors are, and then exchanging information with them. In a typical load-balancing scenario, the load balancer sends heartbeat requests at regular intervals to the web servers to determine whether a particular node is alive. If a response doesn't come back within a specified time (or number of attempts), the node is removed from the load balancer's table. Similarly, in the NoSQL world, it is very important to keep constant communication among nodes in a distributed environment. Here, data may be sharded among multiple nodes, or there can be master-master or master-slave replication. In all cases, co-ordination is required to keep data consistent as well as to detect failures.
The Gossip protocol serves this purpose. Peer nodes keep gossiping with each other and share information on a continuous basis. Redis uses a ping-pong heartbeat mechanism to keep communication going among multiple distributed nodes.
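The heartbeat-table idea behind this can be sketched as follows. This is a deliberate simplification (real gossip protocols exchange these tables peer-to-peer rather than through one central detector, and the names and timeout are illustrative):

```python
class FailureDetector:
    """Minimal heartbeat table: a node is considered alive only if its
    last heartbeat arrived within `timeout` seconds of 'now'."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # node name -> timestamp of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def alive_nodes(self, now):
        return {n for n, t in self.last_seen.items()
                if now - t <= self.timeout}

fd = FailureDetector(timeout=5.0)
fd.heartbeat("node-a", now=0.0)
fd.heartbeat("node-b", now=3.0)
# By t = 7.0, node-a has missed its window and is suspected dead.
```

In a real gossip protocol, each node runs such a table and merges it with tables received from random peers, so failure information spreads epidemically without a central co-ordinator.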
Gossip protocols have proven to be effective means by which failures can be detected in large, distributed systems in an asynchronous manner without the limitations associated with reliable multicasting for group communications.
Read Repair — Data Repair
Cassandra uses a concept called "Read Repair" to keep read operations consistent for requested rows across all the replica nodes. It uses two types of read requests: direct read requests and background read repair requests. The number of direct reads is controlled by the read consistency level; the background read repair request goes to all the other replica nodes that didn't receive a direct request. Read repair is an optional feature and can be configured per table (column family).
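The core of read repair can be sketched in a few lines: read every replica, let the newest write timestamp win, and push the winning value back to stale replicas. This is a simplification of Cassandra's actual mechanism, with illustrative replica names:

```python
# Each replica holds (value, write_timestamp); the newest timestamp wins.
def read_with_repair(replicas):
    """Return the newest value and overwrite any stale replicas in place."""
    newest = max(replicas, key=lambda node: replicas[node][1])
    value, ts = replicas[newest]
    for node, (v, t) in replicas.items():
        if t < ts:
            replicas[node] = (value, ts)   # the background repair write
    return value

replicas = {"r1": ("Tuesday", 1), "r2": ("Wednesday", 2), "r3": ("Tuesday", 1)}
latest = read_with_repair(replicas)
```

After the read, all three replicas agree on the newest value, so the inconsistency was healed as a side effect of a normal read.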
Hinted Handoff — Co-ordination, Consistency, and Detection of Failures
Hinted Handoff is a Cassandra feature that optimizes the cluster consistency process and anti-entropy when a replica-owning node is unavailable (due to network issues or other problems) to accept the replica from a successful write operation.
In that case, the co-ordinating node keeps the data in a local hint table and writes it back when the unavailable node comes online again. Hinted Handoff is used to handle transient failures.
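The hint-table idea can be sketched as follows. This is a simplification, not Cassandra's API; the class and node names are illustrative:

```python
class Coordinator:
    """Sketch of a hint table: writes aimed at a down node are stored
    locally and replayed once the node is reachable again."""
    def __init__(self):
        self.hints = []   # pending (target_node, key, value) triples
        self.nodes = {}   # node name -> its local key/value store

    def write(self, node, key, value, node_up):
        if node_up:
            self.nodes.setdefault(node, {})[key] = value
        else:
            self.hints.append((node, key, value))  # a hint, not a lost write

    def replay_hints(self, node):
        """Called when `node` comes back: deliver its queued writes."""
        remaining = []
        for target, key, value in self.hints:
            if target == node:
                self.nodes.setdefault(target, {})[key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining

coord = Coordinator()
coord.write("node-b", "user:42", "Amit", node_up=False)  # node-b is down
coord.replay_hints("node-b")                             # node-b recovers
```

The write succeeds from the client's perspective even while node-b is down; the hint is simply replayed later, which is why this handles transient (not permanent) failures.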
Vector Clocks — Conflict Detection
Dynamo (Amazon) and Voldemort (LinkedIn) use vector clocks to identify the order of multi-version records in the system. Suppose Amit, Ajit, Atul, and Ankush are planning to meet somewhere. They initially decide on Tuesday. Later, Amit and Ajit discuss changing it to Wednesday. Similarly, Ankush and Atul decide to have the meeting on Friday. Ajit and Ankush have another exchange suggesting it may happen on Thursday. Now everyone has a different version and conflicts arise. All of them want to clarify the date, but not all are reachable at the same time. How can they resolve this conflict? A vector clock attached to each update answers exactly this: it shows which versions causally descend from which, and which versions are truly concurrent and need reconciliation.
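The meeting example above can be modeled with vector clocks. A sketch (the clock layout and per-participant counters are illustrative): when neither of two clocks dominates the other, the versions are concurrent and represent a genuine conflict:

```python
# Each version carries a vector clock: a map of {participant: counter}.
def merge(a, b):
    """Combine two clocks, keeping the max counter per participant."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b everywhere)."""
    return all(a.get(k, 0) >= v for k, v in b.items())

# Amit and Ajit moved the meeting to Wednesday; Ankush and Atul chose Friday.
v_wed = {"Amit": 1, "Ajit": 1}
v_fri = {"Ankush": 1, "Atul": 1}
# Neither clock descends from the other: the versions are concurrent,
# so this is a genuine conflict the group must reconcile themselves.
concurrent = not descends(v_wed, v_fri) and not descends(v_fri, v_wed)
# Once reconciled, the merged clock dominates both earlier versions.
v_final = merge(v_wed, v_fri)
```

This is why Dynamo-style stores can hand the application *both* conflicting versions: the clocks prove neither one supersedes the other, so only application logic (or the user) can pick the winner.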
Consistent Hashing — Rebalancing
Consistent hashing is an algorithm that helps shard data based on keys. It finds the node in a cluster where a particular piece of data should be stored.
Rebalancing is the process of shuffling and relocating data/documents among nodes in a cluster when new servers are added or existing ones are removed. Consistent hashing generates the keys on which this relocation is based, and it minimizes the number of keys that must move when the cluster changes.
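A minimal consistent-hash ring makes the rebalancing property concrete. This sketch omits the virtual nodes that real systems add for balance, and the class and node names are mine:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (no virtual nodes, for clarity)."""
    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the next node on the ring."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user-42")
# Adding a fourth node moves only the keys that fall between node-d and
# its predecessor on the ring; every other key keeps its current owner.
bigger_ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
```

Contrast this with naive `hash(key) % N` sharding, where changing N remaps almost every key; on the ring, only roughly 1/N of the keys relocate.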
Merkle Tree — Detecting Inconsistency
Dynamo, Riak, and Cassandra use this data structure to minimize the amount of data that must be transferred to synchronize replicas when an inconsistency is detected: by comparing hashes level by level, two replicas can locate the exact ranges that differ instead of exchanging everything.
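A sketch of the comparison: each replica hashes its key range pairwise up to a single root. Equal roots mean nothing needs to sync; differing roots tell the nodes to descend the tree and transfer only the divergent ranges. The pairwise-hash construction below is standard, but the key/value strings are illustrative:

```python
import hashlib

def merkle_root(leaves):
    """Hash leaves pairwise, level by level, into a single root hash."""
    level = [hashlib.sha256(leaf.encode()).hexdigest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last hash
            level.append(level[-1])
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

# Two replicas of the same key range; one key has diverged.
replica_1 = ["k1=v1", "k2=v2", "k3=v3", "k4=v4"]
replica_2 = ["k1=v1", "k2=v2", "k3=STALE", "k4=v4"]
```

Comparing one root hash per replica is enough to detect that a sync is needed, which is what keeps anti-entropy cheap over the network.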
Multi-version Concurrency Control — Consistency and Resolution of Conflicts
CouchDB uses Multi-Version Concurrency Control (MVCC) to avoid locking the database file during writes. Conflicts are left to the application to resolve at the right time. Older document versions (called revisions) may be lost when the append-only database file is compacted.
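The revision mechanism can be sketched as follows. This is in the spirit of CouchDB's MVCC but is not its API; plain revision numbers stand in for CouchDB's `_rev` strings:

```python
class MVCCStore:
    """Sketch of revision-based conflict detection: writers must name the
    revision they are updating; a mismatch is a conflict, not a lock wait."""
    def __init__(self):
        self.docs = {}   # doc_id -> (rev_number, value)

    def put(self, doc_id, value, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if current_rev != rev:
            raise ValueError("conflict: update based on a stale revision")
        new_rev = (rev or 0) + 1
        self.docs[doc_id] = (new_rev, value)
        return new_rev

store = MVCCStore()
r1 = store.put("meeting", "Tuesday")             # creates revision 1
r2 = store.put("meeting", "Wednesday", rev=r1)   # updates to revision 2
try:
    store.put("meeting", "Friday", rev=r1)       # a writer with a stale rev
except ValueError:
    pass  # the application must fetch revision 2 and resolve the conflict
```

No writer ever blocks another; the losing writer simply gets a conflict and retries against the current revision, which is the essence of optimistic, multi-version concurrency.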
LSM Tree (Log-Structured Merge Tree)
An LSM tree is a technique for storing data that can overcome some of the pitfalls of the B-tree. LevelDB is probably the best-known implementation; other products such as Cassandra and HBase also use it. An LSM tree writes to immutable storage segments, which optimizes for writes, reduces fragmentation, and avoids the B-tree "write cliff". For more information, please check this link: http://www.xaprb.com/blog/2015/04/02/state-of-the-storage-engine/
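The memtable-plus-immutable-segments write path can be sketched as a toy. Real LSM engines also compact segments and keep them as sorted files on disk; the class name and parameters here are illustrative:

```python
class TinyLSM:
    """Sketch: writes land in an in-memory memtable; when it fills, it is
    flushed as an immutable, sorted segment. Reads check the memtable
    first, then segments from newest to oldest."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []        # list of sorted, immutable snapshots
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}    # the flushed segment is never modified

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for seg in reversed(self.segments):   # newest segment wins
            if key in seg:
                return seg[key]
        return None

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full: flushed as an immutable segment
db.put("a", 3)   # newer value lives in the memtable, shadowing the segment
```

Because segments are append-only and never rewritten in place, every write is sequential, which is exactly the property that sidesteps the B-tree's random-write cliff.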