Today, I thought a lot about how to compare different databases. Choosing a database is often a daunting task. There's a lot of confusion, the CAP 'theorem', and above all, the immortal proverb 'one size doesn't fit all'. As if that helps.
- Consistency: every read returns the most recent write
- Availability: every node (if not failed) always executes queries
- Partition-tolerance: even if the connections between nodes are down, the other two promises (A & C) are kept.
It's really just A vs C!
- Availability is achieved by replicating the data across different machines
- Consistency is achieved by updating several nodes before allowing further reads
- Total partitioning, meaning the failure of part of the system, is rare. However, we can treat a delay (latency) in propagating an update between nodes as a temporary partition. It then forces a temporary choice between A and C:
- On systems that allow reads before updating all the nodes, we will get high Availability
- On systems that lock all the nodes before allowing reads, we will get Consistency
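To make the A vs C choice concrete, here's a minimal sketch in Python. It's a toy I made up for illustration, not how any real database works: `ToyCluster`, `Replica`, and the `consistent` flag are all hypothetical names, and a manual call to `replicate()` stands in for the update delay between nodes. Updating every replica inside `write()` stands in for "locking all the nodes".

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding its own copy of the data (hypothetical)."""
    data: dict = field(default_factory=dict)


class ToyCluster:
    """A primary plus lagging replicas; purely illustrative."""

    def __init__(self, replica_count: int, consistent: bool):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self.consistent = consistent
        self.pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))
        if self.consistent:
            # C: update every node before any further read is allowed.
            self.replicate()

    def replicate(self):
        """Simulates the delayed update finally reaching the replicas."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

    def read(self, key, replica_index: int):
        # A: the replica answers right away with whatever it has.
        return self.replicas[replica_index].data.get(key)


# Favoring A: the read is served immediately, but it can be stale.
available = ToyCluster(replica_count=2, consistent=False)
available.write("x", 1)
stale = available.read("x", 0)   # None, the update has not arrived yet
available.replicate()            # the "temporary partition" ends
fresh = available.read("x", 0)   # 1

# Favoring C: the write pays the cost, so every read sees it.
consistent = ToyCluster(replica_count=2, consistent=True)
consistent.write("x", 1)
value = consistent.read("x", 0)  # 1
```

In the first cluster the replica never refuses a query, it just may answer with old data. In the second, nobody reads until all the nodes agree, so the price is paid as waiting time on the write.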
By the way, there's no distributed system that wants to live with "Partitioning" - if it does, it's not distributed. That is why putting SQL in this triangle may lead to confusion.