Design Limitations in Open Source Distributed Databases, Queues, and Lock Services
"Hey, I just met you... Our network's crazy... Our conn might drop now... So call me maybe" -- The Jepsen project on GitHub.
Get some amazingly rare insights about the design limitations of various distributed systems like Kafka, NuoDB, Cassandra, and Zookeeper.
From the presentation description for the Strangeloop presentation.
Distributed databases, queues, and lock services vary in their durability, availability, and consistency guarantees under partition. In particular, designers and developers often assume that system clocks are monotonic, advance at the same rate, or are synchronized between nodes; or that network partitions are impossible, separate the system cleanly into disjoint components, or are stable over short timescales.
How a system tracks causality and reconciles divergent state determines how well it behaves under unstable conditions. I’ve spent the last six months subjecting popular distributed systems to different kinds of network partitions while under load. Discovering their design limits and bugs illustrates how difficult it is to build reliable distributed services in practice.