DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Tryting to Understand CAP

Tryting to Understand CAP

Adam Warski user avatar by
Adam Warski
·
Jan. 13, 13 · Interview
Like (1)
Save
Tweet
Share
4.06K Views

Join the DZone community and get the full member experience.

Join For Free

The CAP theorem, stated by Brewer and proved by Gilbert and Lynch specifies a property of distributed systems. It states that such a system cannot guarantee at the same time Consistency, Availability and Partition tolerance. It is also often said as a catchy phrase:

Consistency, Availability, Partition Tolerance – pick any two

used mostly when talking about NoSQL databases and suggesting that a distributed system can be characterized as either CA, AP or CP (see e.g. here).

For some time I’ve been trying to understand the different combinations and what do they mean in practice; having some time at various airports I caught up on reading, and here’s what I came up with.

What’s C, A and P?

First let’s define how I understand the three guarantees, basing on some of the articles I’ve read.

Consistency is the easiest one. It roughly means that the clients get the same view of data. By saying that a system is consistent we often mean strong consistency, but it also can come in different flavors, e.g. casual.

Availability is a property saying that every request to a non-failing node will return a (meaningful) response. The response may not contain all data (so the harvest will not be 100%, see the appropriate section in [3]), but it should be useful for the client.

Partition tolerance means that the system will continue working even if any number of messages sent between nodes is lost. This can be e.g. a network failure between two datacenters, where nodes in each datacenter form a partition. Also note that a failure of any number of nodes forms a partition (it is not possible to distinguish between a network failure and a node failing and stopping to respond to messages).

The hardest part for me is understanding the difference between Availability and Partition tolerance. Also, the various articles don’t specify what they mean by saying that a system is “working” after being e.g. partitioned – does it mean that every request gets a response with useful data, or are responses “Sorry, I can’t give you data right now” acceptable also?

P+C?

Let’s assume that a system is partition tolerant and that it has more than one node. If a partition is formed, splitting the system in two, the system should continue working. Hence both partitions allow clients to write. But then, how to guarantee consistency? If one client writes to partition 1, and another to partition 2? Hence: P => ~C.

A+~P?

Suppose now that a system is available and that we have more than one node. As the system is available, it should respond to requests even if some nodes die. As noted above, some nodes dying are equivalent to a partition. So if the system is still working, we have partition tolerance. Hence: A => P.

A+C?

Summarizing the two implications above (A => P and P => ~C), we get: A => ~C, so that an available system cannot be consistent (if it has more than one node). In practice however, there are of course AC systems, e.g. single-node RDBMS. Or even master-slave/master-master replicated RDBMS, provided there’s a central router knowing which nodes live and directing client appropriately. Such a router is then a single point of failure (SPoF).

Relax?

I suspect that in reality, when e.g. NoSQL/NewSQL systems are characterized with the CAP properties, they assume some relaxed form of C/A/P. Unfortunately, all of the definitions flying around seem to be pretty vague and are more of hand-waving than proper, precise statements. I think it would be much easier to explore the ever-growing ecosystem of new datastores if they could be more easily characterized; maybe the CAP vocabulary is just not enough?

Please correct me if I’m wrong somewhere, and I probably am! :)

Adam

Some articles I used for my research:

[1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
[2] http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/
[3] http://codahale.com/you-cant-sacrifice-partition-tolerance/

Partition (database)

Published at DZone with permission of Adam Warski, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Mr. Over, the Engineer [Comic]
  • The Quest for REST
  • Upgrade Guide To Spring Data Elasticsearch 5.0
  • How to Create a Real-Time Scalable Streaming App Using Apache NiFi, Apache Pulsar, and Apache Flink SQL

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: