Inside CockroachDB’s Survivability Model

This interview with Marc Berhault looks at CockroachDB and its heavy focus on data redundancy, the model that allows CockroachDB to survive and thrive even after major failures.

Dave Avery

Sep. 21, 2016 · Interview

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this blog series, we’ll highlight some of the speakers who will be at this year’s conference and discuss their technologies and outlooks. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Marc Berhault, Engineer at Cockroach Labs. His talk is titled “Inside CockroachDB’s Survivability Model.” It takes a deep dive into CockroachDB, a database whose “survive and thrive” model aims to bring the best aspects of Google’s next-generation database, Spanner, to the rest of the world via open source.

I had a chance to speak with Marc and ask him a few questions:

Percona: Give me a brief history of yourself: How you got into database development, where you work, what you love about it...

Marc: I started out as a Site Reliability Engineer managing Google’s storage infrastructure (GFS). Back in those days, keeping a cluster up and running mostly meant worrying about the masters.

I then switched to a developer role on Google’s next-generation storage system, which replaced the single write master with sharded metadata handlers. This made the entire system considerably more reliable and able to tolerate machine and network failures. SRE concerns gradually shifted away from machine reliability towards more interesting problems, such as multi-tenancy issues (quotas, provisioning, isolation) and larger-scale failures.

After leaving Google, I found myself back in a world where one had to worry about a single machine all over again, at least when running your own infrastructure. I kept hearing the same story: a midsize company starts to outgrow its single-machine database and begins trimming the edges. This means moving tables to other hosts, shrinking schemas, and so on, to avoid the dreaded “great sharding of the monolithic table,” often accompanied by its friends: a cross-shard coordination layer and added production complexity.

This was when I joined Cockroach Labs, a newly created startup with the goal of bringing a large-scale, transactional, strongly consistent database to the world at large. After contributing to various aspects of the project, I switched my focus to production: adding monitoring, working on deployment, and of course rolling out our test clusters.

Percona: Your talk is called “Inside CockroachDB’s Survivability Model.” Define “survivability model” and explain why it is important to database environments.

Marc: The survivability model in CockroachDB is centered around data redundancy. By default, all data is replicated three times (this is configurable), and a write is only considered written once a quorum (a majority of the replicas) has acknowledged it. When a node holding one of the copies of the data becomes unavailable, a new node is picked and given a snapshot of the data.
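
The replication scheme Marc describes can be illustrated with a toy model. This is a sketch under stated assumptions, not CockroachDB’s implementation, which runs a Raft consensus group per range of data. `Node`, `ReplicatedRange`, `write`, and `repair` are hypothetical names. The model applies a write only when a majority of replicas can acknowledge it, and it replaces an unavailable replica by copying a snapshot from a live one. For simplicity it also assumes every live replica is fully up to date.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical storage node holding a copy of one range of data."""
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)


class ReplicatedRange:
    """Toy model of one range replicated on `replication_factor` nodes."""

    def __init__(self, cluster, replication_factor=3):
        self.cluster = list(cluster)                       # every node we know about
        self.replicas = self.cluster[:replication_factor]  # nodes holding a copy

    def quorum(self):
        # A strict majority of replicas: 2 of 3, 3 of 5, and so on.
        return len(self.replicas) // 2 + 1

    def write(self, key, value):
        """Apply the write only if a majority of replicas can acknowledge it."""
        live = [n for n in self.replicas if n.alive]
        if len(live) < self.quorum():
            return False  # no quorum: the write is not considered written
        for n in live:
            n.data[key] = value
        return True

    def repair(self):
        """Replace unavailable replicas with spare nodes seeded from a snapshot."""
        live = [n for n in self.replicas if n.alive]
        spares = [n for n in self.cluster if n.alive and n not in self.replicas]
        for i, n in enumerate(self.replicas):
            if n.alive or not live or not spares:
                continue
            new_node = spares.pop(0)
            new_node.data = dict(live[0].data)  # snapshot copied from a live replica
            self.replicas[i] = new_node


# Usage: five nodes, one range with the default three replicas (n1, n2, n3).
cluster = [Node(f"n{i}") for i in range(1, 6)]
rng = ReplicatedRange(cluster)
print(rng.write("balance", 100))  # 3 of 3 replicas are up
cluster[0].alive = False
print(rng.write("balance", 90))   # 2 of 3 is still a quorum
rng.repair()                      # a spare node replaces n1, seeded with the snapshot
print([n.name for n in rng.replicas], cluster[3].data)
```

In this model, three replicas keep accepting writes through one node failure. Losing two of the three replicas at once leaves no quorum, which is why re-replicating promptly onto a new node matters.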

This redundancy model has been widely used in distributed systems, but rarely with strongly consistent databases. CockroachDB’s approach provides strong consistency as well as transactions across the distributed data. We see this as a critical component of modern databases: allowing scalability while guaranteeing consistency.

Percona: What are the workloads and database environments that are best suited for a CockroachDB deployment? Do you see an expansion of the solution to encompass other scenarios?

Marc: CockroachDB is a beta product and is still in development. We expect to be out of beta by the end of 2016. Ideal workloads are those requiring strong consistency, such as applications that manage critical data. However, strong consistency comes at a cost, usually directly proportional to the latency between nodes and to the replication factor. This means that a widely distributed CockroachDB cluster (e.g., across multiple regions) will incur high write latencies, making it unsuitable for high-throughput operations, at least in the near term.
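
The latency cost Marc mentions follows from the quorum rule. A write cannot commit until enough replicas acknowledge it, so its latency is bounded below by the round trips to the replicas needed for a majority. The back-of-the-envelope sketch below assumes a leader-based protocol (CockroachDB uses Raft) in which the leader’s own copy counts toward the quorum. `commit_latency_ms` is a hypothetical helper. The round-trip times are illustrative numbers, not measurements.

```python
def commit_latency_ms(follower_rtts_ms):
    """Estimate leader-side commit latency for one replicated write.

    Assumes a leader-based protocol: the leader's own copy counts toward the
    quorum, so it waits for the fastest (quorum - 1) followers to acknowledge.
    Ignores disk, CPU, and queueing time.
    """
    replicas = len(follower_rtts_ms) + 1  # followers plus the leader itself
    quorum = replicas // 2 + 1
    needed = quorum - 1                   # follower acks still required
    if needed == 0:
        return 0.0
    return float(sorted(follower_rtts_ms)[needed - 1])


print(commit_latency_ms([1, 2]))             # 3 replicas in one region
print(commit_latency_ms([70, 140]))          # 3 replicas in three regions
print(commit_latency_ms([70, 90, 140, 150])) # 5 replicas in five regions
```

Under these assumptions, spreading three replicas across regions puts at least one cross-region round trip on every write. Raising the replication factor to five across five regions means waiting for the second-fastest follower rather than the fastest.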

Percona: What is changing in the way businesses use databases that keeps you awake at night? How do you think CockroachDB is addressing those concerns?

Marc: In recent years, more and more businesses have been reaching the limits of what their single-machine databases can handle. This has forced many to implement their own transactional layers on top of disjoint databases, at the cost of longer development time and, often, correctness.

CockroachDB attempts to solve this problem by allowing a strongly consistent, transactional database to scale arbitrarily.

Published at DZone with permission of Dave Avery, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
