Kubernetes: Distributed Stateful Apps Using CockroachDB
Kubernetes offers advanced support for databases through StatefulSets. This feature lets you attach a persistent disk to a pod and maintain its connection to the disk even if it gets rescheduled to another physical machine.
Illustration by Zoë van Dijk
As recently as December 2017, running databases in Kubernetes was challenging, especially for mission-critical online transaction processing (OLTP) workloads whose databases require strong consistency. At that time, rescheduling a database pod (i.e., moving it to a different machine) meant that it lost the disk it was attached to, and with it the state it was managing.
Naturally, teams still needed to run databases, and they largely solved the problem by managing state outside of Kubernetes. However, this meant running a single, critical component of the stack outside of Kubernetes. Because that component is crucial, the database still required a lot of supporting infrastructure, which might include:
- Process monitoring
- Configuration management
- In-datacenter load balancing
- Service discovery
- Monitoring and logging
This is especially painful because all of these functions duplicate capabilities that Kubernetes already provides.
As of Kubernetes 1.9 (released in December 2017), Kubernetes offers advanced support for databases through StatefulSets. This feature lets you attach a persistent disk to a pod and maintain its connection to the disk even if it gets rescheduled to another physical machine. This way, as your database pod gets rescheduled, it's capable of maintaining its state.
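Part of what makes this work is that a StatefulSet gives each pod a stable name of the form `<statefulset>-<ordinal>` and a matching PersistentVolumeClaim named `<template>-<statefulset>-<ordinal>`, created from the set's volumeClaimTemplates. The Python sketch below models only that naming convention. The names `cockroachdb` and `datadir` are hypothetical, and `reschedule` is a toy stand-in for the scheduler, not a Kubernetes API call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    statefulset: str
    ordinal: int

    @property
    def name(self) -> str:
        # StatefulSet pods are named <statefulset-name>-<ordinal>.
        return f"{self.statefulset}-{self.ordinal}"


def claim_name(template: str, pod: Pod) -> str:
    # PVCs created from a volumeClaimTemplate are named
    # <template-name>-<statefulset-name>-<ordinal>.
    return f"{template}-{pod.name}"


def reschedule(placement: dict[str, str], pod: Pod, new_node: str) -> None:
    # Toy model: rescheduling only changes which machine the pod runs on.
    placement[pod.name] = new_node


placement: dict[str, str] = {}
pod = Pod("cockroachdb", 2)          # hypothetical StatefulSet name
reschedule(placement, pod, "node-a")
before = claim_name("datadir", pod)  # hypothetical template name
reschedule(placement, pod, "node-b")
after = claim_name("datadir", pod)
print(pod.name, placement[pod.name], before == after)
```

Because the claim name depends only on the pod's identity and never on the machine it runs on, the pod named cockroachdb-2 finds the same disk wherever it lands, provided the volume is network-attached and reachable from the new node.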
For more information, check out our blog post Kubernetes: The State of Stateful Apps.
StatefulSets and SQL
While StatefulSets were a huge boon for managing databases, Kubernetes is still a difficult environment for legacy SQL databases. Two major factors limit their ability to integrate with Kubernetes:
- Scale: Popular solutions like MySQL and PostgreSQL weren't built to scale dynamically. Getting them to work across multiple machines requires complex sharding technology that's bolted onto the database and isn't trivial to configure, let alone in a dynamically orchestrated environment.
- Correctness: Replication based on PostgreSQL's write-ahead log (WAL) cannot guarantee correctness unless it is configured for fully synchronous replication, which increases latency and complicates failover.
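To see why asynchronous replication is the weak point, consider a deliberately simplified model (a toy, not PostgreSQL's actual implementation): the primary acknowledges a commit as soon as it is recorded locally and ships it to the standby later. If the primary fails in between and the standby is promoted, a write the client saw succeed is gone. In synchronous mode the acknowledgment waits for the standby, which closes that gap at the cost of an extra round trip per commit. All class and function names here are hypothetical.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []


class Primary(Node):
    def __init__(self, name: str, standby: Node, synchronous: bool) -> None:
        super().__init__(name)
        self.standby = standby
        self.synchronous = synchronous
        self.pending: list[str] = []  # records not yet shipped to the standby

    def commit(self, record: str) -> str:
        self.log.append(record)
        if self.synchronous:
            # Wait for the standby before acknowledging (extra round trip).
            self.standby.log.append(record)
        else:
            # Acknowledge immediately; ship the record later.
            self.pending.append(record)
        return "ack"

    def ship(self) -> None:
        self.standby.log.extend(self.pending)
        self.pending.clear()


def surviving_writes_after_failover(synchronous: bool) -> list[str]:
    standby = Node("standby")
    primary = Primary("primary", standby, synchronous)
    primary.commit("INSERT order 1")
    primary.ship()
    primary.commit("INSERT order 2")  # the client receives "ack"
    # The primary crashes here, before shipping; the standby is promoted.
    return standby.log


print(surviving_writes_after_failover(synchronous=False))
print(surviving_writes_after_failover(synchronous=True))
```

In the asynchronous run, only the first insert survives even though both were acknowledged; in the synchronous run, both survive.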
A Better SQL Solution: CockroachDB
Rather than running your mission-critical database apart from the rest of your infrastructure, or using dated technology ill-suited to the environment, teams now have the option of running a cloud-native SQL database like CockroachDB within Kubernetes.
Running CockroachDB in Kubernetes offers significant advantages, including:
- Radically simplifying your stack by removing duplicative technologies and using the same tool everywhere
- Improving developers' experience with a database that provides serializable consistency and whose replicas all behave symmetrically
- Easily scaling clusters to meet demand using CockroachDB's built-in scaling abilities
- Providing resilient services with high availability
How CockroachDB Works on Kubernetes
CockroachDB's origin story closely parallels Kubernetes': both have their roots in Google's infrastructure. CockroachDB is modeled after Google's scalable and consistent database, Spanner, while Kubernetes is a direct descendant of Google's orchestration system, Borg. This shared ideological DNA makes it natural that the two work well together.
Kubernetes' StatefulSets feature was a huge step forward toward simplifying support for stateful services. With it, database pods that are rescheduled to other nodes are able to "keep" the same remote disk and simply re-attach to it on their new Kubernetes node.
CockroachDB was designed to be a highly available, fault-tolerant database that can withstand chaotic deployments, using a model called Multi-Active Availability. This model lets it accept reads and writes on any CockroachDB node without sacrificing serializable isolation. Thanks to multi-active availability, CockroachDB handles rescheduling gracefully: moving between Kubernetes nodes is no different from a temporary node outage, which CockroachDB is well equipped to handle.
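Why a brief outage is survivable comes down to quorum arithmetic. CockroachDB replicates each range of data with the Raft consensus protocol (three replicas by default), and a range keeps serving reads and writes as long as a majority of its replicas are reachable. The back-of-the-envelope sketch below assumes replicas sit on distinct pods and ignores the short pause while leadership moves; the function names are hypothetical.

```python
def majority(replication_factor: int) -> int:
    # Raft needs a strict majority of replicas to commit a write.
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    # How many replicas can be unavailable while a majority survives.
    return replication_factor - majority(replication_factor)


def range_available(replication_factor: int, replicas_down: int) -> bool:
    return replication_factor - replicas_down >= majority(replication_factor)


for rf in (3, 5):
    print(rf, majority(rf), tolerated_failures(rf))

# A pod being rescheduled is, briefly, one replica "down".
print(range_available(3, replicas_down=1))  # True
print(range_available(3, replicas_down=2))  # False
```

This is also why spreading CockroachDB nodes across machines matters: with three replicas, losing one machine that hosts a single replica is tolerated, but a machine hosting two replicas of the same range would take that range offline.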
CockroachDB on Kubernetes Deployment Strategy
To run CockroachDB in Kubernetes, you have two distinct options:
- StatefulSets, which leverage remote persistent volumes for storage and are managed like the rest of your Kubernetes pods (meaning they can easily be rescheduled)
- DaemonSets, which let you leverage a node's local disk, but largely eschew letting Kubernetes manage them (they do not get rescheduled)
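The storage difference shows up directly in the manifests. The Python sketch below builds the storage-relevant parts of each as plain dicts (kubectl also accepts JSON manifests). It is an illustration, not a complete deployment: the names, labels, mount path, host path, and 100Gi size are assumptions, and the container command, ports, and probes are omitted.

```python
import json

LABELS = {"app": "cockroachdb"}  # hypothetical label
CONTAINER = {
    "name": "cockroachdb",
    "image": "cockroachdb/cockroach",  # pin a specific tag in practice
    # Assumed data directory; must match the volume name below.
    "volumeMounts": [{"name": "datadir", "mountPath": "/cockroach/cockroach-data"}],
}


def statefulset_spec(replicas: int) -> dict:
    # Each pod gets its own PersistentVolumeClaim from the template,
    # so its storage follows it if the pod is rescheduled.
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "cockroachdb"},
        "spec": {
            "serviceName": "cockroachdb",
            "replicas": replicas,
            "selector": {"matchLabels": LABELS},
            "template": {"metadata": {"labels": LABELS},
                         "spec": {"containers": [CONTAINER]}},
            "volumeClaimTemplates": [{
                "metadata": {"name": "datadir"},
                "spec": {"accessModes": ["ReadWriteOnce"],
                         "resources": {"requests": {"storage": "100Gi"}}},
            }],
        },
    }


def daemonset_spec(host_path: str) -> dict:
    # One pod per eligible node, writing to that node's local disk.
    # There is no replica count and no claim that could follow the pod.
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "cockroachdb"},
        "spec": {
            "selector": {"matchLabels": LABELS},
            "template": {"metadata": {"labels": LABELS},
                         "spec": {"containers": [CONTAINER],
                                  "volumes": [{"name": "datadir",
                                               "hostPath": {"path": host_path}}]}},
        },
    }


print(json.dumps(statefulset_spec(3), indent=2))
print(json.dumps(daemonset_spec("/mnt/disks/cockroach"), indent=2))
```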
Choosing a Deployment Strategy
As with most things, choosing between StatefulSets and DaemonSets involves trade-offs. The choice ultimately depends on your level of comfort with Kubernetes (StatefulSets are simpler to implement) and your willingness to let Kubernetes fully drive your application (DaemonSets simply don't let Kubernetes reschedule pods).
For most users, we recommend deploying CockroachDB through StatefulSets; it's straightforward and behaves like all of your other orchestrated services. However, if you are interested in DaemonSets, we have some guidance in our documentation.
StatefulSets Deployment Overview
So, what does it look like to run CockroachDB on Kubernetes through StatefulSets? Here's an overview of the environment you'd need:
- A Kubernetes cluster
- A Kubernetes node for each CockroachDB node you want to run, all in the same datacenter/availability zone
  - We recommend putting each CockroachDB node on a separate machine to optimize fault tolerance. The Kubernetes scheduler prefers doing this anyway, and if a machine goes down, you want to minimize the number of nodes you lose.
  - We recommend a single datacenter/availability zone when using Kubernetes with CockroachDB. It's possible to deploy CockroachDB on Kubernetes across multiple availability zones, but as of CockroachDB 2.0, it's not recommended for most users because of the complexity of exposing internal network names across Kubernetes clusters.
- A load balancing service for your CockroachDB cluster
- For StatefulSets, a persistent volume for each CockroachDB node (at the time of writing, StatefulSet support for local disks is still in beta)
- Monitoring for your Kubernetes cluster through a tool like Prometheus
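Several items in this list map onto small pieces of configuration. The Python sketch below builds two of them as plain dicts: a headless Service that gives each pod a stable DNS name for node-to-node traffic, a second Service that balances client connections, and a pod anti-affinity rule that asks the scheduler to prefer separate machines. It also derives the address list you would pass to `cockroach start --join`. The service names and label are assumptions for illustration; 26257 and 8080 are CockroachDB's default ports.

```python
import json

APP = "cockroachdb"                 # hypothetical StatefulSet/label name
SQL_PORT, HTTP_PORT = 26257, 8080   # defaults: SQL + inter-node traffic, admin UI/HTTP


def join_addresses(statefulset: str, service: str, replicas: int) -> str:
    # Pods behind a headless Service get stable DNS names <pod>.<service>,
    # e.g. cockroachdb-0.cockroachdb, which survive rescheduling.
    return ",".join(f"{statefulset}-{i}.{service}" for i in range(replicas))


def services() -> list[dict]:
    ports = [{"name": "sql", "port": SQL_PORT}, {"name": "http", "port": HTTP_PORT}]
    headless = {  # no virtual IP; publishes one DNS record per pod
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": APP},
        "spec": {"clusterIP": "None", "selector": {"app": APP}, "ports": ports},
    }
    public = {  # one stable address that spreads client connections across pods
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": f"{APP}-public"},
        "spec": {"selector": {"app": APP}, "ports": ports},
    }
    return [headless, public]


def spread_across_machines() -> dict:
    # Goes under the pod template's spec.affinity: prefer (but don't require)
    # placing each CockroachDB pod on a different machine.
    return {"podAntiAffinity": {"preferredDuringSchedulingIgnoredDuringExecution": [{
        "weight": 100,
        "podAffinityTerm": {
            "labelSelector": {"matchLabels": {"app": APP}},
            "topologyKey": "kubernetes.io/hostname",
        },
    }]}}


print(join_addresses(APP, APP, 3))
print(json.dumps(services(), indent=2))
print(json.dumps(spread_across_machines(), indent=2))
```

For the monitoring item, CockroachDB serves Prometheus-format metrics at /_status/vars on the HTTP port, so a Prometheus instance in the cluster can scrape each pod through the headless Service's per-pod DNS names.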
Getting Everything Up and Running
To deploy CockroachDB on Kubernetes, we have an in-depth guide that covers everything you need.
If you aren't ready to move to production yet, check out our more lightweight Kubernetes tutorial.
Published at DZone with permission of Sean Loiselle, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.