How to Run Apache Cassandra on Kubernetes
Explore how to run Apache Cassandra on Kubernetes.
With Kubernetes’ popularity skyrocketing and the adoption of Apache Cassandra growing as a NoSQL database well suited to the high availability and scalability needs of cloud-based applications, it should be no surprise that more developers are looking to run Cassandra databases on Kubernetes. However, many developers are finding that while it is relatively simple to get started, it is considerably more challenging to operate well.
On the positive side, Kubernetes offers StatefulSets, workload API objects that can be used to manage stateful applications. StatefulSets provide stable and unique network identifiers, stable persistent storage, ordered and graceful deployment and scaling (as well as deletion and termination), and automated rolling updates.
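To make the "stable and unique network identifiers" concrete, here is a minimal sketch of how the identities of StatefulSet pods are derived. Pods are named with the StatefulSet name plus an ordinal, and each gets a DNS entry under the governing headless service. The StatefulSet name, service name, and namespace used below are illustrative assumptions, not values taken from any particular deployment.

```python
def stateful_pod_identities(statefulset, service, namespace, replicas,
                            cluster_domain="cluster.local"):
    """Return (pod_name, dns_name) pairs in the order the pods are created."""
    identities = []
    for ordinal in range(replicas):
        # Pod names are <statefulset-name>-<ordinal>, starting at 0.
        pod_name = f"{statefulset}-{ordinal}"
        # Each pod is addressable under the governing (headless) service.
        dns_name = f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod_name, dns_name))
    return identities


# Hypothetical names: a StatefulSet and service both called "cassandra".
for pod, dns in stateful_pod_identities("cassandra", "cassandra", "default", 3):
    print(pod, dns)
```

Because these names survive pod rescheduling, a Cassandra node can keep the same address and the same persistent volume across restarts.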
However, the container orchestration platform also has inherent limitations, starting with a lack of sophisticated understanding of how different databases function. For example, Kubernetes is unaware of whether you’re using a leader/follower database, a single database instance, or a multi-sharded leader infrastructure. On top of that, Kubernetes doesn’t understand operationally how these databases scale and what needs to happen in order to scale them. These limitations add to the hurdles developers encounter when trying to get the most out of using Cassandra and Kubernetes together.
This is what we’ve been working on (and have now released): an open source Cassandra operator for running and managing Cassandra within Kubernetes. The operator has been built to remove many of the challenges that might otherwise discourage developers from using these powerful technologies in tandem. Functionally, this fully open source operator, which is freely available on GitHub, serves as Cassandra-as-a-Service on Kubernetes. It is currently ready for use in development and continues to be worked on and improved by our team and partner contributors.
By design, the Cassandra operator handles deployment and operations so that developers can manage and run Cassandra within Kubernetes environments simply and safely, with less operational effort. By leveraging an operator, you get an environment and set of operations that are consistent and reproducible across different production clusters, as well as development, staging, and QA environments. That is a clear advantage over developer-written scripts for implementing Cassandra on Kubernetes. With best practices built in and operations taken care of, the operator lets developers focus on product development.
A Kubernetes operator is made up of a controller and a custom resource definition (CRD). A CRD is a Kubernetes mechanism for defining custom objects or resources. In the Cassandra operator, it allows developers to create Cassandra objects in Kubernetes, each representing a single cluster. The controller then listens for state changes on the custom resources defined by the CRD and manages StatefulSets to meet those parameters.
By this method, developers can control how Kubernetes deploys Cassandra, defining configuration options including cluster name, isolation to a specific Kubernetes namespace, node count, persistent volumes to use, JVM tuning options, and more. The controller will also act on specifications made through the CRD to manage operations such as repairs, backups, and issue-free scaling. In this way, it uses the key Kubernetes concept of building controllers upon each other to arrive at intelligent and useful functionality.
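As a rough illustration of what such a Cassandra object might carry, the sketch below builds a custom resource as a plain dictionary. Everything about its shape is an assumption made for illustration: the API group `example.com/v1alpha1`, the kind `CassandraCluster`, and the field names `nodeCount`, `volumeSize`, and `jvmOptions` are hypothetical and are not the operator's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class CassandraClusterSpec:
    """Hypothetical description of one Cassandra cluster."""
    cluster_name: str
    namespace: str
    node_count: int
    volume_size: str = "100Gi"
    jvm_options: list = field(default_factory=list)

    def to_custom_resource(self):
        """Render the spec in the usual apiVersion/kind/metadata/spec layout."""
        if self.node_count < 1:
            raise ValueError("node_count must be at least 1")
        return {
            "apiVersion": "example.com/v1alpha1",  # hypothetical group/version
            "kind": "CassandraCluster",            # hypothetical kind
            "metadata": {
                "name": self.cluster_name,
                "namespace": self.namespace,
            },
            "spec": {                              # hypothetical field names
                "nodeCount": self.node_count,
                "volumeSize": self.volume_size,
                "jvmOptions": list(self.jvm_options),
            },
        }


resource = CassandraClusterSpec("demo", "dev", 3, jvm_options=["-Xmx4g"]).to_custom_resource()
print(resource["metadata"], resource["spec"])
```

The point of the layout is that the developer edits only declarative fields such as the node count, and the controller is responsible for turning that declaration into StatefulSet changes.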
From an architectural standpoint, the Cassandra controller interfaces with the Kubernetes master, where it listens for state changes and adapts pod definitions and CRDs accordingly. The controller then deploys these alterations, waits for the changes to take effect, and repeats as needed until those changes are fully in place.
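The listen, change, wait, and repeat cycle described above is commonly called a reconcile loop. The following is a simplified in-memory sketch of that pattern, not the operator's implementation: the function names and the `add_node`/`remove_node` actions are hypothetical, and where a real controller would call the Kubernetes API and wait, the simulated change takes effect immediately.

```python
def reconcile(desired_nodes, observed_nodes):
    """Return the single next action that moves observed state toward desired state."""
    if observed_nodes < desired_nodes:
        return "add_node"
    if observed_nodes > desired_nodes:
        return "remove_node"
    return "noop"


def run_until_converged(desired_nodes, observed_nodes, max_passes=100):
    """Apply one action per pass, re-observing between passes, until nothing is left to do."""
    actions = []
    for _ in range(max_passes):
        action = reconcile(desired_nodes, observed_nodes)
        if action == "noop":
            break
        # A real controller would issue an API call here and wait for the
        # change to take effect; this simulation applies it instantly.
        observed_nodes += 1 if action == "add_node" else -1
        actions.append(action)
    return observed_nodes, actions


print(run_until_converged(desired_nodes=5, observed_nodes=3))
```

Taking one small step per pass and then re-observing keeps the controller correct even if the cluster changes underneath it between passes.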
For an example of how the Cassandra controller performs operations in the Cassandra cluster, let’s examine the process of scaling down a cluster by removing one node. Rather than altering the StatefulSet directly, the listening controller will recognize the CRD change to a lower node count. It will first run a decommission operation on the Cassandra node to ensure data has been moved off the node. This process makes sure that the node is stopped gracefully, so that the data it contains is redistributed and balanced across the other nodes. Once the controller confirms this has occurred, it changes the StatefulSet definition so that Kubernetes removes the pod. By doing so, the Cassandra controller ensures that Cassandra operations are carried out smoothly and correctly.
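The ordering in this scale-down example can be sketched as follows. `FakeCluster` is a hypothetical in-memory stand-in, and the StatefulSet name `cassandra` is assumed; in a real cluster the decommission step would correspond to running `nodetool decommission` on the node, and the replica change would be an update to the StatefulSet.

```python
class FakeCluster:
    """In-memory stand-in for a Cassandra cluster backed by a StatefulSet."""

    def __init__(self, replicas):
        self.statefulset_replicas = replicas
        self.decommissioned = set()
        self.log = []

    def decommission(self, ordinal):
        # Stand-in for streaming the node's data to the remaining nodes.
        self.log.append(f"decommission cassandra-{ordinal}")
        self.decommissioned.add(ordinal)

    def set_replicas(self, replicas):
        # Stand-in for updating the StatefulSet's replica count.
        self.log.append(f"set replicas={replicas}")
        self.statefulset_replicas = replicas


def scale_down(cluster, desired_nodes):
    """Remove nodes one at a time, always decommissioning before shrinking."""
    while cluster.statefulset_replicas > desired_nodes:
        # StatefulSets remove the highest ordinal first, so that is the
        # node whose data must be moved off beforehand.
        last = cluster.statefulset_replicas - 1
        cluster.decommission(last)
        if last not in cluster.decommissioned:
            raise RuntimeError(f"node {last} still holds data; not shrinking")
        cluster.set_replicas(last)


cluster = FakeCluster(replicas=3)
scale_down(cluster, desired_nodes=2)
print(cluster.log)
```

Lowering the replica count only after the decommission check passes is what prevents Kubernetes from deleting a pod that still owns data.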
Ultimately, the Cassandra operator is intended to equip developers with plenty of capable open-source options for utilizing Cassandra on Kubernetes much more easily than has thus far been possible.
Ben Bromhead is the CTO at Instaclustr, which provides a managed service platform of open source technologies such as Apache Cassandra, Apache Spark, Elasticsearch and Apache Kafka.