A Storage Hack for Bringing Stateful Apps to Kubernetes: Data That Follows Applications
If we can figure out how to circumvent the data issue, moving stateful applications to Kubernetes has many decisive benefits.
Kubernetes, the open-source container orchestration system created by Google, is one of the most widely adopted technologies of the last decade. Its popularity is evident in its double-digit growth in adoption rate.
In fact, the Cloud Native Computing Foundation (CNCF) found that in 2019, 84% of respondents ran Kubernetes containers in production, double the share from two years prior. This growth in adoption is unlikely to stop any time soon, seeing how Kubernetes is an efficient way to manage containers at scale, which translates into lower costs and increased cloud flexibility.
Containerization is all about encapsulating and packaging software code and all its dependencies so that it can run uniformly and consistently on any infrastructure. In recent years, it has become the go-to technique to break down monolithic applications and deploy applications and microservices. However, moving to containers can be a tricky and time-consuming task. This is where Kubernetes comes in.
Kubernetes helps you deploy more efficiently at scale and drives containerized apps to run automatically regardless of the environment, allocating resources to match demand.
Stateless Applications Enjoy the True Benefits of Kubernetes, Stateful Apps Are a Different Story
The portability of stateless applications gives them the ability to run anywhere, but not all applications are stateless; most applications are dependent on data. And data does not follow the same rulebook as stateless applications.
Data binds the application to its storage location. In a way, the physical location becomes an application dependency.
That said, containerized stateful applications are still worth pursuing. If we can figure out how to circumvent the data issue, moving stateful applications to Kubernetes has many decisive benefits, including:
- Application scaling and upgrades: Autoscaling will add newly available nodes to the cluster and deploy updates.
- Improved availability: Improved fault isolation comes naturally with distributed environments. A failure at the pod or cluster layer leads to a replacement pod being rescheduled with no impact on service.
- Accelerated development: Breaking applications down into a manageable set of independent stateful services that can still communicate with one another lets you make changes and updates without affecting the entire business, accelerating releases and time to market.
- Avoiding vendor lock-in: Avoiding the use of managed data services that lock you in with your cloud provider and bind you to a specific region.
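The standard Kubernetes primitive for workloads like these is the StatefulSet, which gives each replica a stable identity and its own persistent volume. A minimal sketch (names, image, and storage class are illustrative, not from the original article):

```yaml
# Sketch: a StatefulSet for a hypothetical "db" service. Each replica
# gets a stable network identity (db-0, db-1, ...) and its own
# PersistentVolumeClaim stamped out from volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                      # hypothetical name
spec:
  serviceName: db               # headless Service providing stable DNS
  replicas: 2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: example.com/db:1.0   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:             # one PVC per replica, retained across restarts
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Note that each claim is still provisioned in a specific location, which is exactly the data-gravity problem the rest of this article is about.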
In a nutshell, Kubernetes provides the same benefits you would expect from a stateless cloud-native containerized application, but for stateful apps. So, if we can find a way to instantly move the data to where the application is, we are golden.
What we need to figure out, then, is how to connect the containerized apps to the storage and direct the applications to use it.
Storage? It’s Complicated
Kubernetes offers instant self-healing applications, giving you that "I don't care where it runs, as long as it does" experience Kubernetes is known for. It restarts containers that fail, replaces them, kills containers that don't respond to your user-defined health check, and doesn't advertise them to clients until they are ready to serve.
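That self-healing behavior is driven by user-defined probes on the pod spec. A minimal sketch, with hypothetical names, paths, and port:

```yaml
# Sketch: a pod template with user-defined health checks.
# A failing liveness probe causes Kubernetes to restart the container;
# a failing readiness probe removes the pod from Service endpoints,
# so it is not advertised to clients until it is ready to serve.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example.com/web:1.0   # hypothetical image
        ports:
        - containerPort: 8080
        livenessProbe:               # restart on failure
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 10
        readinessProbe:              # gate traffic on readiness
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
```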
On the other hand, storage is hard. It requires all the tools you thought you could avoid by moving to the public cloud: disaster recovery (DR), replication configurations, data portability, networking. All the complexity you offloaded to the cloud is now back to haunt you under a different name: "stateful containerized applications."
With all the headaches that come with storage, unfortunately, there is no way of getting around it. In multi-cluster scenarios, you need to keep the state synced across data locations, so you settle for convoluted, complex, and expensive storage architectures. (Sigh.)
The Difference Between Stateful and Stateless K8s Experience
Many solutions aim to give your stateful Kubernetes applications persistent local storage. Most take the Software-Defined Storage (SDS) approach: they add an abstraction layer that mediates between the cluster's pods and their dedicated storage. But the data remains static and bound to a physical location, missing out on most of the advantages K8s is known for.
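In Kubernetes terms, that abstraction layer is the StorageClass/PersistentVolumeClaim pair. The sketch below (provisioner and names are illustrative) shows how a claim gets satisfied by a volume in one zone, effectively pinning any pod that mounts it to that location:

```yaml
# Sketch: a zonal StorageClass plus a claim against it. The volume is
# provisioned in a single zone, so pods using the claim must be
# scheduled there -- the "data bound to a physical location" problem.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zonal-ssd               # hypothetical name
provisioner: kubernetes.io/gce-pd   # example provisioner; varies by cloud
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer   # bind where the first pod lands
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: zonal-ssd
  resources:
    requests:
      storage: 10Gi
```

`WaitForFirstConsumer` delays binding until a pod is scheduled, but once bound, the data (and therefore the app) stays put.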
The Need for Portable and Synchronized Container Persistent Storage
I believe the solution is pretty simple on a conceptual level. Instead of forcing the application to run where the data happened to be originally provisioned, data needs to follow the application.
That is easier said than done, though. To make your data follow the applications, it needs to be synchronized across every location where you might want to run them, with guaranteed resiliency, and it must stay up to date.