Kubernetes Operators to Realize the Dream of Zero-Touch Ops
Kubernetes Operators have the power to realize the dream of zero-touch ops and bring AIOps to life, and this is how I believe they will.
As we step into microservices architectures, deploy them on the cloud with containers, and adopt all the goodness of DevOps, application functionality grows, and so do the clusters and the number of resources in each cluster. If an application is not “built to manage”, managing it becomes a nightmare, and we might, ironically, end up spending more effort managing applications than building them. Meanwhile, automation technology holds huge promise, and zero-touch ops is talked about as the nirvana of managing cloud applications.
In my view, Operators are the most important architectural component in the k8s world, with huge promise to carry us along our zero-touch (or low-touch) ops journey.
Before I jump in, let me quickly walk you through my understanding of Operators (I am sure there are plenty of blogs, vlogs, and YouTube videos that do a better job).
K8s Is All About Controllers and Resources
The initial versions of k8s shipped with a fixed set of resources, and we were restricted to using only those.
Controllers are very good at managing stateless applications, because a controller is a constant control loop that tracks the actual state and fixes any drift from the desired state. Since the applications are stateless, there is no state to back up, recover, or restore. For example, if an instance of a web server crashes, the controller can simply replace it with another instance and bring the application back to the desired state.
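To make the control loop idea concrete, here is a minimal sketch in plain Python. It does not talk to a real cluster: `Cluster`, `start_instance`, and `reconcile` are names I made up for illustration, and the "cluster" is just an in-memory set of instance names.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy in-memory stand-in for cluster state (not a real Kubernetes API)."""
    running: set = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1))

    def start_instance(self) -> str:
        # Stateless instances are interchangeable, so a fresh name is enough.
        name = f"webserver-{next(self._ids)}"
        self.running.add(name)
        return name


def reconcile(cluster: Cluster, desired_replicas: int) -> list:
    """One pass of the control loop: compare actual vs. desired and close the gap."""
    actions = []
    while len(cluster.running) < desired_replicas:
        actions.append(("start", cluster.start_instance()))
    while len(cluster.running) > desired_replicas:
        victim = sorted(cluster.running)[-1]
        cluster.running.remove(victim)
        actions.append(("stop", victim))
    return actions


if __name__ == "__main__":
    cluster = Cluster()
    reconcile(cluster, desired_replicas=3)
    cluster.running.discard("webserver-2")  # simulate a crashed web server
    print(reconcile(cluster, desired_replicas=3))
```

A real controller runs this comparison continuously against the API server; the point of the sketch is only that recovery needs no knowledge of the application, because there is no state to bring back.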
But for stateful applications like databases, it's not that straightforward: replacing a crashed instance is not enough, and restoring the state will require manual intervention! So we need something more than the standard controllers.
Since the introduction of Custom Resources, we have the flexibility to declare and create our own k8s resources.
Now imagine if we could define our own resources and let k8s manage them too! Even better, imagine building our own controllers with our own custom management logic, and letting k8s run them! That combination is what we call “Operators”!
With Operators, we can write the logic for the complete management of our custom resources and let k8s run it for us! That's how we can move to low-touch ops!
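Here is a rough sketch of that idea, again in plain Python with no cluster involved. The custom resource is shown as the dict a manifest would parse into; the group `example.com`, the kind `Database`, and the `DatabaseOperator` class are all hypothetical, invented only to show where the stateful knowledge (backup, restore, upgrade) would live.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical custom resource; "example.com" and "Database" are made-up names.
database_cr = {
    "apiVersion": "example.com/v1",
    "kind": "Database",
    "metadata": {"name": "orders-db"},
    "spec": {"version": "13"},
}


@dataclass
class DatabaseInstance:
    version: str
    rows: list = field(default_factory=list)
    healthy: bool = True


@dataclass
class DatabaseOperator:
    """Toy operator: the operational know-how for a stateful app lives in code."""
    instance: Optional[DatabaseInstance] = None
    backup: list = field(default_factory=list)

    def take_backup(self) -> None:
        self.backup = list(self.instance.rows)

    def reconcile(self, cr: dict) -> list:
        spec, actions = cr["spec"], []
        if self.instance is None:
            self.instance = DatabaseInstance(spec["version"])
            actions.append("install")
        elif not self.instance.healthy:
            # Unlike the stateless case, a replacement must get its data back.
            self.instance = DatabaseInstance(self.instance.version,
                                             rows=list(self.backup))
            actions.append("restore-from-backup")
        if self.instance.version != spec["version"]:
            self.take_backup()  # protect the state before changing anything
            self.instance.version = spec["version"]
            actions.append("upgrade")
        return actions
```

Changing `spec.version` in the resource and reconciling again is all a user would do; the backup-then-upgrade sequence is the operator's job. A real operator would watch the resource through the Kubernetes API rather than being called by hand.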
So what can we automate with Operators? The answer is “everything that can be automated”: installation, patching, updates, upgrades, backup, recovery, capturing telemetry, and acting on AI (artificial intelligence), all the way to the nirvana stage of zero-touch ops.
There is a well-defined Operator maturity model that lays out five phases of maturity.
Operator SDK: Provides the tools to build, test, and package Operators. It provides three SDKs out of the box.
- Helm SDK: Provides a declarative way of building Operators; it is mainly suited to install-and-configure Operators.
- Ansible SDK, Go SDK: Provide more advanced ways of building Operators, with which you can build Operators all the way to “Auto-Pilot” maturity.
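The relationship between the SDKs and the maturity phases can be written down as a small lookup. This is only a sketch: the level names follow the Operator Framework capability levels as I understand them, the cut-off for Helm at the second level is my reading of “install and configure”, and `Capability`, `MAX_LEVEL`, and `sdk_options` are names I made up.

```python
from enum import IntEnum


class Capability(IntEnum):
    """The five maturity phases, lowest to highest."""
    BASIC_INSTALL = 1
    SEAMLESS_UPGRADES = 2
    FULL_LIFECYCLE = 3
    DEEP_INSIGHTS = 4
    AUTO_PILOT = 5


# Assumption: the highest phase each SDK flavor is typically used for.
MAX_LEVEL = {
    "helm": Capability.SEAMLESS_UPGRADES,
    "ansible": Capability.AUTO_PILOT,
    "go": Capability.AUTO_PILOT,
}


def sdk_options(target: Capability) -> list:
    """Return the SDK flavors that can reach the target maturity phase."""
    return sorted(sdk for sdk, top in MAX_LEVEL.items() if top >= target)


if __name__ == "__main__":
    for level in Capability:
        print(level.name, sdk_options(level))
```

So if the goal is an Operator that only installs and configures, any of the three will do; anything aiming at insights or auto-pilot behavior points to Ansible or Go.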
Operator Lifecycle Manager (OLM): Manages the complete lifecycle of the Operator, from installing it to managing it afterwards. OLM monitors the deployed CRD, and when something changes, it ensures that the changes are applied across the cluster.
Operator Metering: Reports the usage of the Operator to support metering.
Just for completeness, here is a very quick walk-through of building and deploying an Operator.
AIOps for Zero-Touch Ops
Applying artificial intelligence and machine learning to ITOps has become a reality, and is already a common practice for bringing down operational cost. So what capabilities are required for AIOps?
The picture above illustrates my understanding of AIOps capability architecture.
AIOps goes beyond standard event detection to advanced prediction with actionable insights. The term “actionable” is important: it means recommending or executing the best action to fix current issues, or issues that are predicted to occur. This is what we really need for “Auto-Pilot” maturity, where AIOps replaces or augments Site Reliability Engineers (SREs).
Now if you connect this generic picture of AIOps with what k8s Operators bring to the table, it is very clear that the operators have all that we need to be our AIOps engine.
All of these capabilities can be built as CRs, with a bunch of Operators bringing the pieces of AIOps to life. These Operators are co-located inside the k8s cluster and run as pods or sidecars. They can also integrate with a service mesh for additional metrics and telemetry, and act proactively to operate the cluster.
The above picture provides a high-level view of the idea. Let's see how it maps to the three layers we talked about in the AIOps capability architecture:
- Visibility: The visibility layer can be built on Grafana, providing single-pane visibility into cluster health.
- Prediction: The prediction layer contains all the modules (from Python modules to advanced Spark clusters, each packaged as a specific Operator) that build machine learning models from the data streaming in from Prometheus and the service mesh (Istio).
- Resolution: Resolution can range from simple k8s commands to Ansible playbooks, or even invoking RPA digital workers, depending on the standard operating procedures, to recover from failures or take proactive measures.
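The three layers can be strung together in a toy pipeline. This is a sketch under heavy assumptions: the samples are hard-coded instead of coming from Prometheus, the “prediction” is a naive linear extrapolation standing in for a real machine learning model, and the `RUNBOOK` entries and action names are hypothetical.

```python
from statistics import mean
from typing import Optional

# Hypothetical runbook: which action the standard operating procedure
# prescribes when a given metric is predicted to breach its threshold.
RUNBOOK = {
    "memory_percent": "scale-out",
    "disk_percent": "expand-volume",
}


def predict_next(samples: list) -> float:
    """Prediction layer stand-in: last value plus the average step so far."""
    if not samples:
        raise ValueError("need at least one sample")
    if len(samples) < 2:
        return samples[-1]
    steps = [b - a for a, b in zip(samples, samples[1:])]
    return samples[-1] + mean(steps)


def recommend(metric: str, samples: list, threshold: float = 90.0) -> Optional[str]:
    """Resolution layer stand-in: act before the predicted breach happens."""
    if predict_next(samples) >= threshold:
        # Fall back to a human when the runbook has no automated action.
        return RUNBOOK.get(metric, "page-sre")
    return None


if __name__ == "__main__":
    # Visibility layer stand-in: recent samples of one metric.
    print(recommend("memory_percent", [70, 78, 86]))
    print(recommend("memory_percent", [50, 51, 52]))
```

The “actionable” part is the return value: the pipeline either names the action to take or stays quiet, and it hands over to an SRE only when no automated action is known.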
The best part is that all of this AIOps happens natively in Kubernetes (except maybe the RPA).
There you go: Operators are the key to unlocking the “zero-touch ops” journey.
In the meantime, I have been playing around with operators and will soon come back with a hands-on session…
Have fun and take care!