Understanding How CALMS Extends To Observability
With CALMS as a measure, you can understand how observability improves your DevOps practices.
As our observability and DevOps practices continue to join, similar alignment happens with our frameworks and goals. In fact, when one considers that observability is about increased and deeper data about our environments, it is evident that the frameworks aren’t changing so much as adapting to faster insights.
Let’s take a look at CALMS. CALMS (Culture, Automation, Lean, Measurement, Sharing) grew out of the CAMS model coined by Damon Edwards and John Willis, with Jez Humble adding the “L” for Lean, and it serves as a method of assessing how an organization is adopting DevOps practices. However, as we add observability, CALMS can extend to our observability practice as well.
So what is CALMS really, and how does observability fit into it?
Culture

Culture can be defined as the knowledge and characteristics of a particular group. In short, it is the patterns of interaction chosen and followed by the group. For DevOps, it is often a culture of accepting change, from grassroots changes to deliberate and planned changes.
For observability, especially in our cloud-enabled and cloud-native environments, we are constantly changing. Our services are updated and deployed frequently. Our communication links are shifting and even our underlying infrastructures may be elastic and ephemeral.
Managing this change is a shared responsibility, spanning our environments from operations all the way to development. The nice thing is that observability, coupled with the right tools, gives us the cross-over data to deal with change across the entire spectrum of involved teams. By having the data, and thus a common voice, we have the basis of shared knowledge to drive the shared characteristics of our culture.
A quick digression: incidents (not in CALMS) have their own culture, driven by the maturity of the response process. Incidents are governed by a company’s incident management process rather than by culture, with specific actions defined per severity, clearly laid out points of collaboration, and a (usually) separate, defined communication path. The maturity of that process is the driving factor, not the stability of the community.
Automation

Automation is a crucial component of DevOps: if it can be automated, it should be. We should let our systems run our rote tasks, which can often cross the 6 C’s of DevOps. But automation does not mean “remove the humans.” Automation still requires feedback to work, and humans still need to understand what is happening and why.
Observability brings a whole new potential for automation. After all, observability is paired with controllability. Observability provides the data that allows us to analyze and respond automatically to changes in the environment, whether it’s as simple as an elastic expansion of our compute instances or as complex as an AI routine identifying and responding to an attempted breach.
An easy case to identify is auto-remediation of an automated deployment. If our deployment of a new service is problematic, then we should be able to revert to a previous good configuration. This depends on potentially unconnected data, which we can correlate. We need to know that an event, in this case a service deployment, took place. Next, we need a detector/trigger that can identify our targeted bad condition (errors suddenly climb, latency sees a massive increase, or requests decline rapidly) within an acceptable period of time. From that, we need the ability to trigger a rollback based on both of those elements, and we need to see the event that says the rollback occurred. It should be followed by an alert on the problematic service to the right people; after all, it still needs to be fixed. Yes, your mileage may vary, as it is possible that the events are uncorrelated, but if I deployed a new service and everything went haywire in the next 30 seconds, I’d probably start with the service as the cause rather than assuming it was the phase of the moon.
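To make that flow concrete, here is a minimal sketch of the detect-and-rollback loop in Python. The get_error_rate, rollback, and send_alert helpers are hypothetical stand-ins for your metrics backend, deployment tooling, and paging system, and the thresholds are illustrative, not recommendations.

```python
import random
import time

# Hypothetical stand-ins for a metrics backend, deploy tooling, and a
# paging system; replace these stubs with your real clients.
def get_error_rate(service: str, window_secs: int) -> float:
    return random.random() * 0.2           # pretend metric read

def rollback(service: str, to_version: str) -> None:
    print(f"rolling back {service} to {to_version}")

def send_alert(team: str, message: str) -> None:
    print(f"[alert -> {team}] {message}")

BASELINE_ERROR_RATE = 0.02    # error fraction observed before the deploy
WATCH_WINDOW_SECS = 300       # how long to watch a fresh deployment
CHECK_INTERVAL_SECS = 15

def watch_deployment(service: str, previous_version: str) -> None:
    """Watch a just-deployed service; roll back and alert if errors spike."""
    deadline = time.time() + WATCH_WINDOW_SECS
    while time.time() < deadline:
        rate = get_error_rate(service, window_secs=60)
        if rate > BASELINE_ERROR_RATE * 5:   # the "targeted bad condition"
            rollback(service, to_version=previous_version)
            # The rollback fixes the symptom; a human still owns the cause.
            send_alert(
                team="service-owners",
                message=(f"{service} rolled back to {previous_version}: "
                         f"error rate {rate:.1%} vs "
                         f"baseline {BASELINE_ERROR_RATE:.1%}"),
            )
            return
        time.sleep(CHECK_INTERVAL_SECS)

watch_deployment("payments-api", previous_version="v1.41.2")
```

In a real pipeline, the deployment event itself would start this watcher, which keeps the deploy, the bad condition, and the rollback correlated as described above.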
Lean

We all focus on lean these days. We need to be lean and agile, since the cloud brings both more agility and easier adoption. When thinking lean, we focus on increasing customer value, eliminating waste, facilitating rather than dictating, improving continually, keeping our eyes on long-term goals, and involving everyone.
So how can observability help? Well, the data we receive helps us understand both our usage of and our costs for cloud resources. If we expose that to the teams, we allow the teams to help eliminate waste and improve customer value. Sharing our observability data with the related teams drives awareness, both of our systems and of our business.
There is a side note, and the impact can vary based on your culture. Sharing the data and its analysis can foster awareness, but it can also foster competition. Whether that competition is healthy depends on how you use it.
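As a small illustration of exposing cost data to teams, here is a sketch that rolls raw usage records up into a per-team view. The record shape and the numbers are invented for the example; in practice they would come from your cloud provider’s billing export or your observability platform.

```python
from collections import defaultdict

# Invented records for illustration; real ones would come from a
# billing export or your observability platform's usage API.
usage_records = [
    {"team": "checkout", "service": "api", "cost_usd": 412.50},
    {"team": "checkout", "service": "worker", "cost_usd": 120.00},
    {"team": "search", "service": "indexer", "cost_usd": 980.25},
]

def cost_by_team(records: list[dict]) -> dict[str, float]:
    """Roll raw usage records up into a per-team total worth sharing."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["team"]] += record["cost_usd"]
    return dict(totals)

print(cost_by_team(usage_records))
# {'checkout': 532.5, 'search': 980.25}
```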
Measurement

“If you can’t measure it, you can’t improve it.” This quote, often attributed to Peter Drucker, is perhaps one of the most important ideas in management.
But we need to go a step further and set goals. We need to measure not only status but progress towards our goals. And honestly, we need to come up with meaningful goals, especially on critical items like user happiness and time-to-value.
The nice thing is that observability lets us not only define our goals but actively measure them. Often couched in terms like Service Level Indicators (SLIs), Service Level Objectives (SLOs), or error budgets, we now have the data that lets us aggregate, analyze, and visualize our goals, from business workflows to our users’ success and request latency.
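As a toy example of turning those terms into numbers, here is a sketch of an availability SLI measured against an SLO and its error budget. The request counts and the 99.9% target are invented for illustration.

```python
# A toy availability SLI checked against an SLO and its error budget.
total_requests = 1_000_000
failed_requests = 420

slo_target = 0.999                       # 99.9% availability objective
sli = (total_requests - failed_requests) / total_requests

error_budget = 1.0 - slo_target          # fraction of requests allowed to fail
budget_spent = failed_requests / total_requests
budget_remaining = error_budget - budget_spent

print(f"SLI: {sli:.4%}")                 # SLI: 99.9580%
print(f"Error budget remaining: {budget_remaining / error_budget:.0%}")  # 58%
```

The same arithmetic, fed continuously from your metrics store, is what turns a goal into something teams can watch and act on.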
But to be successful, goals should be widely shared. All involved teams and individuals should have speedy, accurate access to the status of, and progress towards, those goals.
However, it is also important not to fall into the data trap of using selected data to make decisions or highlight progress. That can slide into survivorship bias (using only the data that survives to be analyzed) or selection bias (using data that we tailor to a perceived need). For our goals to be both accurate and precise, we need to target the complete data relevant to our goal.
Sharing

Openness and communication are at the heart of this category. We need to understand each other’s roles, and even the way teams approach and deal with problems.
Again, observability lets us share information easily. After all, our observability world is becoming more integrated, and more open in its own right, with the acceptance and adoption of OpenTelemetry. By having common data, we can establish related goals. We can hand off from team to team with less friction and less misunderstanding.
Having that common data set also means we can better learn and adapt our own practices to fit those of other teams, just as they, in turn, are doing. More learning means better attention to common goals and improved business results.
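As a brief example of that common data, here is a minimal OpenTelemetry sketch in Python (assuming the opentelemetry-api and opentelemetry-sdk packages are installed) that emits a span with attributes any team can consume. The service and attribute names are illustrative.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print spans to stdout for the demo; in production you would export
# to a collector or observability backend instead of the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # illustrative name

with tracer.start_as_current_span("process-order") as span:
    # Agreed-upon attribute names give every team the same vocabulary.
    span.set_attribute("order.id", "ord-1234")
    span.set_attribute("owning.team", "checkout")
```

Because every team reads the same span data, handoffs happen against one shared record rather than per-team silos.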
Key Points for Using Observability
So observability, with the data and insights it can provide, gives us even more ability to use CALMS. But there are some points to consider to make this successful.
Establish guidelines. These can relate to your goal setting, but set something that delineates the adoption and use of observability. Often, just as with adopting a DevOps practice, the adoption and inclusion of observability is an incremental, continual process.
Show the state of your automation. Let everyone know what is and isn’t automated and use the data to constantly iterate and improve your results. When it comes to automation, and in particular impactful responses like auto-remediation, everyone should be able to see and understand the what and the why.
Keep everyone in the loop. This includes your users. While they may not need to know that your instances failed to respond to increased demand, they should know that things, particularly incidents, are being responded to and resolved. Think about what data is useful for them and make it available.
Map your data to your operational goals, both technical and business. This doesn’t mean filtering selections to prove your point; it means using the correct data to show the point. Also, watch out for survivorship bias. Don’t throw data away; instead, depend on your tools to help aggregate and visualize it while allowing drill-in on points of interest, even down to the original raw data.
With the use of observability, you can be even CALMS-er and carry on making things better.