How Operations Use Observability Data to Increase Customer Value
Discuss the specific roles of context in monitoring solutions and observability data, including how it supports customer value for digital products and services.
Operations teams now have access to powerful monitoring solutions to track operational data. But many of those same teams are missing a crucial piece of the puzzle: context. Without additional information to put context around monitoring data, teams can’t optimize resource allocation and can even miss vital fixes that affect the customer experience.
Within operations, two major examples of context rise to the top: metadata and clustering. These provide the lens through which teams view data and metrics, make smarter decisions about using resources, and take the lead on improving customer experience and adding customer value. Let’s look at each example and how teams can benefit from applying context to their work.
Metadata Accompanying Traditional Time-Series Data
The first example is context that arrives with the data itself, in the form of metadata. Teams have long relied on traditional time-series data in their operations, but it’s insufficient by itself: it needs context from metadata to be truly useful.
For example, online gaming providers use the number of concurrent users as a KPI. But that KPI without context isn’t very useful because the expected number of users changes depending on the time of day and week. After all, the provider should expect the number of gamers on a Tuesday morning to look different from the number on a Friday night.
Add a periodicity capability into the mix, and teams can compare a KPI against the user level expected for that time of day and week. If the current value isn’t similar to or higher than that baseline, then there’s cause for concern. Context through metadata enables you to prioritize potential issues and better assign workloads to address them.
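To make the periodicity idea concrete, here is a minimal sketch in Python. It assumes the baseline is simply the average of past samples for the same weekday and hour, and that "cause for concern" means falling below 80% of that average. The function names, the sample data, and the `tolerance` value are all hypothetical, and a real monitoring product would likely use a more robust seasonal model.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_baseline(history):
    """Average concurrent users per (weekday, hour) slot.

    history: iterable of (datetime, concurrent_users) samples.
    """
    buckets = defaultdict(list)
    for ts, users in history:
        buckets[(ts.weekday(), ts.hour)].append(users)
    return {slot: mean(values) for slot, values in buckets.items()}


def is_below_expected(baseline, ts, users, tolerance=0.8):
    """True when `users` is below `tolerance` times the baseline for this slot."""
    expected = baseline.get((ts.weekday(), ts.hour))
    if expected is None:
        return False  # no history for this slot, so nothing to compare against
    return users < tolerance * expected


# Hypothetical history: quiet Tuesday mornings, busy Friday nights.
history = [
    (datetime(2024, 1, 2, 9), 1000),    # Tuesday 09:00
    (datetime(2024, 1, 9, 9), 1200),    # Tuesday 09:00
    (datetime(2024, 1, 5, 21), 9000),   # Friday 21:00
    (datetime(2024, 1, 12, 21), 11000), # Friday 21:00
]
baseline = build_baseline(history)

# The same reading of 1,050 users is judged differently depending on the slot.
tuesday_alarm = is_below_expected(baseline, datetime(2024, 1, 16, 9), 1050)
friday_alarm = is_below_expected(baseline, datetime(2024, 1, 19, 21), 1050)
```

With this data, 1,050 concurrent users is unremarkable on a Tuesday morning but flags a problem on a Friday night, which is the distinction a raw KPI cannot make on its own.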
Clustering/Correlation/Enrichment for Incident Management
The other example is external context applied to the data, which is what clustering relies on. Instead of looking at measurements individually, such as a single KPI, clustering shows what is happening at the same time, which can reveal relationships between factors. That is especially useful for incident management.
Continuing with the online gaming example, a team notices their number of concurrent users dropped below what they’d expected for that time of day. Why? It could be that players are having trouble logging into the game, a fairly common issue among gaming providers but one that would be difficult to see with the number of users KPI alone. Clustering it with underlying issues, like errors from an authentication system, would help teams identify a problem like failed logins faster because it’s now in the context of a fault. More accurate correlations through context direct teams’ attention to where it’s needed most.
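The login scenario above can be sketched as a simple time-window correlation. This is a stand-in for clustering, not a description of how any particular product does it: the sketch just collects events from other systems that occurred within a few minutes of the KPI anomaly and groups them by source. The event fields (`time`, `source`, `message`), the service names, and the five-minute window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def correlate(anomaly_time, events, window=timedelta(minutes=5)):
    """Group events that occurred within `window` of the anomaly, keyed by source."""
    related = {}
    for event in events:
        if abs(event["time"] - anomaly_time) <= window:
            related.setdefault(event["source"], []).append(event["message"])
    return related


# Hypothetical events from other systems on the same evening.
events = [
    {"time": datetime(2024, 1, 19, 18, 0), "source": "cdn",
     "message": "cache purge completed"},
    {"time": datetime(2024, 1, 19, 20, 58), "source": "auth-service",
     "message": "login failed: token service timeout"},
    {"time": datetime(2024, 1, 19, 20, 59), "source": "auth-service",
     "message": "login failed: token service timeout"},
]

# The concurrent-users KPI dropped below its baseline at 21:00.
related = correlate(datetime(2024, 1, 19, 21, 0), events)
```

Here `related` contains only the two `auth-service` errors, so the drop in users is immediately seen alongside the failed logins rather than as an isolated number. The unrelated CDN event from three hours earlier is left out.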
This type of enrichment can be as sophisticated as integration with other applications that provide additional context, such as databases or contact management systems. It could also be as simple as adding tags that can enable clustering by systems using natural language processing (NLP) techniques. Regardless, the application of external context extracts new insights from data that you can use when deploying team resources.
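At the simple end of that range, enrichment can be as small as a lookup that attaches tags to each alert so that later grouping has something to work with. The sketch below assumes a hypothetical in-memory tag table keyed by host name. In practice the tags might come from a database or another application, and the grouping step might use NLP rather than exact tag matching.

```python
from collections import defaultdict

# Hypothetical tag table; in practice this could come from an external system.
HOST_TAGS = {
    "auth-01": {"service": "authentication", "tier": "critical"},
    "auth-02": {"service": "authentication", "tier": "critical"},
    "web-07": {"service": "storefront", "tier": "standard"},
}


def enrich(alert, tags=HOST_TAGS):
    """Return a copy of the alert with the tags for its host merged in."""
    return {**alert, **tags.get(alert["host"], {})}


def cluster_by(alerts, key):
    """Group alert hosts by the value of one tag; untagged alerts go to 'unknown'."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.get(key, "unknown")].append(alert["host"])
    return dict(groups)


raw_alerts = [
    {"host": "auth-01", "message": "timeout"},
    {"host": "web-07", "message": "slow response"},
    {"host": "auth-02", "message": "timeout"},
    {"host": "db-99", "message": "disk usage high"},
]
enriched = [enrich(alert) for alert in raw_alerts]
clusters = cluster_by(enriched, "service")
```

After enrichment, the two authentication alerts land in the same cluster even though they came from different hosts, while the untagged `db-99` alert is kept visible under `unknown` instead of being dropped.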
Delivering More Customer Value
Context assists operations teams in solving issues faster and better understanding the meaning behind their data and metrics. This also unlocks a team’s ability to deliver more customer value, a benefit that catches the attention of internal stakeholders.
In the past, operations has been about getting things done and delivering reliability. Now, there’s a more pressing need to fine-tune operation cycles so that products and services born in the digital world stay competitive. For a business that runs online, every minute saved can mean thousands of dollars saved.
Context helps identify and resolve the recurring issues that contribute to “toil,” the repetitive, onerous manual work that consumes those valuable minutes and prevents teams from moving on to more advanced functions. Automating manual toil frees time and budget that teams can dedicate to innovative ways of fine-tuning operation cycles. Context is the engine for such automation.
Say the online gaming provider’s authentication system did indeed fail. A team caught unawares cannot resolve the issue quickly, which means users have time to notice. Once their logins start failing, users take to their online platforms and complain. It’s a very real — and very visible — customer impact.
Once a team finds out about the issue, how do they begin to fix it? Context enables them to prioritize their work to solve the issue as fast as possible and with minimum resource use. Having already automated away toil, they can dedicate time and error budgets to the most pressing needs and deliver on customer expectations. By leaving more room in your error budgets, you can pursue more opportunities to not only fix but also improve the customer experience. Your team can contribute to overall customer value beyond merely resolving issues.
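The article does not define "error budget." In common site reliability usage, it is the amount of unreliability a service level objective (SLO) permits over a period. As a rough illustration under that assumption, the sketch below converts an availability target into minutes of allowed downtime and subtracts what an incident has already consumed. The 99.9% target, the 30-day period, and the 30-minute outage are hypothetical figures.

```python
def error_budget_minutes(slo_target, period_days=30):
    """Minutes of downtime an availability SLO allows over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_target)


def remaining_budget_minutes(slo_target, downtime_minutes, period_days=30):
    """Error budget left after subtracting downtime already incurred."""
    return error_budget_minutes(slo_target, period_days) - downtime_minutes


# Hypothetical: a 99.9% availability target over 30 days.
budget = round(error_budget_minutes(0.999), 1)

# Hypothetical: an authentication outage that lasted 30 minutes.
left = round(remaining_budget_minutes(0.999, downtime_minutes=30), 1)
```

Under these assumptions the budget is about 43.2 minutes for the period, and a single 30-minute login outage leaves roughly 13.2 minutes. The faster context helps a team resolve an incident, the more of that budget remains for planned improvements.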
What to Consider When Applying Context
As you can see, applying context to your data and metrics can offer your team a lot of benefits. You’ll want a monitoring solution that prioritizes and incorporates context into its data streams. That includes identifying and integrating your third-party enrichment sources, such as Amazon CloudWatch or PagerDuty, which is an important part of any plan to improve your system’s observability with context.
As you consider how to apply context to your monitoring data, though, keep in mind the following:
- Metadata is important. When buying a monitoring solution, if you forget the metadata, you’ve yanked away your platform of context, and you can’t add it back in. Keep metadata at the forefront during those buying decisions.
- Resist the anti-pattern of throwing away good data. Teams tend to throw good data away to save money, but it’s useless once it’s gone. Prioritize good data and, when you join your data with an intelligent observability solution, the benefits will soon outweigh any potential costs.
- Connect all the dots. The independent systems you purchase that contain contextual information have to be connected. If your context lives somewhere separate from your data, then by definition you don’t have context.
With context supporting them, operations teams can more accurately find and solve problems before they become serious, maintaining the expected customer experience and saving time and money. Key insights from context lend teams opportunities to pursue further improvements on customer experience as well, increasing their value to customers and the business. Complete your monitoring puzzle with the context your operations teams need.
Opinions expressed by DZone contributors are their own.