Beyond Logs, Metrics, and Tracing: Five Ways to Up Your Observability Game
Let's take a different look at the pillars of observability and at how to improve observability as we head into an intelligent future.
Early on in the cloud adoption cycle, businesses and technology professionals focused more on new applications designed to deliver transformative results. Fast-forward to today, when cloud service providers are providing services to make mission-critical application delivery more affordable if enterprise teams learn how to take advantage of those services. For teams who make the transition, cloud may enable frictionless consumption of services and newfound flexibility. Unfortunately, that’s a big “if.”
Cloud has become something of a double-edged sword for today’s tech professionals. For all its benefits, the cloud also brings increased opacity and diminished observability, and it can present novel issues, at least in the short term. While IT expects the same level of durability, service delivery, and observability as on-prem, the cloud is an entirely different environment and requires a different approach, with new tools and new skills to master.
Too often, there’s a post-migration scramble to regain previous levels of observability, but it’s no easy feat: those managing cloud and distributed environments will understand that many of the familiar on-premises tools are no longer sufficient. Moreover, many approaches to monitoring must also change to meet new needs introduced by the lack of visibility beyond the firewall; adding transaction tracing in distributed applications is one example. And of course, leadership expects that telemetry alongside the other metrics they already find reassuring.
At the same time, the shift to cloud has introduced another observability challenge: companies with workloads and applications in the cloud must now extend monitoring to include any connections made to cloud resources, to avoid violation of policy, security, or compliance.
The continued expansion of cloud services shows no sign of slowing: 94% of technology professionals surveyed in SolarWinds’ IT Trends Report 2018: The Intersection of Hype & Performance indicate that cloud and hybrid IT are among the top five most important technologies in their IT organization’s technology strategy today. In fact, 51% listed cloud/hybrid IT as their most important technology challenge. We may look back on 2018 as the tipping point, where IT stepped up to the plate to manage a host of formerly DevOps-owned applications. Admittedly, it’s a great problem to have: while challenging, it proves IT can handle almost anything.
As we continue to progress towards increasingly critical cloud environments and distributed workloads, tech professionals should go beyond simple performance metrics to achieve the right level of true observability in the cloud, leveraging combined metrics, logs, application traces, and a few other best practices for controllability—all built right into their organization’s cloud strategy.
Not Your Mother’s Three Pillars of Observability
Working toward new methods of observability requires specific strategies and tactics. To get started, tech professionals can use the following five tips to help gain observability in the cloud:
- Be flexible to adapt to the unknown. It goes without saying that mature monitoring tools and strategies aren’t the same in the cloud as they were on-premises. Applications tend to behave differently if you lift and shift them to containers, or if you have a VM that’s running on a cloud provider. Why would observability be any different? Flexibility to adapt application performance monitoring practices and tools is key, not only in which tools you choose to use, but also in the way you monitor. For example, even though your monitoring platforms can probably support new abilities (e.g., adding custom metrics), you may not have used that type of functionality before. Being able to incorporate data from unknown sources, or to collect it in what appear to be novel ways, is especially valuable. Flexibility to adapt to the unknown is an important part of regaining lost observability.
- Aggregate data. In addition to changing the monitoring and managing landscape, cloud introduces an entirely new set of metrics into the fray: metrics and numbers that can be distilled into business insights and used to meet business goals, rather than just basic usage mechanics from an ROI/cost operations data perspective. Aggregating and collating performance metrics into a dashboard, together with the business metrics that hinge on the performance of the IT systems underneath, feeds and speeds, events, deployments, and newer metrics like error rate, is key to demonstrating cloud ROI to management and to helping ensure uptime and a positive experience for end users.
- Be brave. As a tech pro in the era of cloud, you probably need to take the time to explain to management (tactfully, of course) that some of the metrics they relied on previously don't apply in a cloud environment. They’ll need to understand that losing some visibility into platform details shouldn’t automatically raise concerns that something isn’t working, or that certain business-critical applications aren’t being sufficiently monitored. More often than not, technology pros who are forced to migrate back on-premises have suffered an observability problem that didn’t allow them to fix the application, made users uneasy, or both. There’s a lot of human engineering that goes along with this implementation (in addition to actual services), and the demonstrated path through this complexity is to remember that you’re the expert, that you learn new technologies for a living, and to stand up. Stand up for yourself, your chops, and your environment.
- Learn to trust (and verify) monitoring services. We often think that our monitoring systems need to collect and store information on-premises because that's the “safest” place to put it. At the same time, relying on a service, or a hosted service, especially for the most mission-critical information that we use to debug a system, can be nerve-wracking: what if there’s a VPN outage and these alerts cut out? Whether it's log activation, custom metric storage, or distributed tracing, however, there must be an endpoint for all of these moving pieces. Verifying new technology and learning to trust service-based monitoring means developing skills in new technologies to ensure that the system works together with redundancy you didn’t need when it was on-prem. It also means gaining insight into “observability of the observability” (i.e., who’s monitoring the monitoring system), so you can start thinking about breadth of monitoring, metrics, reach, and completeness. Make sure that you're increasing visibility, but also that you can prove you have enough; otherwise, you won’t know when to stop.
- Extend monitoring. Extend monitoring to include comprehensive events and logs in the services to determine whether or not connections are being made to cloud resources in violation of policy. In the cloud, the last mile of event observability is often lost because the data is collected in ephemeral containers and processes, without a reliable mechanism for persistence. This can be especially concerning for regulatory compliance. New practices for event monitoring and custom metrics are particularly helpful for novel root-cause troubleshooting and anomaly analysis.
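To make the last tip a little more concrete, here is a minimal Python sketch of checking connection events against a policy. Everything in it is an assumption made for illustration: the `ConnectionEvent` fields, the workload names, and the allowlist of networks are hypothetical, and a real environment would read events from its own logs or flow records rather than from a hard-coded list.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical policy: the only networks workloads are allowed to reach.
ALLOWED_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("192.0.2.0/24")]


@dataclass(frozen=True)
class ConnectionEvent:
    source: str       # name of the workload that opened the connection (assumed field)
    destination: str  # destination IP address as a string (assumed field)


def violates_policy(event, allowed=ALLOWED_NETWORKS):
    """Return True when the destination lies outside every allowed network."""
    dest = ip_address(event.destination)
    return not any(dest in net for net in allowed)


def find_violations(events, allowed=ALLOWED_NETWORKS):
    """Keep only the events that break the allowlist policy."""
    return [e for e in events if violates_policy(e, allowed)]


# Illustrative events; in practice these would be parsed from persisted logs.
events = [
    ConnectionEvent("billing-api", "10.1.2.3"),
    ConnectionEvent("report-job", "203.0.113.7"),
]

for e in find_violations(events):
    print(f"policy violation: {e.source} -> {e.destination}")
```

The check itself is trivial; the harder part, as the tip notes, is getting events out of ephemeral containers and into durable storage so there is something to check.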
Preparing for an Intelligent Future
As tech environments rapidly evolve, tech pros know they must continue developing and honing the skills necessary to maintain environments today and be ready to adapt to new technology pushed by leadership. As new tools based on machine learning and artificial intelligence assume more and more direct control of our systems, we’ll increasingly write code. But admins won’t so much code to automate; that will be the systems’ problem. Instead, code will be the silk we spin to bind auto-configuring fabric to the needs, the intent, of the business, and we’ll spend most of our time teaching and coaching systems with digital policies that execute automatically. We’ll learn to add monitoring to everything we do in the cloud, and sleep better when we illuminate the dark corners of missing monitoring. We’re headed toward an intelligent future, and with a little hustle, diligent tech pros will ensure they’re ready.
Opinions expressed by DZone contributors are their own.