OpenTelemetry: Unifying Application and Infrastructure Observability

Explore how OpenTelemetry is revolutionizing observability by unifying application and infrastructure monitoring and empowering developers with open standards.

By Tom Smith · Jul. 25, 24 · Interview

In this insightful Q&A, Goutham Veeramachaneni, a long-time Prometheus maintainer and Product Manager at Grafana Labs, shares his unique perspective on the transformative impact of OpenTelemetry (OTel) in the observability landscape. Veeramachaneni discusses how OTel is standardizing telemetry data and inspiring new open-source data collectors and workflows that bridge the gap between application and infrastructure monitoring. He offers valuable insights into the evolving ecosystem, the challenges ahead, and the exciting possibilities for developers in composing more effective telemetry data pipelines.

Q: As a Long-Time Prometheus Maintainer, What’s Your Take on the Overall Impact That OpenTelemetry Has Had on the Market?

A: It’s given developers and platform teams much greater ownership of their data, along with flexibility and freedom that they didn’t have before. Previously, with no universal open standard for telemetry data, the proprietary vendor mousetraps were designed to make it super difficult to migrate to other solutions, which was insane. These vendors didn’t have a lot of incentive to innovate or compete, because they had built such effective mousetraps to lock users in. They spoke their own protocols and collected their own metrics, and there was no standardization. OpenTelemetry has already forced the entire market to standardize on OTLP (the OpenTelemetry Protocol) and its ecosystem of SDKs and APIs. That has taken the power away from vendors and created a standard that is dynamic and open and where everyone collaborates, which is driving a ton of innovation.

Q: What Is the Most Exciting New Progress That You’ve Seen With OTel in the Last Year?

A: I’m really excited to see how Prometheus and OTel are coming together, and all the momentum that’s bringing application observability to the same level of standardization and consistency as infrastructure observability. Prometheus is such a staple in infrastructure observability: everyone uses it directly, or uses a flavor of Prometheus from a vendor. One of the reasons Prometheus is so popular is that there is an exporter for just about every infrastructure component and a massive community supporting it. However, until OTel, no such standardization and velocity of innovation existed in application observability, because you needed to spend a lot of effort to create auto-instrumentation agents, and only the prominent vendors with teams of 20 people working on these agents had that skill. So application observability had all these proprietary protocols and methods for metrics collection, and there was no standardization. But now OTel has created a foothold by bringing in a standard for monitoring your application, and it is implementing that standard for all the popular languages.

Q: What’s the Implication of Application and Infrastructure Observability Coming Together, and What Needs To Happen for Us To Get There?

A: Well, we already have a standard. Just as Prometheus has its exporters, the application observability world now has all these SDKs and auto-instrumentation agents generating OpenTelemetry data. Prometheus has made a lot of strides in the past year to start ingesting OTel data, so the infrastructure and application metrics now sit in the same system side-by-side. Prometheus 3.0, coming later this year, has OpenTelemetry support as one of the main features and focus areas.

However, the story doesn’t end there. You need to be able to correlate the metrics together easily. For example, you need to be able to relate a spike in errors in your RED (rate, errors, duration) metrics with a saturation of the CPU on the node the application is running on. This is tricky because the conventions for Prometheus and OpenTelemetry don’t line up yet. I believe this is what the community will focus on in the next year: making sure you can seamlessly correlate and navigate the data between the two worlds of app monitoring with OTel and infra monitoring with Prometheus.
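To make the convention mismatch concrete: OpenTelemetry metric names are typically dot-separated, with the unit carried as separate metadata, while Prometheus names use underscores and usually fold the unit into the name. The sketch below is a simplified illustration of that kind of translation, not the actual logic used by Prometheus or the OTel Collector. The function name `otel_to_prometheus_name`, the small `UNIT_SUFFIXES` table, and the metric `app.requests` are assumptions made for this example.

```python
import re

# Assumption: a tiny, illustrative subset of unit-to-suffix mappings.
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def otel_to_prometheus_name(name: str, unit: str = "", monotonic_counter: bool = False) -> str:
    """Translate an OTel-style metric name into a Prometheus-style one (simplified)."""
    # Prometheus metric names allow letters, digits, underscores, and colons,
    # so every other character (such as the dots in OTel names) becomes "_".
    prom = re.sub(r"[^a-zA-Z0-9_:]", "_", name)

    # Fold the unit into the name, as Prometheus naming practice suggests.
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not prom.endswith("_" + suffix):
        prom = f"{prom}_{suffix}"

    # Monotonic counters conventionally end in "_total" on the Prometheus side.
    if monotonic_counter and not prom.endswith("_total"):
        prom = f"{prom}_total"
    return prom


if __name__ == "__main__":
    print(otel_to_prometheus_name("http.server.request.duration", unit="s"))
    print(otel_to_prometheus_name("app.requests", monotonic_counter=True))
```

Even with a translation like this in place, the same signal can end up under different names and labels depending on which path it took, which is part of why correlation across the two worlds still takes effort.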

Finally, there is also the sticking point of “collection” of this data. While you can collect Prometheus metrics with the OpenTelemetry Collector, you’ll convert the Prometheus data into OTLP and then back into Prometheus data (as the datastore is Prometheus). This has a high CPU overhead today and is something we want to optimize. This is also why I am excited about Alloy, Grafana’s new open-source collector. Alloy comes from a Prometheus-first heritage and embeds several infra-Prometheus exporters. It is also an OTel Collector distribution and supports collecting and processing OTel data efficiently. It shines because if you are collecting Prometheus data and your final destination is also Prometheus, it avoids the CPU cost of converting into OTLP in the pipeline.
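The round trip described here can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: `PromSample`, `prom_to_otlp_like`, and `otlp_like_to_prom` are hypothetical names, the dictionary is a loose stand-in for an OTLP data point rather than the real OTLP schema, and the `job`/`instance` to `service.name`/`service.instance.id` mapping is a simplification. It shows that when both source and destination speak Prometheus, the two conversions cancel out, so the work in the middle buys nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromSample:
    """A single Prometheus-style sample (hypothetical, simplified)."""
    name: str
    labels: tuple  # sorted (key, value) pairs
    value: float
    timestamp_ms: int


def prom_to_otlp_like(sample: PromSample) -> dict:
    """First hop: reshape a Prometheus sample into an OTLP-like data point."""
    labels = dict(sample.labels)
    resource = {
        "service.name": labels.pop("job", ""),
        "service.instance.id": labels.pop("instance", ""),
    }
    return {
        "resource": resource,
        "name": sample.name,
        "attributes": labels,
        "value": sample.value,
        "time_unix_nano": sample.timestamp_ms * 1_000_000,
    }


def otlp_like_to_prom(point: dict) -> PromSample:
    """Second hop: reshape the OTLP-like data point back into a Prometheus sample."""
    labels = dict(point["attributes"])
    labels["job"] = point["resource"]["service.name"]
    labels["instance"] = point["resource"]["service.instance.id"]
    return PromSample(
        name=point["name"],
        labels=tuple(sorted(labels.items())),
        value=point["value"],
        timestamp_ms=point["time_unix_nano"] // 1_000_000,
    )


if __name__ == "__main__":
    original = PromSample(
        name="mysql_up",
        labels=(("instance", "db-1:9104"), ("job", "mysql")),
        value=1.0,
        timestamp_ms=1_700_000_000_000,
    )
    round_tripped = otlp_like_to_prom(prom_to_otlp_like(original))
    # Two conversions per sample, and the result is the sample we started with.
    print(round_tripped == original)
```

A Prometheus-native path would hand `original` straight to the destination and skip both function calls, which is the saving the answer above attributes to Alloy.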

That’s one of the beauties of open standards and of having something stable to build against: you can use the OTel Collector directly, you can use a new collector like Alloy that is optimized for making Prometheus and OTLP work together more seamlessly, or you can try any other collector. OTel has created a buyer’s market where developers have plenty of choice in which observability tools they use, while knowing that they own their telemetry data underneath and that OTel itself is doing the heavy lifting of language support.

Q: Can You Describe the Historical Disconnect Between Application and Infrastructure Observability?

A: Today, if you look at the systems you’re trying to debug, you have the infrastructure (databases such as MySQL and Postgres, plus node-level data like which hosts you’re running and how much memory and CPU they’re using), and then you are running applications and containers on top. When you get a page because users are seeing errors, you suddenly have a set of dashboards that show a list of applications and how each application performs. It’s easy to see where an error is occurring (say, in MySQL), but going from there to finding out what is wrong with MySQL is not easy, because the application is talking OpenTelemetry metrics and MySQL is talking Prometheus metrics, and there is no standardization between the two. It takes a lot of expertise to find the correct instance of a running service and do these types of correlations, and this is going to get much easier as we see deeper native integration between Prometheus and OpenTelemetry.
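The kind of manual correlation described above can be sketched as a small join. This is an illustration on made-up data, not how any particular tool does it: the function names, thresholds, and sample values are all assumptions. The application series carry OTel-style attributes (`service.name`, `host.name`), the node series carry a Prometheus-style `instance` label in `host:port` form, and the join only works after the two are normalized to a common key.

```python
def host_from_instance(instance: str) -> str:
    """Strip the port from a Prometheus-style "host:port" instance label.

    Simplification: IPv6 literals are not handled.
    """
    return instance.rsplit(":", 1)[0]


def find_suspects(app_series, node_series, error_threshold=0.05, cpu_threshold=0.9):
    """Return (service, host) pairs where errors are high AND the host CPU is saturated."""
    # Index node CPU utilization by bare host name so it can be joined
    # against the OTel-style "host.name" attribute.
    cpu_by_host = {
        host_from_instance(s["instance"]): s["cpu_utilization"] for s in node_series
    }
    suspects = []
    for s in app_series:
        cpu = cpu_by_host.get(s["host.name"])
        if cpu is not None and s["error_ratio"] > error_threshold and cpu > cpu_threshold:
            suspects.append((s["service.name"], s["host.name"]))
    return suspects


if __name__ == "__main__":
    # Hypothetical application metrics with OTel-style attribute names.
    app_series = [
        {"service.name": "checkout", "host.name": "node-a", "error_ratio": 0.12},
        {"service.name": "cart", "host.name": "node-b", "error_ratio": 0.01},
    ]
    # Hypothetical node metrics with Prometheus-style labels.
    node_series = [
        {"instance": "node-a:9100", "cpu_utilization": 0.97},
        {"instance": "node-b:9100", "cpu_utilization": 0.35},
    ]
    print(find_suspects(app_series, node_series))
```

The normalization step in `host_from_instance` is the part that takes expertise today: someone has to know which label on one side corresponds to which attribute on the other.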

Q: What Are Some of the Problem Domains That You See OTel Tackling in the Future That Are Yet To Be Solved for Telemetry Data?

A: To get anything across the goal line in terms of standardization, you have to build consensus across many people and groups. That’s one of the things that’s so remarkable about what OTel has accomplished. Once specifications are settled, execution and innovation on top of them become easy. I see some realms where it’s inevitable that OpenTelemetry will have a similar impact on standardizing telemetry data, but where things are still so early that consensus is still some way off. Front-end user monitoring is one example. In logging, we need to see more databases and logging systems adopting the OTel specification. Semantic conventions for LLMs are still largely proprietary. Observing messaging queues and the applications built on top of them is still in its early days. There’s a ton of innovation in the CI/CD space, which needs better metrics to understand how long builds are taking, and that’s another exciting opportunity area for OTel.


Opinions expressed by DZone contributors are their own.
