Implementing Observability in Distributed Systems Using OpenTelemetry

Instrument a Python Flask service with OpenTelemetry to automatically trace requests, export metrics to Prometheus, and inject trace IDs into logs, all in one setup.

By Mugunth Chandran · May. 29, 26 · Analysis

Modern distributed systems demand observability: the ability to understand a system's internal state from its external outputs. Observability is achieved by collecting traces, logs, and metrics, which in turn help improve performance, reliability, and availability. No single signal is sufficient on its own; it is the combination and correlation of these signals that forms a narrative for root cause analysis.

In a monolithic application, debugging was comparatively easy because a single service handled each request. In a microservice architecture, one request fans out across many services, which makes it hard to follow a transaction's path. This is where OpenTelemetry (OTel) distributed tracing shines: it propagates trace context along with each request, so you can follow a transaction across service boundaries.

For example, when Service A calls Service B, both record their work under a common trace ID, so you can view a single trace spanning multiple services. Similarly, OpenTelemetry can attach the active trace and span IDs to log records, making it easier to correlate log events across services. Overall, OTel provides a vendor-neutral API for instrumenting code, plus an ecosystem of instrumentation libraries that automatically capture common operations in popular frameworks. OTel focuses on generating and collecting telemetry; storing and querying it is left to backend tools.

Setting Up OpenTelemetry in a Python Microservice

Installation

To get started, install the OpenTelemetry libraries for Python. At minimum, you'll need the API and SDK, plus exporters/instrumentation for your use case. For example:

Shell
 
pip install flask opentelemetry-api opentelemetry-sdk \
            opentelemetry-exporter-prometheus \
            opentelemetry-instrumentation-flask


This installs the core OTel API and SDK, the Prometheus metrics exporter, and the Flask instrumentation. You might also install the OTLP exporter (opentelemetry-exporter-otlp), a generic exporter that sends telemetry over the OpenTelemetry Protocol (OTLP) to an OpenTelemetry Collector or any OTLP-compatible backend. It is also recommended to set a service name for your application so that its telemetry is identifiable. You can do this in code or with the OTEL_SERVICE_NAME environment variable. In the code below, the service name is attached as a resource attribute, so traces and metrics are tagged with service.name.

Distributed Tracing With OpenTelemetry

Tracing involves capturing spans that represent units of work in the system. In a microservice, a span could represent an incoming HTTP request, a database query, or an external API call. Spans form a trace when linked together via context propagation. Using OpenTelemetry, we can instrument our Python service to create spans for critical operations and automatically propagate the trace context to downstream services.

First, let's initialize OpenTelemetry tracing in our Python microservice. We create a tracer provider, configure an exporter, and obtain a tracer instance:

Python
 
import time

from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Tracer provider whose resource carries the service name, so every span
# from this process is labeled service.name="order-service"
provider = TracerProvider(resource=Resource.create({SERVICE_NAME: "order-service"}))
# Batch finished spans and print them to stdout as JSON (demo exporter);
# buffered spans are flushed when the provider shuts down at interpreter exit
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Example: wrap a unit of work in a span; the span ends when the block exits
with tracer.start_as_current_span("process_order"):
    # Simulate processing (e.g., calling another service or performing work)
    time.sleep(0.1)
    # Outgoing HTTP calls carry this trace context in their headers only if the
    # HTTP client is instrumented (e.g., opentelemetry-instrumentation-requests)
    # or the context is injected explicitly with opentelemetry.propagate.inject()


In this snippet, we configure a TracerProvider with the resource attribute service.name="order-service" so that all spans from this service are labeled. We add a BatchSpanProcessor wrapping a ConsoleSpanExporter; for demonstration, it batches finished spans and prints them to the console as JSON. In a real system, you would typically use an OTLP exporter instead, sending spans to an OpenTelemetry Collector or directly to a backend such as Jaeger, which accepts OTLP natively. Calling trace.get_tracer(__name__) returns a tracer we can use to start spans. We then start a span named "process_order" with the context manager start_as_current_span, which makes it the active span and ends it automatically when the block exits. Inside that block goes the operation you want to measure.

Metrics Collection and Export (Prometheus Integration)

While tracing shows the path of individual requests, metrics provide aggregated insight into system behavior, such as request rates, error counts, and latency distributions. OpenTelemetry's metrics API lets you define instruments such as counters and histograms to record these values.

First, set up the Prometheus exporter. We'll use OTel's Prometheus exporter, which exposes metrics over HTTP in the Prometheus text format for a Prometheus server to scrape (conventionally at /metrics). In code, you create a PrometheusMetricReader and start an HTTP server with prometheus_client. Here's how you can integrate metrics into a Flask microservice:

Python
 
import time

from flask import Flask, g, request
from prometheus_client import start_http_server
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Initialize the meter provider with the Prometheus reader
resource = Resource.create({SERVICE_NAME: "order-service"})
reader = PrometheusMetricReader()  # registers a collector with prometheus_client
provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)
meter = metrics.get_meter(__name__)

# Define metric instruments
request_counter = meter.create_counter(
    name="app_requests_total",
    description="Total number of requests processed",
    unit="1",
)
request_latency = meter.create_histogram(
    name="app_request_latency_ms",
    description="Request latency in milliseconds",
    unit="ms",
)

# Serve the collected metrics over HTTP on port 8000 for Prometheus to scrape
start_http_server(port=8000, addr="0.0.0.0")

# Flask app and instrumentation
app = Flask(__name__)
# Auto-instrument Flask for tracing; spans are exported only if a
# TracerProvider is configured, as in the tracing snippet above
FlaskInstrumentor().instrument_app(app)


def endpoint_label():
    # Use the matched route template rather than the raw path so that
    # label cardinality stays bounded (one series per route, not per URL)
    return request.url_rule.rule if request.url_rule else "unmatched"


@app.before_request
def before_request():
    # Monotonic clock: unaffected by wall-clock adjustments
    g.start_time = time.perf_counter()
    # Increment the request counter, labeled by endpoint
    request_counter.add(1, {"endpoint": endpoint_label()})


@app.after_request
def after_request(response):
    # Record the request duration in milliseconds
    duration_ms = (time.perf_counter() - g.start_time) * 1000
    request_latency.record(duration_ms, {"endpoint": endpoint_label()})
    return response


# Example route
@app.route("/hello")
def hello():
    return "Hello, World!", 200


if __name__ == "__main__":
    # Flask development server (default port 5000)
    app.run()


In the setup above, we configure a MeterProvider with a PrometheusMetricReader, which registers a collector with the prometheus_client library so that our OTel metrics are rendered in the Prometheus format. We then call start_http_server(port=8000) to serve those metrics over HTTP on port 8000, where Prometheus can scrape them. We create two metric instruments: a counter that tracks the number of requests and a histogram that tracks the distribution of request durations.

The Flask hooks put these instruments to work: at the start of each request, we note the start time and increment the counter; after the request is handled, we compute the elapsed time and record it in the histogram. Both measurements carry an endpoint label, which lets us break the metrics down per route.

Log Correlation With OpenTelemetry

Logs are the third pillar of observability. They provide detailed event information and error messages. OpenTelemetry can augment logging by injecting trace context into logs, so that you know which trace/span a log entry is associated with.

In Python, the package opentelemetry-instrumentation-logging can automatically enrich Python logging records with trace context. After installing it, you can enable it with:

Python
 
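from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Inject trace context into every log record; set_logging_format=True also
# configures a default log format that prints the trace and span IDs
LoggingInstrumentor().instrument(set_logging_format=True)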


Once enabled, whenever you call the standard logging functions while a span is active, the log record carries the current trace and span IDs (as the otelTraceID and otelSpanID record attributes). For instance, an abbreviated log line might look like:

Plain Text
 
INFO [trace_id=f4a3b...] Order 123 processed successfully


indicating that the log was emitted during a specific trace. To fully centralize logs, you would forward them to a log backend. One approach is using the OpenTelemetry Collector to collect and export logs.

Conclusion

Implementing observability in a microservice architecture is no small feat, but OpenTelemetry greatly simplifies the process by providing a one-stop solution for instrumentation. We have shown how to set up distributed tracing to follow requests across services, how to collect metrics and export them to Prometheus for monitoring, and how to correlate logs with trace context. With these in place, you gain deep visibility into your system. You can monitor performance and identify latency bottlenecks, get alerted on anomalies via metrics, trace requests end-to-end to see where failures occur, and dive into logs for detailed errors. This comprehensive observability is crucial for engineers to effectively maintain and optimize distributed systems.

In summary, OpenTelemetry enables a consistent, portable way to implement observability across distributed systems. Embracing it in your microservices will lead to faster debugging, better performance insights, and more resilient applications. With traces, metrics, and logs at your fingertips, you are no longer flying blind in your distributed architecture; instead, you have the data to understand and improve your system continually.

Opinions expressed by DZone contributors are their own.
