
End-to-End Tracing With OpenTelemetry

Tracing is one of the pillars of observability. This article focuses solely on traces and describes how you can start your journey into observability.

By Nicolas Fränkel · Sep. 02, 22 · Tutorial

Whether you implement microservices or not (and you probably shouldn't), your system is most probably composed of multiple components. The most straightforward system is probably made of a reverse proxy, an app, and a database. In this case, monitoring is not only a good idea; it's a requirement. The higher the number of components through which a request may flow, the stronger the requirement.

However, monitoring is only the beginning of the journey. When requests start to fail en masse, you need an aggregated view across all components. It's called tracing, and it's one of the pillars of observability. The other two are metrics and logs.

In this post, I'll focus solely on traces and describe how you can start your journey into observability.

The W3C Trace Context Specification

A tracing solution should provide a standard format to work across heterogeneous technology stacks. Such a format needs to adhere to a specification, either a formal one or a de facto one.

One needs to understand that a specification rarely appears from nowhere. In general, the market already has a couple of distinct implementations. Most of the time, a new specification leads to an additional implementation, as the famous XKCD comic describes:

How Standards Proliferate

Sometimes, however, a miracle happens: the market adheres to the new specification. Here, Trace Context is a W3C specification, and it seems to have done the trick:

"This specification defines standard HTTP headers and a value format to propagate context information that enables distributed tracing scenarios. The specification standardizes how context information is sent and modified between services. Context information uniquely identifies individual requests in a distributed system and also defines a means to add and propagate provider-specific context information."

Two critical concepts emerge from the document:

  • A trace follows the path of a request that spans multiple components.
  • A span is bound to a single component and linked to another span by a child-parent relationship.

Single trace and spans

At the time of this writing, the specification is a W3C recommendation, which is the final stage.
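
In practice, the specification materializes as two HTTP headers, traceparent and tracestate. The former carries four dash-separated fields: a version, the trace ID, the parent span ID, and trace flags. Here's a minimal Python sketch that builds such a header value; the make_traceparent helper is my own naming, for illustration only:

Python

import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header value: version-traceid-parentid-flags."""
    version = "00"                     # current spec version
    trace_id = secrets.token_hex(16)   # 32 hex chars, shared by every span of the trace
    parent_id = secrets.token_hex(8)   # 16 hex chars, identifies the calling span
    flags = "01" if sampled else "00"  # 01 means the trace is sampled
    return f"{version}-{trace_id}-{parent_id}-{flags}"

print(make_traceparent())
# e.g., 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

Each component that receives such a header creates its own span, sets itself as the parent for its downstream calls, and forwards the updated header accordingly.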

Trace Context already has many implementations. One of them is OpenTelemetry.

OpenTelemetry as the Golden Standard

The closer you are to the operational part of IT, the higher the chances that you've heard about OpenTelemetry:

"OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.

OpenTelemetry is generally available across several languages and is suitable for use."

OpenTelemetry is a project managed by the CNCF. It stems from two earlier projects:

  • OpenTracing, which focused on traces, as its name implies
  • OpenCensus, whose goal was to manage metrics and traces

Both projects merged and added logs on top. OpenTelemetry now offers a set of "layers" focusing on observability:

  • Instrumentation APIs in a variety of languages
  • Canonical implementations, again in different languages
  • Infrastructure components, such as collectors
  • Interoperability formats, such as the W3C's Trace Context

Note that while OpenTelemetry is a Trace Context implementation, it does more: Trace Context limits itself to HTTP, while OpenTelemetry allows spans to cross non-web components, such as Kafka. The latter is outside the scope of this blog post.

The Use-Case

My favorite use case is an e-commerce shop, so let's not change it. In this case, the shop is designed around microservices, each accessible via a REST API and protected behind an API Gateway. To simplify the architecture for the blog post, I'll use only two microservices: catalog manages products, and pricing handles the price of products.

When a user arrives on the app, the home page fetches all products, gets their respective price, and displays them.

E-commerce shop use case diagram

To make things more interesting, catalog is a Spring Boot application coded in Kotlin, while pricing is a Python Flask application.

Tracing should allow us to follow the path of a request across the gateway, both microservices and, if possible, the databases.

Traces at the Gateway

The entry point is the most exciting part of tracing, as it should generate the trace ID. In this case, the entry point is the gateway. I'll use Apache APISIX to implement the demo:

"Apache APISIX provides rich traffic management features like Load Balancing, Dynamic Upstream, Canary Release, Circuit Breaking, Authentication, Observability, etc."

Apache APISIX is based on a plugin architecture and offers an OpenTelemetry plugin:

"The opentelemetry Plugin can be used to report tracing data according to the OpenTelemetry specification.

The Plugin only supports binary-encoded OLTP over HTTP."

Let's configure the opentelemetry plugin:

YAML
 
apisix:
  enable_admin: false              #1
  config_center: yaml              #1
plugins:
  - opentelemetry                  #2
plugin_attr:
  opentelemetry:
    resource:
      service.name: APISIX         #3
    collector:
      address: jaeger:4318         #4


#1: Run Apache APISIX in standalone mode to make the demo easier to follow. It's a good practice in production anyway.

#2: Configure opentelemetry as a global plugin.

#3: Set the name of the service. It's the name that will appear in the trace display component.

#4: Send the traces to the jaeger service. The following section will describe it.

We want to trace every route, so instead of adding the plugin to each route, we should set up the plugin as a global one:

YAML
 
global_rules:
  - id: 1
    plugins:
      opentelemetry:
        sampler:
          name: always_on          #1


#1: Tracing has an impact on performance. The more we trace, the more we impact. Hence, we should carefully balance the performance impact vs. the benefits of observability. For the demo, however, we want to trace every request.
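
For comparison, here's what the same trade-off looks like in the OpenTelemetry Python SDK, which we'll use later for the pricing service: a parent-based ratio sampler keeps only a fraction of root traces, while child spans follow their parent's decision. This is a sketch for illustration; the demo itself sticks to always_on:

Python

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep roughly 10% of root traces; child spans inherit their parent's decision
sampler = ParentBased(root=TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))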

Collecting, Storing, and Displaying Traces

While Trace Context is a W3C specification and OpenTelemetry is a de facto standard, many solutions exist on the market to collect, store, and display traces. Each solution may provide all three capabilities or only some of them. For example, the Elastic stack handles storage and display, but you must rely on something else for collection. On the other hand, Jaeger and Zipkin provide a complete suite covering all three capabilities.

Jaeger and Zipkin predate OpenTelemetry, so each has its own trace transport format. They do provide integration with the OpenTelemetry format, though.

In the scope of this blog post, the exact solution is not relevant, as we only need the capabilities. I chose Jaeger because it provides an all-in-one Docker image: every capability has its own component, but they are all embedded in the same image, which makes configuration much easier.

The image's relevant ports are the following:

Port  | Protocol | Component | Function
------|----------|-----------|-----------------------------------------------------------
16686 | HTTP     | query     | serve frontend
4317  | gRPC     | collector | accept OpenTelemetry Protocol (OTLP) over gRPC, if enabled
4318  | HTTP     | collector | accept OpenTelemetry Protocol (OTLP) over HTTP, if enabled

The Docker Compose bit looks like this:

YAML
 
services:
  jaeger:
    image: jaegertracing/all-in-one:1.37           #1
    environment:
      - COLLECTOR_OTLP_ENABLED=true                #2
    ports:
      - "16686:16686"                              #3


#1: Use the all-in-one image.

#2: Very important: enable the collector in OpenTelemetry format.

#3: Expose the UI port.

Now that we have set up the infrastructure, we can focus on enabling traces in our applications.

Traces in Flask Apps

The pricing service is a simple Flask application. It offers a single endpoint to fetch the price of a single product from the database.

Python
 
from typing import Dict
from random import uniform
from flask import jsonify

@app.route('/price/<product_str>')                           #1-2
def price(product_str: str) -> Dict[str, object]:
    product_id = int(product_str)
    price: Price = Price.query.get(product_id)               #3
    if price is None:
        return jsonify({'error': 'Product not found'}), 404
    else:
        low: float = price.value - price.jitter              #4
        high: float = price.value + price.jitter             #4
        return {
            'product_id': product_id,
            'price': round(uniform(low, high), 2)            #4
        }


#1: Endpoint

#2: The route requires the product's id.

#3: Fetch data from the database using SQLAlchemy.

#4: Real pricing engines never return the same price over time. Let's randomize the price a bit for fun.

Warning: Fetching a single price per call is highly inefficient. It requires as many calls as products, but it makes for a more exciting trace. In real life, the route should be able to accept multiple product ids and fetch all associated prices in one request-response.
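
For the record, a batch version could look like the following sketch. The /prices route, its ids query parameter, and the Price.id column are my own naming, not part of the demo:

Python

from flask import request

@app.route('/prices')
def prices() -> Dict[str, object]:
    # Parse a comma-separated list of product ids, e.g. /prices?ids=1,2,3
    ids = [int(s) for s in request.args.get('ids', '').split(',') if s]
    found = Price.query.filter(Price.id.in_(ids)).all()      # one query for all products
    return {'prices': [
        {'product_id': p.id,
         'price': round(uniform(p.value - p.jitter, p.value + p.jitter), 2)}
        for p in found
    ]}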

Now is the time to instrument the application. Two options are available: automatic instrumentation and manual instrumentation. Automatic is low effort and a quick win; manual requires focused development time. I'd advise starting with automatic and only adding manual if required.

We need to add a couple of Python packages:

opentelemetry-distro[otlp]==0.33b0
opentelemetry-instrumentation
opentelemetry-instrumentation-flask


We need to configure a couple of parameters:

YAML
 
pricing:
  build: ./pricing
  environment:
    OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317     #1
    OTEL_RESOURCE_ATTRIBUTES: service.name=pricing      #2
    OTEL_METRICS_EXPORTER: none                         #3
    OTEL_LOGS_EXPORTER: none                            #3


#1: Send the traces to Jaeger.

#2: Set the name of the service. It's the name that will appear in the trace display component.

#3: We are interested neither in logs nor in metrics.
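
These environment variables drive the SDK's autoconfiguration. If you prefer wiring things explicitly, the equivalent programmatic setup in the Python SDK looks roughly like this; it's a sketch, assuming the OTLP gRPC exporter package is available (opentelemetry-distro[otlp] pulls it in):

Python

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Same effect as OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_RESOURCE_ATTRIBUTES above
provider = TracerProvider(resource=Resource.create({"service.name": "pricing"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True))
)
trace.set_tracer_provider(provider)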

Now, instead of using the standard flask run command, we wrap it:

Shell
 
opentelemetry-instrument flask run


Just with this, we already collect spans from method calls and Flask routes.

We can manually add additional spans if needed, e.g.:

Python
 
from opentelemetry import trace

tracer = trace.get_tracer(__name__)                          # acquire a tracer for this module

@app.route('/price/<product_str>')
def price(product_str: str) -> Dict[str, object]:
    product_id = int(product_str)
    with tracer.start_as_current_span("SELECT * FROM PRICE WHERE ID=:id", attributes={":id": product_id}): #1
        price: Price = Price.query.get(product_id)
    # ...


#1: Add an additional span with the configured label and attribute.

Traces in Spring Boot Apps

The catalog service is a Reactive Spring Boot application developed in Kotlin. It offers two endpoints:

  • One to fetch a single product
  • The other to fetch all products

Both first look in the product database, then query the above pricing service for the price.

As for Python, we can leverage automatic and manual instrumentation. Let's start with the low-hanging fruit, automatic instrumentation. On the JVM, we achieve it through an agent:

Shell
 
java -javaagent:opentelemetry-javaagent.jar -jar catalog.jar


As in Python, it creates spans for every method call and HTTP entry point. It also instruments JDBC calls, but our Reactive stack uses R2DBC, which the agent doesn't instrument yet. For the record, a GitHub issue is open for adding support.

We need to configure the default behavior:

YAML
 
catalog:
  build: ./catalog
  environment:
    APP_PRICING_ENDPOINT: http://pricing:5000/price
    OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317     #1
    OTEL_RESOURCE_ATTRIBUTES: service.name=orders       #2
    OTEL_METRICS_EXPORTER: none                         #3
    OTEL_LOGS_EXPORTER: none                            #3


#1: Send the traces to Jaeger.

#2: Set the name of the service. It's the name that will appear in the trace display component.

#3: We are interested neither in logs nor in metrics.

As for Python, we can up the game by adding manual instrumentation. Two options are available: programmatic and annotation-based. The former is a bit involved unless we introduce Spring Cloud Sleuth. Let's add annotations.

We need an additional dependency:

XML
 
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-instrumentation-annotations</artifactId>
    <version>1.17.0-alpha</version>
</dependency>


Be careful: the artifact was very recently relocated from io.opentelemetry:opentelemetry-extension-annotations.

We can now annotate our code:

Kotlin
 
@WithSpan("ProductHandler.fetch")                                               //1
suspend fun fetch(@SpanAttribute("id") id: Long): Result<Product> {             //2
    val product = repository.findById(id)
    return if (product == null) Result.failure(IllegalArgumentException("Product $id not found"))
    else Result.success(product)
}


#1: Add an additional span with the configured label.

#2: Use the parameter as an attribute, with the key set to id and the value the parameter's runtime value.

The Result!

We can now play with our simple demo to see the result:

Shell
 
curl localhost:9080/products
curl localhost:9080/products/1


The responses are not interesting, but let's look at the Jaeger UI. We find both traces, one per call:

Both traces (one per call) in Jaeger UI

We can dive into the spans of a single trace:

Spans of a single trace

Note that we can infer the sequence flow without the above UML diagram. Even better, the sequence displays the calls internal to a component.

Each span contains attributes that the automatic instrumentation added and the ones we added manually:

Attributes of each span

Conclusion

In this post, I've showcased tracing by following a request across an API gateway, two apps based on different tech stacks, and their respective databases. I've only scratched the surface of tracing: in the real world, tracing would probably involve components unrelated to HTTP, such as Kafka and message queues.

Still, most systems rely on HTTP in one way or another. While not trivial to set up, it's not too hard either. Tracing HTTP requests across components is a good start in your journey toward observability of your system.

The complete source code for this post can be found on GitHub.

To go further:

  • A beginner’s guide to OpenTelemetry
  • Python Automatic Instrumentation
  • Python Distro
  • Jaeger Getting Started

Published at DZone with permission of Nicolas Fränkel, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
