A Practical Guide to OpenTelemetry With Spring Boot Workloads
This tutorial demonstrates setting up OpenTelemetry with Spring Boot for observability, including metrics, traces, and logs, using tools like Grafana, Loki, and Tempo.
In this tutorial, we consolidate some practical approaches regarding OpenTelemetry and how to use it with Spring Boot. This tutorial is composed of four primary sections:
- OpenTelemetry practical concepts
- Setting up an observability stack with OpenTelemetry Collector, Grafana, Prometheus, Loki, Tempo, and Podman
- Instrumenting Spring Boot applications for OpenTelemetry
- Testing and E2E sample
By the end of the tutorial, you should be able to implement the following architecture:
OpenTelemetry Practical Concepts
As the official documentation states, OpenTelemetry is:
- An observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs
- Vendor and tool-agnostic, meaning that it can be used with a broad variety of Observability backends.
- Focused on the generation, collection, management, and export of telemetry. A major goal of OpenTelemetry is that you can easily instrument your applications or systems, no matter their language, infrastructure, or runtime environment.
Monitoring, Observability, and MELT
To keep things short, monitoring is the process of collecting, processing, and analyzing data to track the state of an (information) system. Observability, then, takes monitoring to the next level: actually understanding the information that is being collected and doing something with it, like defining alerts for a given system.
To achieve both goals, it is necessary to collect three dimensions of data, specifically:
- Logs: Records about processes and applications, with useful data like timestamps and context
- Metrics: Numerical data about the performance of applications and application modules
- Traces: Data that allow us to establish the complete route that a given operation traverses through a series of dependent applications
Hence, when the state of a given system is altered in some way, we have an event, which correlates with (and ideally generates) data in all three dimensions.
Why Is OpenTelemetry Important, and What Problem Does It Solve?
Developers know from experience that monitoring and observability are important, whether to evaluate the actual state of a system or to do post-mortem analysis after disasters. Hence, it is natural to think that observability has been implemented in various ways. For example, if we think of a system built with Java, we have at least the following collection points:
- Logs: Systemd, /var/log, /opt/tomcat, FluentD
- Metrics: Java metrics via JMX, OS metrics, vendor-specific metrics via Spring Actuator
- Tracing: Data via Jaeger or Zipkin tooling in our Java workloads
This variety, in turn, imposes a great amount of complexity when instrumenting our systems: the information (a) comes in different formats, (b) from technology that is difficult to implement, often with (c) solutions that are too tied to a given provider, or, in the worst cases, (d) technologies that only work with certain languages/frameworks.
And that's the magic of the OpenTelemetry proposal: by creating a working group under the CNCF umbrella, the project can provide useful things like:
- Common protocols that vendors and communities can implement to talk to each other
- Standards for software communities to implement instrumentation in libraries and frameworks to provide data in OpenTelemetry format
- A collector able to retrieve/receive data from diverse origins compatible with OpenTelemetry, process it and send it to...
- Analysis platforms, databases, and cloud vendors able to receive the data and provide added value over it
In short, OpenTelemetry is the convergence of various great monitoring ideas that overlapping software communities can implement to ease the burden of monitoring implementations.
OpenTelemetry Data Pipeline
For me, the easiest way to think about OpenTelemetry concepts is as a data pipeline. In this data pipeline, you need to:
- Instrument your workloads to push (or offer) telemetry data to a processing/collecting element, i.e., the OpenTelemetry Collector (see the sketch after this list)
- Configure the OpenTelemetry Collector to receive or pull the data from diverse workloads
- Configure the OpenTelemetry Collector to process the data, e.g., adding special tags or filtering data
- Configure the OpenTelemetry Collector to push (or offer) the data to compatible backends
- Configure and use the backends to receive (or pull) the data from the collector to allow analysis, alarms, AI... pretty much any use case you can think of for data
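To make the first step more concrete, here is a minimal, hand-rolled sketch (not taken from this tutorial's demo projects) of a workload pushing a single span to a collector's OTLP/gRPC receiver on localhost:4317. It assumes the opentelemetry-sdk and opentelemetry-exporter-otlp artifacts are on the classpath; in the rest of the tutorial this wiring is done for us by the Spring Boot instrumentation.

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class ManualPushExample {
    public static void main(String[] args) {
        // Exporter that pushes spans to the collector's OTLP/gRPC receiver
        OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317")
                .build();

        // Tracer provider with a batching processor (the usual setup)
        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
                .build();

        OpenTelemetrySdk sdk = OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .build();

        // Create one span around a unit of work and push it to the collector
        Tracer tracer = sdk.getTracer("manual-example");
        Span span = tracer.spanBuilder("demo-operation").startSpan();
        try {
            // ... business logic to be traced ...
        } finally {
            span.end();
        }

        tracerProvider.close(); // flush pending spans before the JVM exits
    }
}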
Setting Up an Observability Stack With OpenTelemetry Collector, Grafana, Prometheus, Loki, Tempo, and Podman
As OpenTelemetry has become popular, various vendors have implemented support for it, in both self-hosted platforms and cloud platforms.
Hence, for development purposes, it is always useful to know how to bootstrap a quick observability stack able to receive and show OpenTelemetry capabilities.
For this purpose, we will use the following elements:
- Prometheus as a time-series database for metrics
- Loki as a logs platform
- Tempo as a tracing platform
- Grafana as a web UI
And, of course, the OpenTelemetry Collector. This example is based on various Grafana examples, with a bit of tweaking to demonstrate the different ways of collecting, processing, and sending data to backends.
OpenTelemetry Collector
As stated previously, the OpenTelemetry Collector acts as an intermediary that receives/pulls information from data sources, processes it, and forwards it to destinations like analysis platforms or even other collectors. The collector can do this either with compliant workloads or via plugins that talk to the workloads using proprietary formats.
As the plugin collection can be increased or decreased, vendors have created their own distributions of the OpenTelemetry Collector. For reference, I've used the following successfully in the real world:
- Amazon ADOT
- Splunk Distribution of OpenTelemetry Collector
- Grafana Alloy
- OpenTelemetry Collector (the reference implementation)
You can find a complete list directly on the OpenTelemetry website.
For this demonstration, we will create a data pipeline using the contrib version of the reference implementation, which provides a good number of receivers, exporters, and processors. In our case, the Otel configuration is designed to:
- Receive data from Spring Boot workloads (ports 4317 and 4318)
- Process the data, adding a new tag to metrics
- Expose an endpoint for Prometheus scraping (port 8889)
- Send logs to Loki (port 3100) using otlphttp format
- Send traces to Tempo (port 4317) using otlp format
- Expose a rudimentary dashboard from the collector, called zPages, which is very useful for debugging
otel-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  attributes:
    actions:
      - key: team
        action: insert
        value: vorozco
exporters:
  debug:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlphttp:
    endpoint: http://loki:3100/otlp
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  extensions: [zpages]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [attributes]
      exporters: [debug, prometheus]
    traces:
      receivers: [otlp]
      exporters: [debug, otlp]
    logs:
      receivers: [otlp]
      exporters: [debug, otlphttp]
extensions:
  zpages:
    endpoint: "0.0.0.0:55679"
Prometheus
Prometheus is a well-known analysis platform that, among other things, offers dimensional data and performant time-series storage.
By default, it works as a metrics scraper: workloads provide an HTTP endpoint offering data in the Prometheus format. For our example, we configured the Otel Prometheus exporter to expose metrics on port 8889.
prometheus:
  endpoint: "0.0.0.0:8889"
Then, we need to configure Prometheus to scrape the metrics from the otel host. You will notice two ports: the one we defined for workload data (8889) and another for the collector's own metrics (8888).
prometheus.yml
scrape_configs:
  - job_name: "otel"
    scrape_interval: 10s
    static_configs:
      - targets: ["otel:8889"]
      - targets: ["otel:8888"]
It is worth highlighting that Prometheus also offers a way to ingest information instead of scraping it, and official support for OpenTelemetry ingestion is arriving in newer versions.
Loki
As described on its website, Loki is a solution specifically for log aggregation, heavily inspired by Prometheus, with the particular design decision of not formatting the log contents in any way, leaving that responsibility to the query system.
To configure the project for local environments, Loki offers a configuration that is usable for most development purposes. The following configuration is an adaptation that preserves the bare minimum to work with temporary files and in-memory storage.
loki.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
limits_config:
  allow_structured_metadata: true
Then, we configure an exporter to deliver the data to the loki host using the otlphttp format.
otlphttp:
  endpoint: http://loki:3100/otlp
Tempo
In a similar fashion to Loki, Tempo is an open-source project created by Grafana that aims to provide a distributed tracing backend. On a personal note, besides performance, it shines for its compatibility: not only does it support OpenTelemetry, it can also ingest data in Zipkin and Jaeger formats.
To configure the project for local environments, Tempo offers a configuration that is usable for most development purposes. The following configuration is an adaptation that removes the metrics generation to simplify the setup; however, with this we lose the service graph feature.
tempo.yaml
stream_over_http_enabled: true
server:
  http_listen_port: 3200
  log_level: info
query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
    metadata_slo:
      duration_slo: 5s
      throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s
distributor:
  receivers:
    otlp:
      protocols:
        http:
        grpc:
ingester:
  max_block_duration: 5m  # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally
compactor:
  compaction:
    block_retention: 1h   # overall Tempo trace retention. set for demo purposes
storage:
  trace:
    backend: local            # backend configuration to use
    wal:
      path: /var/tempo/wal    # where to store the wal locally
    local:
      path: /var/tempo/blocks
Then, we configure an exporter to deliver the data to the tempo host using the otlp/grpc format.
otlp:
  endpoint: tempo:4317
  tls:
    insecure: true
Grafana
Loki, Tempo, and (to some extent) Prometheus are data stores, but we still need to present this data to the user. This is where Grafana enters the scene.
Grafana offers a good selection of analysis tools, plugins, dashboards, alarms, connectors, and a great community that empowers observability. Besides great compatibility with Prometheus, it of course offers perfect compatibility with Grafana Labs' other offerings.
To configure Grafana, you just need to plug in compatible data sources; the rest of the work happens in the web UI.
grafana.yaml
apiVersion: 1
datasources:
  - name: Otel-Grafana-Example
    type: prometheus
    url: http://prometheus:9090
    editable: true
  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    version: 1
    editable: false
    uid: tempo
Podman (or Docker)
At this point, you may have noticed that I've referred to the backends using single names. This is because I intend to set these names using a Podman Compose deployment.
otel-compose.yml
version: '3'
services:
  otel:
    container_name: otel
    image: otel/opentelemetry-collector-contrib:latest
    command: [--config=/etc/otel-config.yml]
    volumes:
      - ./otel-config.yml:/etc/otel-config.yml
    ports:
      - "4318:4318"
      - "4317:4317"
      - "55679:55679"
  prometheus:
    container_name: prometheus
    image: prom/prometheus
    command: [--config.file=/etc/prometheus/prometheus.yml]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9091:9090"
  grafana:
    container_name: grafana
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    image: grafana/grafana
    volumes:
      - ./grafana.yml:/etc/grafana/provisioning/datasources/default.yml
    ports:
      - "3000:3000"
  loki:
    container_name: loki
    image: grafana/loki:3.2.0
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki.yaml:/etc/loki/local-config.yaml
    ports:
      - "3100"
  tempo:
    container_name: tempo
    image: grafana/tempo:latest
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    ports:
      - "4317"  # otlp grpc
      - "4318"
At this point, the compose description is pretty self-descriptive, but I would like to highlight some things:
- Some ports are published to the host (e.g., 4318:4318), while others are only exposed on the default network that Compose creates among the containers (e.g., 3100)
- This stack is designed to avoid any permanent data. Again, this is my personal way to quickly boot an observability stack to allow tests during development. To make it production-ready, you would probably want to preserve the data in volumes
Once the configuration is ready, you can launch it using the compose file:
cd podman
podman compose -f otel-compose.yml up
If the configuration is ok, you should have five containers running without errors.
Instrumenting Spring Boot Applications for OpenTelemetry
As part of my daily activities, I was in charge of a major implementation of all these concepts. Hence, it was natural for me to create a proof of concept that you can find at my GitHub.
For demonstration purposes, we have two services with different HTTP endpoints:
springboot-demo:8080
- Useful to demonstrate local and database tracing, performance, logs, and OpenTelemetry instrumentation
  - /books: A books CRUD using Spring Data
  - /fibo: A naive Fibonacci implementation that generates CPU load and delays (sketched below)
  - /log: Generates log messages using the different SLF4J levels
springboot-client-demo:8081
- Useful to demonstrate tracing capabilities, Micrometer instrumentation, and Micrometer Tracing instrumentation
  - /trace-demo: A quick OpenFeign client that invokes the books GetAll endpoint in springboot-demo
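For illustration, the CPU-heavy endpoint could look like the following hypothetical sketch (the actual code lives in the linked GitHub repository and may differ). The point is that the instrumentation discussed below captures its latency, traces, and logs without any telemetry-specific code in the controller.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller, for illustration only
@RestController
public class FiboController {

    @GetMapping("/fibo")
    public long fibo(@RequestParam("n") int n) {
        // Deliberately naive recursion to generate CPU load and slow responses
        return fibonacci(n);
    }

    private long fibonacci(int n) {
        return n <= 1 ? n : fibonacci(n - 1) + fibonacci(n - 2);
    }
}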
Instrumentation Options
Given the popularity of OpenTelemetry, developers can also expect multiple instrumentation options.
First of all, the OpenTelemetry project offers framework-agnostic instrumentation that uses bytecode manipulation; for this instrumentation to work, you need to include a Java agent via the Java classpath. In my experience, this instrumentation is preferred if you don't control the workload or if your platform does not offer OpenTelemetry support at all.
However, instrumentation of workloads can become really specific (e.g., instrumentation of a database pool given a particular IoC mechanism). For this, the Java world provides a good ecosystem, for example:
And, of course, Spring Boot.
Spring Boot is a special case with TWO major instrumentation options: the OpenTelemetry Spring Boot starter, and Micrometer together with Micrometer Tracing.
Both options use Spring concepts like decorators and interceptors to capture and send information to the destinations. The only rule is to create the clients/services/objects in the Spring way (hence via Spring IoC).
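As an illustration of the "Spring way," the sketch below (an assumption for illustration, not code from the demo) builds an HTTP client from the auto-configured RestClient.Builder available in Spring Boot 3.2+, so the observation/tracing interceptors stay in place; a client created manually with new would bypass them.

import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

@Service
public class BooksClient {

    private final RestClient restClient;

    // Injecting the auto-configured builder keeps the instrumentation in place
    public BooksClient(RestClient.Builder builder) {
        this.restClient = builder.baseUrl("http://localhost:8080").build();
    }

    public String getAllBooks() {
        return restClient.get().uri("/books").retrieve().body(String.class);
    }
}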
I've used both successfully, and my heavily opinionated conclusion is the following:
- Micrometer collects more information about spring metrics. Besides the OpenTelemetry backend, it supports a plethora of backends directly without any collector intervention. If you cannot afford a collector, this is the way. From a Micrometer perspective, OpenTelemetry is just another backend.
- Micrometer Tracing is the evolution of Spring Cloud Sleuth; hence, if you have workloads on Spring Boot 2 and 3, you have to support both tools (or maybe migrate everything to Spring Boot 3?)
- The Micrometer family does not offer a way to collect logs and send them to a backend, so devs have to solve this with an appender specific to their logging library. On the other hand, the OpenTelemetry Spring Boot starter offers this out of the box if you use the Spring Boot default (SLF4J over Logback)
As these libraries are mutually exclusive, if the decision were mine, I would pick OpenTelemetry's Spring Boot starter. It offers logs support OOB and also a bridge for Micrometer metrics.
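Whichever option you pick, custom metrics can be written against the Micrometer API, since the OpenTelemetry starter bridges Micrometer metrics and the Micrometer OTLP registry exports them natively. The following counter is a hypothetical example, not part of the demo repository.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class BookMetrics {

    private final Counter booksCreated;

    public BookMetrics(MeterRegistry registry) {
        // Registered once; exported through either the Micrometer bridge or the OTLP registry
        this.booksCreated = Counter.builder("books.created")
                .description("Number of books persisted through the CRUD endpoint")
                .register(registry);
    }

    public void recordCreation() {
        booksCreated.increment();
    }
}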
Instrumenting springboot-demo With the OpenTelemetry Spring Boot Starter
As always, it is also good to consider the official documentation.
Otel instrumentation with the Spring Boot starter is activated in three steps:
- You need to include both the OpenTelemetry BOM and the OpenTelemetry starter dependency. If you are also planning to use Micrometer metrics, it is a good idea to include Spring Actuator as well
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry.instrumentation</groupId>
      <artifactId>opentelemetry-instrumentation-bom</artifactId>
      <version>2.10.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
...
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>io.opentelemetry.instrumentation</groupId>
  <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>
- There is a set of optional libraries and adapters that you can configure if your workloads have already diverged from the "Spring way"
- You need to activate (or not) the dimensions of observability (metrics, traces, and logs). You can also fine-tune the exporting parameters like ports, URLs, or exporting periods, either by using Spring properties or environment variables
#Configure exporters
otel.logs.exporter=otlp
otel.metrics.exporter=otlp
otel.traces.exporter=otlp
#Configure metrics generation
otel.metric.export.interval=5000 #Export metrics every five seconds
otel.instrumentation.micrometer.enabled=true #Enable the Micrometer metrics bridge
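With the starter and the properties above in place, HTTP endpoints, JDBC calls, and logs are captured automatically. If you also want a custom span, the starter supports the @WithSpan annotation; the service below is a hedged sketch (the class and method names are made up for illustration).

import io.opentelemetry.instrumentation.annotations.WithSpan;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class BookReportService {

    private static final Logger log = LoggerFactory.getLogger(BookReportService.class);

    @WithSpan // adds a child span named BookReportService.buildReport to the current trace
    public String buildReport(String isbn) {
        // Exported to Loki via the collector thanks to the starter's Logback appender
        log.info("Building report for isbn={}", isbn);
        return "report-for-" + isbn;
    }
}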
Instrumenting springboot-client-demo With Micrometer and Micrometer Tracing
Again, this instrumentation does not support logs exporting. Also, it is a good idea to check the latest documentation for Micrometer and Micrometer Tracing.
- As in the previous example, you need to enable Spring Actuator (which includes Micrometer). Since OpenTelemetry is just another backend from Micrometer's perspective, you just need to enable the corresponding OTLP registry, which will export metrics to localhost by default.
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-otlp</artifactId>
</dependency>
- In a similar way, once Actuator is enabled, you just need to add support for the tracing backend.
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
- Finally, you can fine-tune the configuration using Spring properties. For example, you can decide if 100% of traces are reported or how often the metrics are reported to the backend.
management.otlp.tracing.endpoint=http://localhost:4318/v1/traces
management.otlp.tracing.timeout=10s
management.tracing.sampling.probability=1
management.otlp.metrics.export.url=http://localhost:4318/v1/metrics
management.otlp.metrics.export.step=5s
management.opentelemetry.resource-attributes."service-name"=${spring.application.name}
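If you need a custom span or timer on top of what Actuator records automatically, the Micrometer Observation API is the idiomatic route: a single observation yields both a timer metric and a span through the micrometer-tracing-bridge-otel dependency. The class below is an illustrative sketch, not code from the client demo.

import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.stereotype.Service;

@Service
public class TraceDemoService {

    private final ObservationRegistry registry;

    // Spring Boot Actuator auto-configures the ObservationRegistry
    public TraceDemoService(ObservationRegistry registry) {
        this.registry = registry;
    }

    public String callBooks() {
        return Observation.createNotStarted("books.fetch", registry)
                .lowCardinalityKeyValue("client", "springboot-client-demo")
                .observe(() -> {
                    // Downstream call goes here; the observation times it and opens a span around it
                    return "books-response";
                });
    }
}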
Testing and E2E Sample
Generating Workload Data
The POC provides the following structure:
├── podman                  # Podman compose config files
├── springboot-client-demo  # Spring Boot client instrumented with Actuator, Micrometer, and Micrometer Tracing
└── springboot-demo         # Spring Boot service instrumented with the OpenTelemetry Spring Boot Starter
- The first step is to boot the observability stack we created previously.
cd podman
podman compose -f otel-compose.yml up
This will provide you with an instance of Grafana on port 3000.
Then, it is time to boot the first service! You only need Java 21 on the active shell:
cd springboot-demo
mvn spring-boot:run
If the workload is properly configured, you will see the following information in the OpenTelemetry container's standard output, which basically says you are successfully reporting data.
[otel] | 2024-12-01T22:10:07.730Z info Logs {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 24}
[otel] | 2024-12-01T22:10:10.671Z info Metrics {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 64, "data points": 90}
[otel] | 2024-12-01T22:10:10.672Z info Traces {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 5}
[otel] | 2024-12-01T22:10:15.691Z info Metrics {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 65, "data points": 93}
[otel] | 2024-12-01T22:10:15.833Z info Metrics {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 65, "data points": 93}
[otel] | 2024-12-01T22:10:15.835Z info Logs {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 5}
The data is being reported over the OpenTelemetry ports (4317 and 4318), which are open from Podman to the host. By default, all telemetry libraries report to localhost, but this can be configured for other cases like FaaS or Kubernetes.
Also, you can verify the reporting status in ZPages.
Finally, let's do the same with the Spring Boot client:
cd springboot-client-demo
mvn spring-boot:run
As described in the previous section, I created a set of interactions to:
Generate CPU workload using the naive Fibonacci implementation:
curl http://localhost:8080/fibo\?n\=45
Generate logs at different levels:
curl http://localhost:8080/log
Persist data using a CRUD:
curl -X POST --location "http://localhost:8080/books" \
-H "Content-Type: application/json" \
-d '{
"author": "Miguel Angel Asturias",
"title": "El señor presidente",
"isbn": "978-84-376-0494-7",
"publisher": "Editorial planeta"
}'
And then retrieve the data using a secondary service:
curl http://localhost:8081/trace-demo
This asciicast shows the interaction:
https://asciinema.org/a/692968
Grafana Results
Once the data is accessible by Grafana, what to do with it is up to you. Again, you could:
- Create dashboards
- Configure alarms
- Configure notifications from alarms
The quickest way to verify that the data is reported correctly is to check directly in Grafana Explore.
First, we can check some metrics like system_cpu_usage and filter by service name. In this case, I used springboot-demo, which has the CPU demo using naive Fibonacci. I can even filter by my own tag (which was added by the Otel processor):
In the same way, logs are already stored in Loki:
Finally, we could check the whole trace, including both services and interaction with H2 RDBMS:
Conclusion
In conclusion, implementing OpenTelemetry with Spring Boot provides a robust solution for observability, offering comprehensive insights through metrics, traces, and logs. By integrating tools like Prometheus, Grafana, Tempo, and Loki, you can easily monitor and troubleshoot your applications. Whether you choose OpenTelemetry's Spring Boot starter or Micrometer, both provide powerful instrumentation options, allowing you to gain deeper visibility and improve application performance.