Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

The Life of a Span

DZone's Guide to

The Life of a Span

A span is a single unit of work in distributed tracing. We'll use a sample Jaeger application to show the lifespan of a span and how it reach the backend.

Free Resource

Download Microservices for Java Developers: A hands-on introduction to frameworks and containers. Brought to you in partnership with Red Hat.

In the OpenTracing realm, as well as in other parts of the distributed tracing world, a “span” is the name given to the data structure that stores data related to a single unit of work. In most cases, developers just worry about instantiating a tracer and letting the instrumentation libraries capture interesting spans. How they actually reach a backend like Jaeger’s can be somewhat of a mystery.

Let’s try to clear out some of this magic.

For this article, let’s focus on what happens when we assume the defaults for all components involved. So you’ll have to remember that what actually happens in the background in your own implementation might differ from what we describe here, depending on your configuration.

We’ll use a sample application with the Jaeger Java Client, but other Jaeger client libraries for other languages act in a very similar manner.

The Instrumented Application’s Setup

The application used in this blog post is very simple and won’t register the tracer with the GlobalTracer as would be usual. Instead, it just brings an Eclipse Vert.x verticle up and creates a simple span for each HTTP request our handler receives. The code repository with this example is available on GitLab, but here’s the most relevant part:

package com.example.demo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.jaegertracing.Configuration;
import io.opentracing.Scope;
import io.opentracing.Tracer;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;

public class MainVerticle extends AbstractVerticle {
    private static final Logger logger = LoggerFactory.getLogger(MainVerticle.class);

    public static void main(String[] args) {
        Vertx.vertx().deployVerticle(new MainVerticle());
    }

    @Override
    public void start() throws Exception {
        String serviceName = System.getenv("JAEGER_SERVICE_NAME");
        if (null == serviceName || serviceName.isEmpty()) {
            serviceName = "vertx-create-span";
        }

        System.setProperty("JAEGER_SERVICE_NAME", serviceName);
        Tracer tracer = Configuration.fromEnv().getTracer();

        vertx.createHttpServer().requestHandler(req -> {
            try (Scope ignored = tracer.buildSpan("operation").startActive(true)) {
                logger.debug("Request received");
                req.response().putHeader("content-type", "text/plain").end("Hello from Vert.x!");
            }
        }).listen(8080);

        logger.info("HTTP server started on port 8080");
    }
}


During the bootstrap, the Jaeger Java Client will build an instance of RemoteReporterbehind the scenes, which starts a daemon thread and is responsible for flushing the spans stored in the buffer (see JAEGER_REPORTER_FLUSH_INTERVAL).

This reporter will build an instance of the UdpSender, which just sends the captured span using Thrift via UDP to a Jaeger Agent running on localhost. Depending on the Tracer’s configuration, an HttpSender could have been used instead.

Span Genesis

Once an instrumentation library or the “business” code starts a span, the Jaeger Java Client will use the JaegerTracer.SpanBuilderto generate an instance of JaegerS. This instance includes a reference to a context object ( JaegerSpanContext), including a TraceIDand SpanID. Both hold the same value for our span, as it’s the root of the tree, also known as the “parent span”.

Span Reporting

Our instrumented code starts a span and does the required processing, like adding a specific HTTP header and writing the response to the client. Once that is done, the try-with-resourcesstatement will automatically call the close()method, which ends up calling JaegerSpan#finish(). The span is then delivered by the JaegerTracerto the RemoteReporter. At this point, this is what our span looks like:

JaegerSpan(
  tracer=JaegerTracer(
    version=Java-0.30.3-SNAPSHOT, 
    serviceName=vertx-create-span, 
    reporter=CompositeReporter(
      reporters=...
    ),
    sampler=ConstSampler(decision=true, tags={sampler.type=const, sampler.param=true}), 
    ipv4=-1062731773, 
    tags={hostname=caju.kroehling.de, jaeger.version=Java-0.30.3-SNAPSHOT, ip=192.168.0.3}, 
    ...
  ), 
  startTimeMicroseconds=1531757144093000, 
  startTimeNanoTicks=31909260878670, 
  computeDurationViaNanoTicks=true, 
  tags={sampler.type=const, sampler.param=true}, 
  durationMicroseconds=9888, 
  operationName=operation, 
  references=[], 
  context=73abf3be4c32c2b8:73abf3be4c32c2b8:0:1, 
  logs=null, 
  finished=true
)


The RemoteReporter will simply add the span to a queue and deliver the control back to the caller, so that no IO-blocking will ever occur that might negatively impact the actual application being traced. Needless to say, no more work happens in the “main” thread for this span.

Flush!

As soon as the span is in the queue,UdpSender#append(JaegerSpan) is called by the background thread. The sender will convert everyJaegerSpaninto a Thrift span before sending them over the wire to the agent. After the conversion, this is how the span looks like:

Span(
  traceIdLow:8335023534026965688, 
  traceIdHigh:0, 
  spanId:8335023534026965688, 
  parentSpanId:0, 
  operationName:operation, 
  references:[], 
  flags:1, 
  startTime:1531757144093000, 
  duration:9888, 
  tags:[
    Tag(key:sampler.type, vType:STRING, vStr:const), 
    Tag(key:sampler.param, vType:BOOL, vBool:true)
  ], 
  logs:[]
)

This span in Thrift format is added to a buffer, whose size is constantly tracked by the sender. Once the buffer approaches the maximum size of a UDP packet (about 65 KB) or some time has elapsed, the UdpSender flushes the list of spans, along with a Process object, representing the tracer process to UdpSender#send(Process, List<JaegerSpan>). This is the trigger for the UdpSender to emit a Thrift batch to the agent. For the curious ones out there, here’s how the batch looks like over the wire:

\ufffd\ufffd\u0001\temitBatch\u001c\u001c\u0018\u0011vertx-create-span\u0019<\u0018\u0008hostname\u0015\u0000\u0018\u0011caju.kroehling.de\u0000\u0018\u000ejaeger.version\u0015\u0000\u0018\u0014Java-0.30.3-SNAPSHOT\u0000\u0018\u0002ip\u0015\u0000\u0018\u000b192.168.0.3\u0000\u0000\u0019\u001c\u0016\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0001\u0016\u0000\u0016\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0001\u0016\u0000\u0018\toperation\u0019\u000c\u0015\u0002\u0016\ufffdť\ufffd\ufffdȸ\u0005\u0016\ufffd\ufffd\u0001\u0019,\u0018\u000csampler.type\u0015\u0000\u0018\u0005const\u0000\u0018\rsampler.param\u0015\u00041\u0000\u0019\u000c\u0000\u0000\u0000


Quick Appearance at the Agent

The Jaeger Agent is the daemon that runs very close to the instrumented application. Its sole purpose is to catch spans submitted from instrumented applications via UDP and relay them via a long-lived TChannel connection to the collector.

A batch, as received by the agent, contains two main properties: a Process, representing the metadata about the process where the Jaeger Tracer was running on the client, and a list of Spans. As the process metadata is the same for all spans in the same batch, it would potentially save some resources for batches with several spans. In our case, we have only one span and this is how it looks right before the agent dispatches it to the collector:

Batch({
  Process:Process({
    ServiceName:vertx-create-span 
    Tags:[
      Tag({Key:hostname VType:STRING VStr:0xc4203621b0 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]}) 
      Tag({Key:jaeger.version VType:STRING VStr:0xc420362260 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]}) 
      Tag({Key:ip VType:STRING VStr:0xc420362280 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]})
    ]
  }) 
  Spans:[
    Span({
      TraceIdLow:8335023534026965688
      TraceIdHigh:0 
      SpanId:8335023534026965688 
      ParentSpanId:0 
      OperationName:operation 
      References:[] 
      Flags:1 
      StartTime:1531757144093000 
      Duration:9888 
      Tags:[
        Tag({Key:sampler.type VType:STRING VStr:0xc4203622a0 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]}) 
        Tag({Key:sampler.param VType:BOOL VStr:<nil> VDouble:<nil> VBool:0xc42003ad5b VLong:<nil> VBinary:[]})
      ] 
      Logs:[]
    })
  ]
})


Reaching the Collector

After its quick appearance at the Agent, our span reaches the Jaeger Collector via the TChannel handler at SpanHandler#SubmitBatches, responsible for dealing with batches in Jaeger format. Other formats, such as Zipkin, would have different handlers.

Our batch will then be submitted to the collector and the pretty version of the payload would look like the following:

[
  {
    "process": {
      "serviceName": "vertx-create-span",
      "tags": [
        {
          "key": "hostname",
          "vType": "STRING",
          "vStr": "caju.kroehling.de"
        },
        {
          "key": "jaeger.version",
          "vType": "STRING",
          "vStr": "Java-0.30.3-SNAPSHOT"
        },
        {
          "key": "ip",
          "vType": "STRING",
          "vStr": "192.168.0.3"
        }
      ]
    },
    "spans": [
      {
        "traceIdLow": 8.335023534027e+18,
        "traceIdHigh": 0,
        "spanId": 8.335023534027e+18,
        "parentSpanId": 0,
        "operationName": "operation",
        "flags": 1,
        "startTime": 1.531757144093e+15,
        "duration": 9888,
        "tags": [
          {
            "key": "sampler.type",
            "vType": "STRING",
            "vStr": "const"
          },
          {
            "key": "sampler.param",
            "vType": "BOOL",
            "vBool": true
          }
        ]
      }
    ]
  }
]


The span handler will then build individual spans in Jaeger format, each one including a copy of the process object, and deliver the resulting list to a SpanProcessor#ProcessSpans.Our span has a different format now:
[
  {
    "trace_id": "AAAAAAAAAABzq\/O+TDLCuA==",
    "span_id": "c6vzvkwywrg=",
    "operation_name": "operation",
    "references": null,
    "flags": 1,
    "start_time": "2018-07-16T16:05:44.093Z",
    "duration": 9888000,
    "tags": [
      {
        "key": "sampler.type",
        "v_str": "const"
      },
      {
        "key": "sampler.param",
        "v_type": 1,
        "v_bool": true
      }
    ],
    "logs": null,
    "process": {
      "service_name": "vertx-create-span",
      "tags": [
        {
          "key": "hostname",
          "v_str": "caju.kroehling.de"
        },
        {
          "key": "jaeger.version",
          "v_str": "Java-0.30.3-SNAPSHOT"
        },
        {
          "key": "ip",
          "v_str": "192.168.0.3"
        }
      ]
    }
  }
]


At this stage, a span might go through a pre-processing routine and/or might be filtered out. Under normal conditions, though, spans will then reach SpanProcessor#saveSpan . If we had more spans in the batch, we’d see this method being called once for every span. A “Span Writer” will be employed, which can be a Cassandra, Elasticsearch, Kafka or in-memory span writer.

How it looks like in the concrete storage is left as an exercise to the reader, but a peek into the schema file for Cassandra or Elasticsearch might reveal quite a lot of details.

It’s worth noting that, from this point and on, we stop referring to our span as “span”: for almost all cases after this, it is treated as a fully fledged trace, that happens to be composed by a single span.

The UI and the Query

At this point, our trace is in storage, ready to be retrieved by the UI via the query component.

Under the Jaeger UI, traces can be retrieved based on search terms such as the service name, “vertx-create-span” in our case. On the backend side, the Jaeger Query component will see the following as the search terms when we select the service name and click “Find Traces”:

{
  "ServiceName": "vertx-create-span",
  "OperationName": "",
  "Tags": {

  },
  "StartTimeMin": "2018-07-16T17:06:38.426+02:00",
  "StartTimeMax": "2018-07-16T18:06:38.426+02:00",
  "DurationMin": 0,
  "DurationMax": 0,
  "NumTraces": 20
}


The Query’s APIHandler#search method will parse the search terms and pass it over to the storage-specific “Span Reader”. Based on the service name, our trace is then found and added to a result list. The backend sees this list as:

{
  "traceID": "73abf3be4c32c2b8",
  "spans": [
    {
      "traceID": "73abf3be4c32c2b8",
      "spanID": "73abf3be4c32c2b8",
      "flags": 1,
      "operationName": "operation",
      "references": [

      ],
      "startTime": 1.531757144093e+15,
      "duration": 9888,
      "tags": [
        {
          "key": "sampler.type",
          "type": "string",
          "value": "const"
        },
        {
          "key": "sampler.param",
          "type": "bool",
          "value": true
        }
      ],
      "logs": [

      ],
      "processID": "p1",
      "warnings": null
    }
  ],
  "processes": {
    "p1": {
      "serviceName": "vertx-create-span",
      "tags": [
        {
          "key": "hostname",
          "type": "string",
          "value": "caju.kroehling.de"
        },
        {
          "key": "ip",
          "type": "string",
          "value": "192.168.0.3"
        },
        {
          "key": "jaeger.version",
          "type": "string",
          "value": "Java-0.30.3-SNAPSHOT"
        }
      ]
    }
  },
  "warnings": null
}


All messages from the backend to the UI follow a specific format, so, this is what the UI ends up receiving:

{
  "data": [
    {
      "traceID": "73abf3be4c32c2b8",
      "spans": [
        {
          "traceID": "73abf3be4c32c2b8",
          "spanID": "73abf3be4c32c2b8",
          "flags": 1,
          "operationName": "operation",
          "references": [

          ],
          "startTime": 1.531757144093e+15,
          "duration": 9888,
          "tags": [
            {
              "key": "sampler.type",
              "type": "string",
              "value": "const"
            },
            {
              "key": "sampler.param",
              "type": "bool",
              "value": true
            }
          ],
          "logs": [

          ],
          "processID": "p1",
          "warnings": null
        }
      ],
      "processes": {
        "p1": {
          "serviceName": "vertx-create-span",
          "tags": [
            {
              "key": "hostname",
              "type": "string",
              "value": "caju.kroehling.de"
            },
            {
              "key": "ip",
              "type": "string",
              "value": "192.168.0.3"
            },
            {
              "key": "jaeger.version",
              "type": "string",
              "value": "Java-0.30.3-SNAPSHOT"
            }
          ]
        }
      },
      "warnings": null
    }
  ],
  "total": 0,
  "limit": 0,
  "offset": 0,
  "errors": null
}


The Jaeger UI will iterate over the results and nicely render the information on the screen:

In most cases, the UI won’t request the trace information again upon clicking on the trace, but if we open the trace and hit “Refresh,” it causes the UI to do a request that will reachApiHandler#getTrace.It loads the trace based on the given ID from the span storage along with all its spans, responding with a data structure similar to the following:

{
  "spans": [
    {
      "trace_id": "AAAAAAAAAABzq\/O+TDLCuA==",
      "span_id": "c6vzvkwywrg=",
      "operation_name": "operation",
      "references": null,
      "flags": 1,
      "start_time": "2018-07-16T16:05:44.093Z",
      "duration": 9888000,
      "tags": [
        {
          "key": "sampler.type",
          "v_str": "const"
        },
        {
          "key": "sampler.param",
          "v_type": 1,
          "v_bool": true
        }
      ],
      "logs": null,
      "process": {
        "service_name": "vertx-create-span",
        "tags": [
          {
            "key": "hostname",
            "v_str": "caju.kroehling.de"
          },
          {
            "key": "ip",
            "v_str": "192.168.0.3"
          },
          {
            "key": "jaeger.version",
            "v_str": "Java-0.30.3-SNAPSHOT"
          }
        ]
      }
    }
  ],
  "process_map": null
}


Because we have only one trace, with only one span, the payload the UI receives for this request is exactly the same as the one we got from the “search” operation. But the way this data is presented differs:

The detail view of a trace

Afterlife

We’ve covered pretty much all stages of the span’s life, from its genesis up to where it’s finally used to provide insights about the instrumented application. From this point, a span might appear in several afterlife scenarios, like as a data point in Prometheus, or aggregated with other spans in a Grafana dashboard somewhere. Eventually, the storage owner might decide to clean up older data, causing our span to cease to exist and closing the cycle.

Download Building Reactive Microservices in Java: Asynchronous and Event-Based Application Design. Brought to you in partnership with Red Hat

Topics:
opentracing ,jaeger ,microservices ,distributed tracing ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}