The Life of a Span

A span is a single unit of work in distributed tracing. We'll use a sample Jaeger application to show the lifespan of a span and how it reaches the backend.

By Juraci Paixao Kroehling · Jul. 27, 18 · Tutorial

In the OpenTracing realm, as well as in other parts of the distributed tracing world, a “span” is the name given to the data structure that stores data related to a single unit of work. In most cases, developers just worry about instantiating a tracer and letting the instrumentation libraries capture interesting spans. How they actually reach a backend like Jaeger’s can be somewhat of a mystery.

Let’s try to clear up some of this magic.

For this article, let’s focus on what happens when we assume the defaults for all components involved. Keep in mind that what actually happens behind the scenes in your own setup might differ from what we describe here, depending on your configuration.

We’ll use a sample application with the Jaeger Java Client, but other Jaeger client libraries for other languages act in a very similar manner.

The Instrumented Application’s Setup

The application used in this blog post is very simple and won’t register the tracer with the GlobalTracer, as would usually be done. Instead, it just brings up an Eclipse Vert.x verticle and creates a simple span for each HTTP request our handler receives. The code repository with this example is available on GitLab, but here’s the most relevant part:

package com.example.demo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.jaegertracing.Configuration;
import io.opentracing.Scope;
import io.opentracing.Tracer;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;

public class MainVerticle extends AbstractVerticle {
    private static final Logger logger = LoggerFactory.getLogger(MainVerticle.class);

    public static void main(String[] args) {
        Vertx.vertx().deployVerticle(new MainVerticle());
    }

    @Override
    public void start() throws Exception {
        // Resolve the service name from the environment, falling back to a default
        String serviceName = System.getenv("JAEGER_SERVICE_NAME");
        if (null == serviceName || serviceName.isEmpty()) {
            serviceName = "vertx-create-span";
        }

        // Build the tracer from the JAEGER_* environment variables, with defaults for everything else
        System.setProperty("JAEGER_SERVICE_NAME", serviceName);
        Tracer tracer = Configuration.fromEnv().getTracer();

        // Create one span per incoming HTTP request; the try-with-resources block
        // closes the scope, which finishes the span
        vertx.createHttpServer().requestHandler(req -> {
            try (Scope ignored = tracer.buildSpan("operation").startActive(true)) {
                logger.debug("Request received");
                req.response().putHeader("content-type", "text/plain").end("Hello from Vert.x!");
            }
        }).listen(8080);

        logger.info("HTTP server started on port 8080");
    }
}


During bootstrap, the Jaeger Java Client will build an instance of RemoteReporter behind the scenes, which starts a daemon thread responsible for flushing the spans stored in its buffer (see JAEGER_REPORTER_FLUSH_INTERVAL).

This reporter will build an instance of the UdpSender, which just sends the captured span using Thrift via UDP to a Jaeger Agent running on localhost. Depending on the Tracer’s configuration, an HttpSender could have been used instead.
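
If you’d rather not rely on environment variables, the same defaults can be spelled out programmatically. The sketch below shows roughly what the env-driven setup corresponds to, using the Configuration builders shipped with the Jaeger Java Client around the version used here; the exact builder methods may differ between client versions, so treat it as an illustration rather than a reference.

import io.jaegertracing.Configuration;
import io.jaegertracing.Configuration.ReporterConfiguration;
import io.jaegertracing.Configuration.SenderConfiguration;
import io.opentracing.Tracer;

public class TracerFactory {
    public static Tracer create() {
        // Roughly what the defaults resolve to: a UDP sender pointing at an agent
        // on localhost, and a remote reporter that flushes its buffer periodically.
        SenderConfiguration senderConfig = new SenderConfiguration()
                .withAgentHost("localhost")   // where the Jaeger Agent is expected to listen
                .withAgentPort(6831);         // default agent UDP port for the compact Thrift protocol
                // Setting .withEndpoint(...) here instead would make the client pick an HttpSender.

        ReporterConfiguration reporterConfig = new ReporterConfiguration()
                .withSender(senderConfig)
                .withFlushInterval(1000)      // corresponds to JAEGER_REPORTER_FLUSH_INTERVAL (ms)
                .withMaxQueueSize(10000);     // corresponds to JAEGER_REPORTER_MAX_QUEUE_SIZE

        return new Configuration("vertx-create-span")
                .withReporter(reporterConfig)
                .getTracer();
    }
}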

Span Genesis

Once an instrumentation library or the “business” code starts a span, the Jaeger Java Client will use the JaegerTracer.SpanBuilder to generate an instance of JaegerSpan. This instance includes a reference to a context object (JaegerSpanContext), which holds a TraceID and a SpanID. Both hold the same value for our span, as it’s the root of the tree, also known as the “parent span”.
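
To see this in code, here is a rough snippet, assuming a tracer built like the one in our verticle. The Jaeger context’s string form is traceId:spanId:parentId:flags, and for a root span the first two fields carry the same value, as we’ll see again in the span dumps below.

import io.opentracing.Span;
import io.opentracing.Tracer;

class RootSpanDemo {
    // Illustrative only: start a root span and print its context.
    static void printRootContext(Tracer tracer) {
        Span span = tracer.buildSpan("operation").start();
        // For our sampled root span, this prints something like
        // "73abf3be4c32c2b8:73abf3be4c32c2b8:0:1" (traceId:spanId:parentId:flags).
        System.out.println(span.context().toString());
        span.finish();
    }
}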

Span Reporting

Our instrumented code starts a span and does the required processing, like adding a specific HTTP header and writing the response to the client. Once that is done, the try-with-resources statement will automatically call the close() method, which ends up calling JaegerSpan#finish(). The span is then delivered by the JaegerTracer to the RemoteReporter. At this point, this is what our span looks like:

JaegerSpan(
  tracer=JaegerTracer(
    version=Java-0.30.3-SNAPSHOT, 
    serviceName=vertx-create-span, 
    reporter=CompositeReporter(
      reporters=...
    ),
    sampler=ConstSampler(decision=true, tags={sampler.type=const, sampler.param=true}), 
    ipv4=-1062731773, 
    tags={hostname=caju.kroehling.de, jaeger.version=Java-0.30.3-SNAPSHOT, ip=192.168.0.3}, 
    ...
  ), 
  startTimeMicroseconds=1531757144093000, 
  startTimeNanoTicks=31909260878670, 
  computeDurationViaNanoTicks=true, 
  tags={sampler.type=const, sampler.param=true}, 
  durationMicroseconds=9888, 
  operationName=operation, 
  references=[], 
  context=73abf3be4c32c2b8:73abf3be4c32c2b8:0:1, 
  logs=null, 
  finished=true
)


The RemoteReporter will simply add the span to a queue and return control to the caller, so that no blocking I/O ever occurs that might negatively impact the application being traced. Needless to say, no more work happens on the “main” thread for this span.
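
The real RemoteReporter has a few more moving parts (command objects, metrics, a flush timer), but the essence of the pattern is a bounded queue drained by a daemon thread, roughly like the following illustrative sketch (not the client’s actual code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Illustrative sketch of the reporter pattern only.
class QueueingReporter<S> {
    private final BlockingQueue<S> queue = new ArrayBlockingQueue<>(10_000);

    QueueingReporter(Consumer<List<S>> sender) {
        Thread flusher = new Thread(() -> {
            List<S> batch = new ArrayList<>();
            while (true) {
                try {
                    batch.add(queue.take());   // block until at least one span is available
                    queue.drainTo(batch);      // grab whatever else is already queued
                    sender.accept(batch);      // hand the batch to the sender (e.g. over UDP)
                    batch.clear();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        flusher.setDaemon(true);               // don't keep the JVM alive just for reporting
        flusher.start();
    }

    // Called from the application thread: enqueue and return immediately, never block on I/O.
    void report(S span) {
        queue.offer(span);                     // if the queue is full, the span is simply dropped
    }
}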

Flush!

As soon as the span is in the queue, UdpSender#append(JaegerSpan) is called by the background thread. The sender will convert every JaegerSpan into a Thrift span before sending it over the wire to the agent. After the conversion, this is what the span looks like:

Span(
  traceIdLow:8335023534026965688, 
  traceIdHigh:0, 
  spanId:8335023534026965688, 
  parentSpanId:0, 
  operationName:operation, 
  references:[], 
  flags:1, 
  startTime:1531757144093000, 
  duration:9888, 
  tags:[
    Tag(key:sampler.type, vType:STRING, vStr:const), 
    Tag(key:sampler.param, vType:BOOL, vBool:true)
  ], 
  logs:[]
)

This span in Thrift format is added to a buffer, whose size is constantly tracked by the sender. Once the buffer approaches the maximum size of a UDP packet (about 65 KB) or some time has elapsed, the UdpSender flushes the list of spans, along with a Process object representing the tracer’s process, to UdpSender#send(Process, List<JaegerSpan>). This is the trigger for the UdpSender to emit a Thrift batch to the agent. For the curious ones out there, here’s what the batch looks like over the wire:

\ufffd\ufffd\u0001\temitBatch\u001c\u001c\u0018\u0011vertx-create-span\u0019<\u0018\u0008hostname\u0015\u0000\u0018\u0011caju.kroehling.de\u0000\u0018\u000ejaeger.version\u0015\u0000\u0018\u0014Java-0.30.3-SNAPSHOT\u0000\u0018\u0002ip\u0015\u0000\u0018\u000b192.168.0.3\u0000\u0000\u0019\u001c\u0016\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0001\u0016\u0000\u0016\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0001\u0016\u0000\u0018\toperation\u0019\u000c\u0015\u0002\u0016\ufffdť\ufffd\ufffdȸ\u0005\u0016\ufffd\ufffd\u0001\u0019,\u0018\u000csampler.type\u0015\u0000\u0018\u0005const\u0000\u0018\rsampler.param\u0015\u00041\u0000\u0019\u000c\u0000\u0000\u0000
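
Stripped of the Thrift details, the batching logic in the sender boils down to something like the following sketch. This is illustrative only; the real UdpSender computes the serialized Thrift size, accounts for protocol overhead, and is also flushed periodically by the reporter.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the batching idea: keep a running total of the serialized
// size and flush before a batch would exceed what fits in a single UDP packet.
abstract class BatchingSender<S> {
    // Roughly the usable payload of a UDP datagram.
    private static final int MAX_PACKET_BYTES = 65_000;

    private final List<S> buffer = new ArrayList<>();
    private int bufferBytes;

    void append(S span) {
        int spanBytes = serializedSizeOf(span);
        if (bufferBytes + spanBytes > MAX_PACKET_BYTES) {
            flush();                           // emit the current batch before adding more
        }
        buffer.add(span);
        bufferBytes += spanBytes;
    }

    void flush() {
        if (!buffer.isEmpty()) {
            send(new ArrayList<>(buffer));     // one batch over the wire
            buffer.clear();
            bufferBytes = 0;
        }
    }

    abstract int serializedSizeOf(S span);     // e.g. the size of the Thrift-encoded span
    abstract void send(List<S> batch);         // e.g. emit the batch to the agent via UDP
}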


Quick Appearance at the Agent

The Jaeger Agent is the daemon that runs very close to the instrumented application. Its sole purpose is to catch spans submitted from instrumented applications via UDP and relay them via a long-lived TChannel connection to the collector.

A batch, as received by the agent, contains two main properties: a Process, representing the metadata about the process where the Jaeger Tracer was running on the client, and a list of Spans. As the process metadata is the same for all spans in a batch, sharing it at the batch level potentially saves some resources for batches with several spans. In our case, we have only one span, and this is how it looks right before the agent dispatches it to the collector:

Batch({
  Process:Process({
    ServiceName:vertx-create-span 
    Tags:[
      Tag({Key:hostname VType:STRING VStr:0xc4203621b0 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]}) 
      Tag({Key:jaeger.version VType:STRING VStr:0xc420362260 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]}) 
      Tag({Key:ip VType:STRING VStr:0xc420362280 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]})
    ]
  }) 
  Spans:[
    Span({
      TraceIdLow:8335023534026965688
      TraceIdHigh:0 
      SpanId:8335023534026965688 
      ParentSpanId:0 
      OperationName:operation 
      References:[] 
      Flags:1 
      StartTime:1531757144093000 
      Duration:9888 
      Tags:[
        Tag({Key:sampler.type VType:STRING VStr:0xc4203622a0 VDouble:<nil> VBool:<nil> VLong:<nil> VBinary:[]}) 
        Tag({Key:sampler.param VType:BOOL VStr:<nil> VDouble:<nil> VBool:0xc42003ad5b VLong:<nil> VBinary:[]})
      ] 
      Logs:[]
    })
  ]
})


Reaching the Collector

After its quick appearance at the Agent, our span reaches the Jaeger Collector via the TChannel handler at SpanHandler#SubmitBatches, responsible for dealing with batches in Jaeger format. Other formats, such as Zipkin, would have different handlers.

Our batch is then submitted to the collector, and a pretty-printed version of the payload looks like the following:

[
  {
    "process": {
      "serviceName": "vertx-create-span",
      "tags": [
        {
          "key": "hostname",
          "vType": "STRING",
          "vStr": "caju.kroehling.de"
        },
        {
          "key": "jaeger.version",
          "vType": "STRING",
          "vStr": "Java-0.30.3-SNAPSHOT"
        },
        {
          "key": "ip",
          "vType": "STRING",
          "vStr": "192.168.0.3"
        }
      ]
    },
    "spans": [
      {
        "traceIdLow": 8.335023534027e+18,
        "traceIdHigh": 0,
        "spanId": 8.335023534027e+18,
        "parentSpanId": 0,
        "operationName": "operation",
        "flags": 1,
        "startTime": 1.531757144093e+15,
        "duration": 9888,
        "tags": [
          {
            "key": "sampler.type",
            "vType": "STRING",
            "vStr": "const"
          },
          {
            "key": "sampler.param",
            "vType": "BOOL",
            "vBool": true
          }
        ]
      }
    ]
  }
]


The span handler will then build individual spans in Jaeger format, each one including a copy of the process object, and deliver the resulting list to SpanProcessor#ProcessSpans. Our span has a different format now:
[
  {
    "trace_id": "AAAAAAAAAABzq\/O+TDLCuA==",
    "span_id": "c6vzvkwywrg=",
    "operation_name": "operation",
    "references": null,
    "flags": 1,
    "start_time": "2018-07-16T16:05:44.093Z",
    "duration": 9888000,
    "tags": [
      {
        "key": "sampler.type",
        "v_str": "const"
      },
      {
        "key": "sampler.param",
        "v_type": 1,
        "v_bool": true
      }
    ],
    "logs": null,
    "process": {
      "service_name": "vertx-create-span",
      "tags": [
        {
          "key": "hostname",
          "v_str": "caju.kroehling.de"
        },
        {
          "key": "jaeger.version",
          "v_str": "Java-0.30.3-SNAPSHOT"
        },
        {
          "key": "ip",
          "v_str": "192.168.0.3"
        }
      ]
    }
  }
]


At this stage, a span might go through a pre-processing routine and/or might be filtered out. Under normal conditions, though, spans will then reach SpanProcessor#saveSpan. If we had more spans in the batch, we’d see this method being called once for every span. A “Span Writer” is then employed, which can be a Cassandra, Elasticsearch, Kafka, or in-memory span writer.
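
The collector itself is written in Go, so the writer abstraction lives there; translated into this article’s Java purely for illustration, the contract is conceptually as small as the hypothetical sketch below (not Jaeger’s actual API).

// Hypothetical illustration of the "Span Writer" contract.
interface SpanWriter {
    void writeSpan(JaegerFormatSpan span) throws Exception;   // called once per span in the batch
}

// Placeholder type standing in for the collector's internal span model.
class JaegerFormatSpan { /* trace id, span id, operation name, tags, process, ... */ }

// An in-memory writer, conceptually like the one used for testing and demos.
class InMemorySpanWriter implements SpanWriter {
    private final java.util.List<JaegerFormatSpan> spans =
            java.util.Collections.synchronizedList(new java.util.ArrayList<>());

    @Override
    public void writeSpan(JaegerFormatSpan span) {
        spans.add(span);   // a real writer would index by trace ID, service, tags, and time
    }
}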

What it looks like in the concrete storage is left as an exercise for the reader, but a peek into the schema files for Cassandra or Elasticsearch might reveal quite a lot of details.

It’s worth noting that, from this point on, we stop referring to our span as a “span”: for almost all purposes after this, it is treated as a fully fledged trace that happens to be composed of a single span.

The UI and the Query

At this point, our trace is in storage, ready to be retrieved by the UI via the query component.

In the Jaeger UI, traces can be retrieved based on search terms such as the service name, “vertx-create-span” in our case. On the backend side, the Jaeger Query component will see the following as the search terms when we select the service name and click “Find Traces”:

{
  "ServiceName": "vertx-create-span",
  "OperationName": "",
  "Tags": {

  },
  "StartTimeMin": "2018-07-16T17:06:38.426+02:00",
  "StartTimeMax": "2018-07-16T18:06:38.426+02:00",
  "DurationMin": 0,
  "DurationMax": 0,
  "NumTraces": 20
}
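
If you want to reproduce this search without the UI, the query service also exposes an HTTP API that the UI itself uses. The snippet below assumes the query service’s default port (16686) and its /api/traces endpoint; adjust both for your own deployment.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Issues the same kind of search the UI performs, assuming the Jaeger Query
// service is reachable on localhost:16686.
public class FindTraces {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:16686/api/traces?service=vertx-create-span&limit=20");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            // Prints the same JSON envelope ("data", "total", "limit", ...) shown later in this article.
            reader.lines().forEach(System.out::println);
        } finally {
            conn.disconnect();
        }
    }
}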


The Query’s APIHandler#search method will parse the search terms and pass them over to the storage-specific “Span Reader”. Based on the service name, our trace is then found and added to a result list. The backend sees this list as:

{
  "traceID": "73abf3be4c32c2b8",
  "spans": [
    {
      "traceID": "73abf3be4c32c2b8",
      "spanID": "73abf3be4c32c2b8",
      "flags": 1,
      "operationName": "operation",
      "references": [

      ],
      "startTime": 1.531757144093e+15,
      "duration": 9888,
      "tags": [
        {
          "key": "sampler.type",
          "type": "string",
          "value": "const"
        },
        {
          "key": "sampler.param",
          "type": "bool",
          "value": true
        }
      ],
      "logs": [

      ],
      "processID": "p1",
      "warnings": null
    }
  ],
  "processes": {
    "p1": {
      "serviceName": "vertx-create-span",
      "tags": [
        {
          "key": "hostname",
          "type": "string",
          "value": "caju.kroehling.de"
        },
        {
          "key": "ip",
          "type": "string",
          "value": "192.168.0.3"
        },
        {
          "key": "jaeger.version",
          "type": "string",
          "value": "Java-0.30.3-SNAPSHOT"
        }
      ]
    }
  },
  "warnings": null
}


All messages from the backend to the UI follow a specific format, so this is what the UI ends up receiving:

{
  "data": [
    {
      "traceID": "73abf3be4c32c2b8",
      "spans": [
        {
          "traceID": "73abf3be4c32c2b8",
          "spanID": "73abf3be4c32c2b8",
          "flags": 1,
          "operationName": "operation",
          "references": [

          ],
          "startTime": 1.531757144093e+15,
          "duration": 9888,
          "tags": [
            {
              "key": "sampler.type",
              "type": "string",
              "value": "const"
            },
            {
              "key": "sampler.param",
              "type": "bool",
              "value": true
            }
          ],
          "logs": [

          ],
          "processID": "p1",
          "warnings": null
        }
      ],
      "processes": {
        "p1": {
          "serviceName": "vertx-create-span",
          "tags": [
            {
              "key": "hostname",
              "type": "string",
              "value": "caju.kroehling.de"
            },
            {
              "key": "ip",
              "type": "string",
              "value": "192.168.0.3"
            },
            {
              "key": "jaeger.version",
              "type": "string",
              "value": "Java-0.30.3-SNAPSHOT"
            }
          ]
        }
      },
      "warnings": null
    }
  ],
  "total": 0,
  "limit": 0,
  "offset": 0,
  "errors": null
}


The Jaeger UI will iterate over the results and nicely render the information on the screen.

In most cases, the UI won’t request the trace information again when you click on the trace, but if we open the trace and hit “Refresh,” the UI makes a request that reaches APIHandler#getTrace. It loads the trace with the given ID, along with all its spans, from the span storage, responding with a data structure similar to the following:

{
  "spans": [
    {
      "trace_id": "AAAAAAAAAABzq\/O+TDLCuA==",
      "span_id": "c6vzvkwywrg=",
      "operation_name": "operation",
      "references": null,
      "flags": 1,
      "start_time": "2018-07-16T16:05:44.093Z",
      "duration": 9888000,
      "tags": [
        {
          "key": "sampler.type",
          "v_str": "const"
        },
        {
          "key": "sampler.param",
          "v_type": 1,
          "v_bool": true
        }
      ],
      "logs": null,
      "process": {
        "service_name": "vertx-create-span",
        "tags": [
          {
            "key": "hostname",
            "v_str": "caju.kroehling.de"
          },
          {
            "key": "ip",
            "v_str": "192.168.0.3"
          },
          {
            "key": "jaeger.version",
            "v_str": "Java-0.30.3-SNAPSHOT"
          }
        ]
      }
    }
  ],
  "process_map": null
}


Because we have only one trace, with only one span, the payload the UI receives for this request is exactly the same as the one we got from the “search” operation. But the way this data is presented differs:

The detail view of a trace

Afterlife

We’ve covered pretty much all stages of the span’s life, from its genesis up to the point where it’s finally used to provide insights about the instrumented application. From here, a span might appear in several afterlife scenarios, such as a data point in Prometheus, or aggregated with other spans in a Grafana dashboard somewhere. Eventually, the storage owner might decide to clean up older data, causing our span to cease to exist and closing the cycle.


Published at DZone with permission of Juraci Paixao Kroehling, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
