DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workkloads.

Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • 4 Key Observability Metrics for Distributed Applications
  • The Principles of Planning and Implementing Microservices
  • A Deep Dive Into Distributed Tracing
  • C# Applications Vulnerability Cheatsheet

Trending

  • AI, ML, and Data Science: Shaping the Future of Automation
  • Evolution of Cloud Services for MCP/A2A Protocols in AI Agents
  • A Complete Guide to Modern AI Developer Tools
  • How the Go Runtime Preempts Goroutines for Efficient Concurrency
  1. DZone
  2. Data Engineering
  3. Data
  4. Distributed Tracing With Zipkin and ELK

Distributed Tracing With Zipkin and ELK

Got latency issues? Learn how to hook Zipkin up to the ELK stack for distributed tracing for improved performance in this tutorial.

By 
Daniel Berman user avatar
Daniel Berman
·
May. 22, 18 · Tutorial
Likes (7)
Comment
Save
Tweet
Share
29.3K Views

Join the DZone community and get the full member experience.

Join For Free

While logs can tell us whether a specific request failed to execute or not and metrics can help us monitor how many times this request failed and how long the failed request took, traces help us debug the reason why the request failed or took so long to execute by breaking up the execution flow and dissecting it into smaller events. 

In a microservice architecture, tracing is extremely challenging because requests will span multiple services, each executing one or more processes, across multiple servers. Distributed tracing is a tracing methodology that seeks to overcome this challenge by instrumenting applications in specific junctions in the request’s path and reconstructing its lifecycle.  

Zipkin, based on Google Dapper and initially developed by Twitter, is a Java-based application that is used for distributed tracing and identifying latency issues. Unique identifiers are attached to each request which are then passed downstream through the different waypoints, or services. Zipkin then collects this data and allows users to analyze it in a UI.

So why hook up Zipkin with the ELK Stack?  

The Zipkin UI provides us with some basic options to analyze traced requests — we can use a dependency diagram to view the execution flow, we can filter or sort the traces collected by Zipkin per application, length of trace and timestamp. 

Using ELK, though, takes trace analysis up a notch. Elasticsearch can be used for long-term retention of the trace data and Kibana will allow you gain much deeper insight into the data. This article provides you with the steps to integrate the two tools.

Installing Zipkin

The easiest way to install Zipkin is using Docker:

docker run -d -p 9411:9411 openzipkin/zipkin

If you want to customize your Zipkin configuration though, it might be easier to pass some variables in the run command. In our case, I want to pass some environment variables and specify Elasticsearch as the storage type (this example assumes a locally running ELK Stack), and so I will use the following commands to download and run Zipkin: 

curl -sSL https://zipkin.io/quickstart.sh | bash -s

java -DSTORAGE_TYPE=elasticsearch -DES_HOSTS=http://127.0.0.1:9200 -jar zipkin.jar

If all goes as expected, you should see this output in your terminal: 

2018-05-02 09:59:27.897  INFO [/] 12638 --- [  XNIO-2 task-1] 
o.s.web.servlet.DispatcherServlet        : FrameworkServlet 
'dispatcherServlet': initialization completed in 108 ms

Open Zipkin at http://localhost:9411:

Of course, we have no traces to analyze in Zipkin yet, so our next step is to simulate some requests.

Setting Up Demo Services

Zipkin provides a number of built-in instrumentations for Java, Go, Ruby, and #C apps, as well as a decent list of community instrumentations, for collecting tracing information and sending it to Zipkin.

For the sake of demonstrating how to hook up Zipkin with the ELK Stack, I’m going to be using an instrumentation library for Java applications called Brave. Specifically, I will be setting up two Java servlet services that communicate via HTTP. Brave supports a whole lot more but this example will suffice for our purposes (you will need Java JDK 1.8 to run the services in your environment). 

First, download the source code:

git clone https://github.com/openzipkin/brave-webmvc-example.git

In this example, there are two services that send tracing info to Zipkin — a backend service and a frontend service. Your next step is to start these services.

The frontend service:

cd /brave-webmvc-example/webmvc4
mvn jetty:run -Pfrontend

The backend service:

cd /brave-webmvc-example/webmvc4
mvn jetty:run -Pbackend

Both services should report the same success message in the run output:

[INFO] Started Jetty Server

It’s time to send out some requests. The frontend service can be accessed via http://localhost:8081 and calls the backend service and displays a timestamp. 

Opening Zipkin again, you should be able to see traces for called requests:

Because we defined Elasticsearch as the storage type when running Zipkin, the trace data should be indexed in Elasticsearch. To verify, run:

curl -XGET 'localhost:9200/_cat/indices?v&pretty'

You should see a ‘zipkin*’ index created:

health status index                  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   zipkin:span-2018-05-02 bpLwf0W1TVysyQ5gHpSkDw   5   1          4            0     14.7kb         14.7kb

Our next step is to open Kibana and define the new index pattern. Open Kibana at http://localhost:5601, and commence with defining a ‘zipkin*’ index pattern:

As the timestamp field, use the ‘timestamp_millis’ field:

After creating the index pattern, open the Discover tab and you will be able to see your trace information collected by Zipkin.

As you may notice, some of the fields included in the trace data are not mapped correctly. You could make do with the existing mapping but if you want to analyze and visualize all available data properly, you will need to do some Elasticsearch mapping adjustments.

For example, to map the ‘remoteEndpoint.port’ field, you would use the following mapping API:

curl -X PUT "http:/localhost:9200/zipkin:span-2018-05-02/_mappings/span" -H 'Content-Type: application/json' -d'

{
 "properties": {
  "remoteEndpoint": {
   "dynamic": "false",
   "properties": {
    "port": {
     "type": "string"
    }
   }
  }
 }
}

Once you have updated field mapping, refresh the mapping in Kibana to see your trace fields mapped correctly (under management → Index Patterns).

Analyzing Trace Data in Kibana

Kibana allows you to query the data using various different query types, but things start to get much more interesting when visualizing the data. Here are a few examples.

Average Service Duration

Using an average aggregation of the "duration" field, you can create some metric visualizations that give you the average time for your services. Use a query to build the visualization on a specific service. In our example:

localEndpoint.serviceName:"frontEnd"

Average Duration per Service Over Time

It’s also interesting to see the trend over time for service duration. Kibana line chart visualizations are perfect for this.

Trace List

You can create a data table visualization that gives you a nice breakdown of all the traces collected from your application by Zipkin and stored in Elasticsearch. 

To do this you would need to use a sum metric aggregation of the "duration" field and a number of basic bucket aggregations as depicted below. 

You can then add these up into one dashboard that provides a nice overview of your trace data.

Endnotes

Logs, metrics, and traces are the three underlying tenets of what is being coined today in the industry as observability — instrumenting our environment to provide actionable insight through meaningful data. Of course, applications using microservices make implementing observability a whole lot harder and it’s no surprise that more and more tools are being introduced to help make it easier.   

The ability to sample requests, Zipkin’s instrumentation libraries, as well as its native support for Elasticsearch storage are the main reasons that we at Logz.io use Zipkin to discover latency issues with our services. In a future post, we will compare Zipkin to Jaeger, another tool in this space. 

microservice Data (computing) Requests application

Published at DZone with permission of Daniel Berman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • 4 Key Observability Metrics for Distributed Applications
  • The Principles of Planning and Implementing Microservices
  • A Deep Dive Into Distributed Tracing
  • C# Applications Vulnerability Cheatsheet

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!