What Is Distributed Tracing?

Distributed tracing is an observability data source that follows a transaction across a distributed microservices environment and tells you exactly where a problem is happening.

By April Yep · Nov. 22, 22 · Opinion

Cloud-native development has revolutionized application development in ways that are both positive and challenging. The adoption of microservices architectures on container-based infrastructures enables faster software development lifecycles. At the same time, problems can strike when changes are made to apps, such as adding new features. Moreover, app updates can happen multiple times a day. So how do teams track down problems when error messages pop up, or when it suddenly takes longer to load an application?

Unlike the monolithic approach to application development, where a straightforward application call makes it easy to find where a problem exists, cloud-native applications and the container-based infrastructure they run on are ephemeral. This means problems are elusive. The need for distributed tracing, which tells you exactly where a problem is happening, becomes acutely important for teams needing to quickly fix their applications.

What Does Distributed Tracing Do?

Distributed tracing makes it possible to see where in a distributed system a request spends its time or fails. It does this by capturing individual units of work, known as spans, and tying together the spans that belong to the same request. A good example of something to trace is a workflow request, which is the series of activities needed to complete a task. We see workflow requests in everyday activities, like ordering our favorite cupcakes online. The example below shows how this works:

Let’s say Nichelle and Robin each want to know if red velvet cupcakes are in stock at their local bakery. Nichelle and Robin would get on their respective mobile phones, open the bakery application, and search for “red velvet.” 

  • When Nichelle and Robin initiate their searches for red velvet cupcakes, each triggers a workflow request to get information about inventory 
  • These workflow requests are then processed through application services
  • Information is returned to their respective mobile apps 

Keep in mind that the workflow requests for Nichelle and Robin were the same: each went through the same application, used the same services, and asked for the same type of cupcake. However, the metadata associated with each request, such as tags, performance, or descriptors, may be different. While workflow requests may be the same for multiple users, the associated metadata is unique.

Seeing trace metadata is helpful for engineers investigating issues because it allows them to identify patterns, anomalies, or outliers and helps identify where issues lie in a stack. 
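To make the idea of "same workflow, different metadata" concrete, here is a minimal sketch in Python. The `Span` class, the service names, the tag keys, and the timings are all hypothetical and chosen only for illustration; real tracing tools define their own span formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import uuid


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical, simplified model)."""
    trace_id: str                 # shared by every span of the same request
    span_id: str
    name: str                     # which step of the workflow this span covers
    parent_id: Optional[str]      # None for the first (root) span
    duration_ms: float
    tags: Dict[str, str] = field(default_factory=dict)


def new_search_trace(user: str, device: str, durations_ms: List[float]) -> List[Span]:
    """Build the chain of spans for one 'red velvet' search request."""
    trace_id = uuid.uuid4().hex
    # Hypothetical steps: every search passes through these in order.
    names = ["mobile-app.search", "api-gateway", "inventory-service"]
    spans: List[Span] = []
    parent_id = None
    for name, duration in zip(names, durations_ms):
        span = Span(trace_id, uuid.uuid4().hex[:16], name, parent_id, duration,
                    tags={"user": user, "device": device, "query": "red velvet"})
        spans.append(span)
        parent_id = span.span_id  # the next span becomes a child of this one
    return spans


nichelle = new_search_trace("nichelle", "ios", [120.0, 95.0, 60.0])
robin = new_search_trace("robin", "android", [480.0, 455.0, 410.0])

# Same workflow: both requests pass through the same steps, in the same order.
same_workflow = [s.name for s in nichelle] == [s.name for s in robin]
print("same workflow:", same_workflow)

# Different metadata: each request has its own trace ID, tags, and timings.
for trace in (nichelle, robin):
    root = trace[0]
    print(root.trace_id[:8], root.tags["user"], root.tags["device"], root.duration_ms)
```

In this sketch, Robin's request is noticeably slower than Nichelle's even though both follow the same path, which is exactly the kind of outlier that trace metadata lets an engineer spot.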

How Did Distributed Tracing Come To Be? 

In a monolithic world, workflow requests were easy to follow even though the application components were more complex, which made it easier to find where a problem was happening. In today’s cloud-native world of microservices, things are reversed: application components are simpler, but the request workflows are more complex.

As this shift in complexity has continued, it has become harder to discern where problems are happening. Going back to our bakery example: if Nichelle and Robin want to find out how many red velvet cupcakes are in stock at their local bakery, the workflow request looks different in a monolithic setup than in a microservices setup.

In a monolithic world, this would have been a simple workflow: one request to one application, which would then make several calls within itself to collect the data. In a microservices environment, the same request for our cupcake store’s inventory causes the UI to fire off requests to multiple microservices at once and to receive data back from each of them.

If a problem exists, each workflow request may show that something is wrong, but in a microservices environment it is far less obvious where the problem is.
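The microservices case can be sketched with Python's `asyncio`. This is a simulation under stated assumptions, not a real tracing client: the service names (`catalog`, `inventory`, `pricing`) and their latencies are invented, and `asyncio.sleep` stands in for the network calls. The point it illustrates is that stamping one shared trace ID on every call lets you tell afterward which service held up the request.

```python
import asyncio
import time
import uuid
from typing import Dict, List


async def call_service(name: str, trace_id: str, delay_s: float,
                       spans: List[Dict]) -> None:
    """Simulate one microservice call and record a span for it."""
    start = time.perf_counter()
    await asyncio.sleep(delay_s)  # stands in for a real network call
    spans.append({
        "trace_id": trace_id,     # the same ID on every span of this request
        "service": name,
        "duration_ms": (time.perf_counter() - start) * 1000,
    })


async def search_inventory(query: str) -> List[Dict]:
    """Fan one UI request out to several services at once."""
    trace_id = uuid.uuid4().hex
    spans: List[Dict] = []
    # Hypothetical services and latencies, chosen only for illustration.
    services = {"catalog": 0.01, "inventory": 0.05, "pricing": 0.02}
    await asyncio.gather(*(
        call_service(name, trace_id, delay, spans)
        for name, delay in services.items()
    ))
    return spans


spans = asyncio.run(search_inventory("red velvet"))

# Because every span carries the same trace ID, the slow call is easy to find.
slowest = max(spans, key=lambda s: s["duration_ms"])
print(f"slowest service: {slowest['service']} ({slowest['duration_ms']:.0f} ms)")
```

Without the shared trace ID, the three calls would look like three unrelated events, and the UI would only know that the search as a whole was slow.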

Early Distributed Tracing Tools Were Hard To Use 

The market responded to these architectural changes by building new specialized tracing tools, but these tools are rarely used. Why? The early wave of tracing tools:

  • Was hard to use
  • Was adopted mainly by technically advanced users, who typically have a deep understanding of the architecture and the tool
  • Didn’t provide the level of detail needed to easily discover where a problem exists

For example, sampling, which lets you decide which trace data to keep, is the right tool for some teams, but it's not right for others who need every trace in full detail. The bottom line is that the sampling rate should be set by the user, not by the distributed tracing tool itself.
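As a rough illustration of user-controlled sampling, the sketch below keeps a configurable fraction of traces. The function name `should_sample` and the setting `USER_SAMPLE_RATE` are hypothetical, and hashing the trace ID is just one common way to make the decision; actual tools may sample differently.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Keep roughly `sample_rate` of all traces, decided per trace ID.

    Hashing the trace ID (instead of drawing a random number) means every
    service that sees the same trace ID makes the same keep/drop decision.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as a number in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


# The rate is a user setting (hypothetical name), not a value fixed by the tool.
USER_SAMPLE_RATE = 0.10

trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = [t for t in trace_ids if should_sample(t, USER_SAMPLE_RATE)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

A team that needs every trace would set the rate to `1.0`, while a team that mostly cares about storage cost could lower it, which is the kind of choice the paragraph above argues should stay with the user.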

Distributed Tracing Needs To Be Easy for Novice and Expert Users Alike 

Let’s go back to our bakery example of Nichelle and Robin, who wanted to find out the inventory status of red velvet cupcakes. If there’s a problem searching inventory, engineers will likely get an error message, and an admin would get an alert via their metric data that there is a problem. If we handled this request workflow in a monolithic manner, by instrumenting (instructing) each unit of work to send telemetry data back to our observability tool, it would cost valuable time and resources. And if the transaction has already completed by the time we have identified that there is a problem, it becomes much harder to narrow down where the problem occurred.

Add in the growing complexity of collaboration as teams within organizations get bigger, and the entire process starts to feel like a black box rather than something intuitive. Assigning the right expert to a problem from the start is crucial for your business’s success and ensures a seamless experience for customers.

How Does Distributed Tracing Benefit My Organization?

Distributed tracing takes a two-pronged approach to benefiting your organization. 

  • First, metrics act as an early warning system that lets teams know there is a problem. This also makes it easier for novice team members to quickly understand what’s going on, rather than having to call in a power user when something goes wrong.
  • Second, distributed tracing provides insights into services that help development teams spot poor system health or identify where bottlenecks are in the software stack. This all happens in conjunction with the early warning capability described above. Teams are provided with the data they need to restore services, deliver a positive end-user experience, and adhere to the organization’s service-level agreements.

Going back to my earlier example of Nichelle and Robin wanting to know how many red velvet cupcakes are in stock: With distributed tracing, teams are able to find out where in this request workflow there may be a problem. 
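A minimal sketch of what "finding where in the request workflow there may be a problem" could look like is shown below. The trace is hand-written sample data with made-up services and timings, and the helper names are hypothetical. The sketch also assumes each span's children run one after another, which a real tool would not take for granted.

```python
from typing import Dict, List

# A hypothetical trace for one "red velvet" search; all values are made up.
trace: List[Dict] = [
    {"id": "a", "parent": None, "service": "mobile-app", "duration_ms": 900, "error": False},
    {"id": "b", "parent": "a", "service": "api-gateway", "duration_ms": 870, "error": False},
    {"id": "c", "parent": "b", "service": "catalog", "duration_ms": 40, "error": False},
    {"id": "d", "parent": "b", "service": "inventory", "duration_ms": 790, "error": True},
]


def self_time_ms(span: Dict, spans: List[Dict]) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes children run sequentially; parallel children would need their
    time intervals merged instead of summed.
    """
    children = sum(s["duration_ms"] for s in spans if s["parent"] == span["id"])
    return span["duration_ms"] - children


def find_bottleneck(spans: List[Dict]) -> Dict:
    """Return the span that contributes the most time of its own."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


def find_errors(spans: List[Dict]) -> List[str]:
    """Return the services whose spans were marked as failed."""
    return [s["service"] for s in spans if s["error"]]


print("bottleneck:", find_bottleneck(trace)["service"])
print("errors in:", find_errors(trace))
```

Looking only at total durations would point at `mobile-app` or `api-gateway`, since they wrap everything else. Subtracting the time spent in child spans shows that the delay in this sample data comes from the `inventory` call.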

Why Do I Need Distributed Tracing?

Tracking, that’s why. Without tracking, we can’t tell where or when something happened in a workflow.

But how does this apply to development teams? Here are some of the key reasons why implementing distributed tracing in your organization can help you and, in turn, your customers. Distributed tracing:

  • Informs development teams about the health and status of deployed application systems and microservices
  • Identifies irregular behavior that results from scaling automation
  • Reviews how the average response times, error frequencies, and other metrics are reflected through the end-user’s experience
  • Tracks and records vital statistics on performance with user-friendly dashboards
  • Debugs and isolates bottlenecks within the system while addressing performance-level issues at the code level
  • Recognizes and addresses the root cause of unexpected issues
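Several of the points above come down to rolling span data up into per-service numbers such as average response time and error frequency. The sketch below shows one simple way to do that; the span records and the `summarize` helper are hypothetical, and a real dashboard would typically add percentiles and time windows.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical span records collected from several requests.
spans: List[Dict] = [
    {"service": "inventory", "duration_ms": 60, "error": False},
    {"service": "inventory", "duration_ms": 410, "error": True},
    {"service": "inventory", "duration_ms": 70, "error": False},
    {"service": "catalog", "duration_ms": 30, "error": False},
    {"service": "catalog", "duration_ms": 50, "error": False},
]


def summarize(records: List[Dict]) -> Dict[str, Dict[str, float]]:
    """Group spans by service and compute average latency and error rate."""
    by_service: Dict[str, List[Dict]] = defaultdict(list)
    for record in records:
        by_service[record["service"]].append(record)

    summary: Dict[str, Dict[str, float]] = {}
    for service, items in by_service.items():
        summary[service] = {
            "avg_ms": mean([s["duration_ms"] for s in items]),
            "error_rate": sum(1 for s in items if s["error"]) / len(items),
        }
    return summary


summary = summarize(spans)
for service, stats in summary.items():
    print(f"{service}: avg {stats['avg_ms']:.0f} ms, "
          f"error rate {stats['error_rate']:.0%}")
```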

Published at DZone with permission of April Yep. See the original article here.

Opinions expressed by DZone contributors are their own.
