DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • The Human Side of Logs: What Unstructured Data Is Trying to Tell You
  • Top Book Picks for Site Reliability Engineers
  • Overview of Telemetry for Kubernetes Clusters: Enhancing Observability and Monitoring
  • Shift-Right Testing: Smart Automation Through AI and Observability

Trending

  • Developers Beware: Slopsquatting and Vibe Coding Can Increase Risk of AI-Powered Attacks
  • My LLM Journey as a Software Engineer Exploring a New Domain
  • Scalable, Resilient Data Orchestration: The Power of Intelligent Systems
  • Medallion Architecture: Efficient Batch and Stream Processing Data Pipelines With Azure Databricks and Delta Lake
  1. DZone
  2. Testing, Deployment, and Maintenance
  3. Monitoring and Observability
  4. Building an Open-Source Observability Toolchain

Building an Open-Source Observability Toolchain

Read this article to explore the benefits of building an open-source toolchain for the observability of distributed systems and more.

By 
Sudip Sengupta user avatar
Sudip Sengupta
DZone Core CORE ·
Nov. 19, 22 · Analysis
Likes (1)
Comment
Save
Tweet
Share
9.0K Views

Join the DZone community and get the full member experience.

Join For Free

This is an article from DZone's 2022 Performance and Site Reliability Trend Report.

For more:


Read the Report

Open-source software (OSS) has had a profound impact on modern application delivery. It has transformed how we think about collaboration, lowered the cost to maintain IT stacks, and spurred the creation of some of the most popular software applications and platforms used today. 

The observability landscape is no different. In fact, one could argue that open-source observability tools have been even more transformative within the world of monitoring and debugging distributed systems. By making powerful tools available to everyone — and allowing anyone to contribute to their core construct — open-source observability tools allow organizations of all sizes to benefit from their powerful capabilities in detecting error-prone patterns and offering insights of a framework's internal state. 

In this article, we will discuss the benefits of building an open-source toolchain for the observability of distributed systems, strategies to build an open-source observability framework, best practices in administering comprehensive observability, and popular open-source tools.

Embracing an Open-Source Observability Framework

In spite of their individual benefits, observability tools have limited scope and are mostly focused to monitor only one of the key pillars of observability. Adopting multiple tools also discourages the concept of a single source of truth for comprehensive observability. As an alternative to using individual tools for observing indicators, an observability platform helps with contextual analysis by enriching the data collected by monitoring, logging, and tracing tools. A single observability platform also spans the full scope of an organization's distributed systems to give you a comprehensive view of a cluster state. 

An observability framework in distributed systems

Figure 1: An observability framework in distributed systems

Observability frameworks are typically categorized into: 

  • Centralized frameworks – These are typically designed for large enterprises that consume a lot of resources and need to monitor numerous distributed systems at once. As such frameworks are supported by a lot of hardware and software, they are expensive to set up and maintain.
  • Decentralized frameworks – These frameworks are preferred for use cases that do not immediately require as much equipment or training and that do require a lower up-front investment towards software licenses. As decentralized frameworks aid collaboration and allow enterprises to customize source code to meet specific needs, these are considered to be one of the popular choices when building an entire tech stack from scratch.

Benefits of Using Open-Source Observability Tools

Observability in itself is built around the open-source concept that relies on the decentralized access of key indicators for collaborative action and performance enhancement. Building an open-source framework for observability relies on fewer dependencies than centralized, proprietary software solutions. 

Many organizations use OSS because it’s free and easy to use, but there’s more to it than that. Open-source tools also offer several advantages over proprietary solutions to monitor how your applications are performing. Beyond monitoring application health, open-source observability tools enable developers to retrofit the system for ease of use, availability, and security. Using OSS tools for observability offers numerous other benefits, including: 

  • Easily extensible for seamless integration with most modern stacks
  • Being vendor-agnostic, which helps observe multi- or hybrid-cloud setups
  • Easy customization to support various use cases
  • Enhanced visibility and alerting by factoring custom anomalies
  • Accelerated development workflows by using pre-built plugins and code modules
  • Low operating investment by saving on license costs
  • Community contributions for enhancements and support

Strategies To Build an Open-Source Observability Framework

In order to get the most out of an open-source observability framework, it is important to embrace the principles of openness and collaboration. For comprehensive observability, it is also important to factor in crucial considerations when building an observability framework with open-source tools. 

Some recommended strategies to build an open-source observability framework include proactive anomaly detection, time-based event correlation, shift-left for security, and adopting the right tools. 

Proactive Anomaly Detection

An optimally designed observability framework helps predict the onset of potential anomalies without being caught off-guard. It is important to be able to identify the root cause and fix the problem before it impacts the cluster performance or availability. 

A distributed system's observability strategy should be built upon the four golden signals: latency, saturation, errors, and traffic. These signals are key representations of the core aspects of a cluster state’s services that collectively offer a contextual summary of its functioning and performance issues. 

Although the high-level information produced by these signals might not be granular on its own, when combined with other data of the key pillars, such as event logs, metrics, or traces, it's easier to pinpoint the source of a problem (see Figure 2). 

Key pillars of observability

Figure 2: Key pillars of observability

Time-Based Event Correlation

Event logs offer rich insights to identify anomalies within distributed systems. Use open-source tools that help capture occurrences, such as when an application process was executed successfully or a major system failure occurred. Contextual analysis of such occurrences helps developers quickly identify faulty components or interactions between endpoints that need attention. 

Logs should also combine timestamps and sequential records of all cluster events. This is important because time-series data helps correlate events by pinpointing when something occurred, as well as the specific events preceding the incident. 

Shift Left for Security

Open-source tools are often considered vulnerable to common attack patterns. As a recommended strategy, open-source tools should be vetted for inherent flaws and potential configuration conflicts they may introduce to an existing stack. The tools should also support the building of an observability framework that complements a shift-left approach for security, which eliminates the need for reactive debugging of security flaws in production environments. 

Beyond identifying the root cause of issues, the toolchain should enrich endpoint-level event data through continuous collection and aggregation of performance metrics. This data offers actionable insights to make distributed systems self-healing, thereby eliminating manual overheads to detect and mitigate security and performance flaws. 

Adopting the Right Tools

Observing distributed systems extensively relies on a log and metrics store, a query engine, and a visualization platform. There are different observability platforms that focus on measuring these indicators individually. Though they work independently, several of them work together extremely well, creating comprehensive observability setups tailored to an organization’s business objectives. 

Along with considerations for the observability components, consider what it means to observe a system by factoring in scalability. For instance, observing a multi-cloud, geographically distributed setup would require niche platforms when compared to monitoring a monolithic, single-cluster workload. 

Administering Observability for Performance and Site Reliability

By providing in-depth insights of software processes and resources, observability allows site reliability engineers (SREs) to assure optimum performance and health of an application. However, the challenges of observing the state and behavior of a distributed system are often more complex than assumed. While it is important to inspect the key indicators, it is equally important to adopt the right practices and efficient tools that support observability to collectively identify what is happening within a system, including its state, behavior, and interactions with other components. 

Some recommended best practices to enable effective observability in distributed systems include: 

  • Enforce the use of service-level agreements (SLAs) in defining performance indicators
  • Use deployment markers for distributed tracing
  • Set up alerts only for critical events
  • Centralize and aggregate observability data for context analysis
  • Implement dynamic sampling for optimum resource usage and efficient pattern sampling

Popular Open-Source Tools for Observability of Distributed Systems

There are several open-source observability tools that can provide insight into system performance, identify and diagnose problems, and help you plan capacity upgrades. While each tool comes with its own strengths and weaknesses, there are a few that stand out above the rest. It is also a common approach to use them together to solve different complexities. The table below outlines some popular open-source observability tools, their core features, benefits, and drawbacks to help understand how they differ from each other. 

Tools Best Known For For Benefits Drawbacks
LogStash Log collection and aggregation
  • Native ELK stack integration for comprehensive observability
  • Offers filter plugins for the correlation, measurement, and simulation of events in real-time
  • Supports multiple input types
  • Persistent queues continue to collect data when nodes fail
Lacks content routing capabilities
Fluentd Collection, processing, and exporting of logs
  • Supports both active-active and active-passive node configurations for availability and scalability
  • Inbuilt tagging and dynamic routing capabilities
  • Offers numerous plugins to support data ingestion from multiple sources
  • Supports seamless export of processed logs to different third-party solutions
  • Easy to install and use
Adds an intermediate layer between log sources and destinations, eventually slowing down the observability pipeline
Prometheus with Grafana Monitoring and alerting
  • PromQL query language offers flexibility and scalability in fetching and analyzing metric data
  • Combines high-end metric collection and feature-rich visualizations 
  • Deep integration with cloud-native projects enables holistic observability of distributed DevOps workflows  
Lacks long-term metric data storage for historical and contextual analysis
OpenTelemetry Observability instrumentation
  • Requires no performance overhead for generation and management of observability data
  • Enables developers to switch to new back-end analysis tools by using relevant exporters
  • Requires minimal changes to the codebase
  • Efficient utilization of agents and libraries for auto-instrumentation of programming languages and frameworks
Does not provide a visualization layer

Summary

Observability is a multi-faceted undertaking that involves distributed, cross-functional teams to own different responsibilities before they can trust the information presented through key indicators. Despite the challenges, observability is essential for understanding the behavior of distributed systems. With the right open-source tools and practices, organizations can build an open-source observability framework that ensures systems are fault-tolerant, secure, and compliant. Open-source tools help design a comprehensive platform that is flexible and customizable to an organization’s business objectives while benefiting from the collective knowledge of the community.

This is an article from DZone's 2022 Performance and Site Reliability Trend Report.

For more:


Read the Report

Observability

Opinions expressed by DZone contributors are their own.

Related

  • The Human Side of Logs: What Unstructured Data Is Trying to Tell You
  • Top Book Picks for Site Reliability Engineers
  • Overview of Telemetry for Kubernetes Clusters: Enhancing Observability and Monitoring
  • Shift-Right Testing: Smart Automation Through AI and Observability

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!