O11y Guide: Keeping Your Cloud-Native Observability Options Open
Take a look at the architecture-level choices being made and the open standards found in the open-source landscape.
This is the fourth article in the series covering my journey into the world of cloud-native observability. If you missed any of the previous articles, head on back to the introduction for a quick update.
After laying out the groundwork for this series in the initial article, I spent some time in the second article sharing who the observability players are. I also discussed the teams that these players are on in this world of cloud-native o11y. For the third article, I looked at the ongoing discussion around monitoring pillars versus phases.
Being a developer from my early days in IT, it's been very interesting to explore the complexities of cloud-native o11y. Monitoring applications goes way beyond just writing and deploying code, especially in the cloud-native world. One thing remains the same: maintaining your organization's architecture always requires both a vigilant outlook and an understanding of available open standards.
In this fourth article, I'm going to look at the architecture-level choices being made and survey the open standards found in the open-source landscape.
As any architect will tell you, open standards are always preferred when considering adding to your existing infrastructure. Does the candidate component under consideration adhere to some defined open standard? Does it at least conform to using open standards?
The Open Choice
When an open standard exists, or in some early cases an open consensus where everyone centers around a technology or protocol, it gives an architect peace of mind. You often have a choice as to the final component you want to use, and as long as it's based on a standard, you can feel confident that you can swap it out in the future.
An example of one such standard is the Open Container Initiative (OCI) for container tooling in a cloud-native environment. When your organization's architecture adheres to such a standard, every component and system interacting with your containers can be replaced in the future, as long as the replacement follows the same standard. This creates choice, and choice is a good thing!
Open O11y Projects
In cloud-native observability (o11y), there are many open-source projects to help you tackle the initial tasks of o11y. Many are closely associated with the Cloud Native Computing Foundation (CNCF) as projects and promote open standards where possible. Some of them have even become unofficial open standards through their widespread use in the o11y domain.
Let's explore a few of the most commonly encountered cloud-native o11y projects.
Prometheus is a graduated project under the CNCF umbrella, which is defined as "...considered stable and used in production." It's listed as a monitoring system and time series database, but the project site itself advertises that it is used to power your metrics and alerting with the leading open-source monitoring solution.
What Does Prometheus Do for You?
It provides a flexible data model in which each time series, a sequence of data points indexed in time order, is identified by a metric name and a set of key-value pairs called labels. Time series are stored in memory and on local disk in an efficient format. Scaling is done through functional sharding, which splits data across separate servers, and through federation.
Leveraging the metrics data is done with a very powerful query language called PromQL, which we will cover in the next section. Alerts for your systems are defined using this query language, and the provided Alertmanager handles notifications.
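To make the data model concrete, here is a minimal sketch in Python. It assumes that a series is identified by its metric name together with a set of key-value labels, which is how Prometheus describes its model. The `TinySeriesStore` class, its methods, and the sample metric are hypothetical names made up for illustration. This is not how Prometheus stores data internally.

```python
from collections import defaultdict


def series_key(name, labels):
    """Build a hashable identity from a metric name and its labels."""
    return (name, tuple(sorted(labels.items())))


class TinySeriesStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self._series = defaultdict(list)

    def append(self, name, labels, timestamp, value):
        # Samples with the same name and labels land in the same series.
        self._series[series_key(name, labels)].append((timestamp, value))

    def select(self, name, **matchers):
        """Return every series with this name whose labels equal the matchers."""
        found = {}
        for (series_name, label_items), samples in self._series.items():
            if series_name != name:
                continue
            labels = dict(label_items)
            if all(labels.get(k) == v for k, v in matchers.items()):
                found[(series_name, label_items)] = list(samples)
        return found


# Hypothetical samples: two for one series, one for another.
store = TinySeriesStore()
store.append("http_requests_total", {"method": "get", "code": "200"}, 1000, 5)
store.append("http_requests_total", {"method": "get", "code": "200"}, 1015, 9)
store.append("http_requests_total", {"method": "post", "code": "500"}, 1000, 1)

get_series = store.select("http_requests_total", method="get")
```

The same metric name yields two distinct series here because the label sets differ, which is the property that makes slicing by dimension possible later on.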
There are multiple modes provided for visualizing the data collected, from a built-in expression browser to integration with Grafana dashboards and a console templating language. There are also many client libraries available to help you easily instrument existing services in your architecture. If you want to import existing third-party data into Prometheus, there are many integrations available for you to leverage.
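As a rough idea of what instrumenting a service produces, the sketch below renders a counter in the Prometheus text exposition format, the plain-text form a server scrapes. The `Counter` class is a hypothetical stand-in written with only the standard library. It is not the API of any official client library, and the metric name and labels are invented for the example.

```python
class Counter:
    """Toy counter for illustration; NOT an official client library API."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}

    def inc(self, amount=1, **labels):
        # Each distinct label combination gets its own running total.
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0) + amount

    def render(self):
        """Render all label combinations in the text exposition format.

        Label values are written as-is; a real client would escape them.
        """
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            selector = "{" + label_text + "}" if label_text else ""
            lines.append(f"{self.name}{selector} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical service code recording three handled requests.
http_requests = Counter("http_requests_total", "Total HTTP requests handled.")
http_requests.inc(method="get", code="200")
http_requests.inc(method="get", code="200")
http_requests.inc(method="post", code="500")

print(http_requests.render(), end="")
```

In practice you would reach for one of the existing client libraries, which also handle serving the metrics endpoint, rather than hand-rolling the format.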
Each server runs independently and relies only on local storage, which makes it reliable out of the box and an easy starting point. It's written in the Go language, and all binaries are statically linked for easy deployment and performance.
There is a Prometheus organization with all the code bases for their projects.
PromQL is officially a part of the Prometheus project, but it is well worth mentioning on its own as an unofficial standard used widely to query ingested time series data. As stated in the Prometheus documentation:
Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
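The HTTP API route mentioned in the quote can be sketched in a few lines of Python. The example builds an instant query URL for the `/api/v1/query` endpoint and unpacks a response in the documented JSON shape. The server address `http://localhost:9090`, the PromQL expression, and the sample response body are assumptions made up for illustration. No network call is made.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def vector_samples(response_body):
    """Turn an instant-vector response into (labels, value) pairs."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample value arrives as [unix_timestamp, "value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Assumed local server address and an example expression.
url = instant_query_url(
    "http://localhost:9090",
    "sum by (job) (rate(http_requests_total[5m]))",
)

# Made-up response body in the documented shape, used instead of a live server.
sample_body = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"job":"api"},"value":[1700000000,"0.25"]}]}}'
)
samples = vector_samples(sample_body)
```

Against a running server, the body would come from fetching `url` with any HTTP client; the parsing step stays the same.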
There are various ways to learn how to write queries in PromQL, but a fun little project called PromLens provides an online demo that helps you accelerate your use, understanding, and troubleshooting of PromQL. You can also easily spin up a Docker image with the tool set up for exploration on your own local machine. Visually building queries of your time series data is a big boost to your productivity.
There is a good background story on the origins of PromQL in an interview with the creator Julius Volz.
Another up-and-coming project is found in the incubating section of the CNCF site: it's called OpenTelemetry (OTEL). This is a very fast-growing project with a focus on "high-quality, ubiquitous, and portable telemetry to enable effective observability."
This project helps you to generate telemetry data from your applications and services, then forward it in what is now considered a standard form, called the OTEL Protocol (OTLP), to a variety of monitoring tools. To generate the telemetry data, you first have to instrument your code, but OTEL makes this very easy with automatic instrumentation for many existing languages.
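To give a feel for what instrumenting code means, here is a conceptual sketch in Python using only the standard library. It is deliberately not the OpenTelemetry API: the `span` helper, the `finished_spans` list standing in for an exporter, and the `checkout` and `load_cart` functions are all hypothetical. It only shows the idea of wrapping units of work so that each one records a name, timing, and its parent. It is also not thread-safe.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # stand-in for an exporter that would ship spans elsewhere
_active = []         # stack of spans that are currently open


@contextmanager
def span(name):
    """Record one unit of work, linked to whichever span is already open."""
    parent = _active[-1] if _active else None
    record = {
        "name": name,
        # Child spans share the trace of their parent; a root starts a new one.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "start": time.time(),
    }
    _active.append(record)
    try:
        yield record
    finally:
        _active.pop()
        record["end"] = time.time()
        finished_spans.append(record)


def load_cart():
    with span("load_cart"):
        return ["book", "pen"]


def checkout():
    with span("checkout"):
        return len(load_cart())


item_count = checkout()
```

Automatic instrumentation aims to spare you from writing wrappers like these by hand for common frameworks and libraries.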
You can find the community and its code in the Open-Telemetry organization.
Before OTEL was on the scene, the CNCF project Jaeger provided a distributed tracing platform that targeted the cloud-native microservice industry.
Jaeger is open-source, end-to-end distributed tracing. Monitor and troubleshoot transactions in complex distributed systems.
While this project is fully mature, it targeted an older protocol. It has recently retired its classic client libraries and advises users to migrate to its native support for the OTEL Protocol standard.
Fluentd, a project written in C and Ruby, is a graduated CNCF project. Its site describes it as follows:
“Fluentd is an open source data collector for unified logging layer. Fluentd allows you to unify data collection and consumption for a better use and understanding of data.”
Under the umbrella of Fluentd you’ll find a new project called Fluent Bit. The documentation says:
“Fluent Bit is an open source and multi-platform log processor tool which aims to be a generic Swiss knife for logs processing and distribution.”
Start Your Observability Engines
This concludes the short overview of the open source projects and (un)official standards that you will encounter when getting started with cloud-native o11y. That brings me to the first hands-on step: exploring these open source projects, with the understanding that we are starting out without having to worry about scale yet.
Published at DZone with permission of Eric D. Schabell. See the original article here.