What Is Cloud Native Observability vs. Visibility, and Why Is It So Important?
Why the shift to cloud-native observability has arrived, a meaningful definition of what it is, and its significance and implications.
Cloud-native technologies are revolutionary, but they disrupt existing infrastructure, application, and monitoring methods in ways that create opacity problems. We’ll go into more detail on why the shift to cloud-native observability has arrived, propose a broader, more meaningful definition of what it is, and explore its significance and deeper implications. But first, let’s revisit containerization and, by extension, why cloud-native observability is so important.
Virtualization and Cloud-Native/Container-Based Systems
In containerized environments, there is no intervening hypervisor between the OS and the underlying infrastructure, so applications and developers can take full advantage of cloud-native development and deployment in terms of interoperability, efficiency, and performance. Beyond these efficiency and portability advantages, containerization makes it possible to access and use operating system (and kernel) services directly to examine the state and operation of the underlying infrastructure (nodes/bare-metal servers), as well as the attendant services, the operating system, and applications. This is profoundly important because, in the cloud-native world, network topology (physical and virtual) is hidden, interfaces (network namespaces) are hidden, and data flows are hidden; to make matters even more challenging, resources can be ephemeral: dynamically configured, provisioned, deployed, and reused.
Stated more clearly: cloud-native, serverless compute environments, while representing the most scalable, open, and performant way to exploit cloud infrastructure, have the side effect of making it very difficult or impossible to observe, instrument, and monitor systems with the legacy tools and technologies previously relied on to provide observability metrics (packet brokers, vTAPs, SPAN/mirror ports, network flow statistics, logs, and traditional metrics).
The Significance of Real-time, Cloud Native Observability
The same cloud-native technologies that provide all these benefits for development and deployment can also be tasked with providing the solution. A newer, better solution can be built as a set of containerized, microservices-based agents deployed within the cloud-native infrastructure to harness the same benefits of the cloud environment. Operating as event-driven, in-node processes, these agents provide very detailed observability and processing functionality at the kernel and network-namespace level. They exploit system- and kernel-level instrumentation to access, collect, and process traffic and to produce detailed telemetry where the actual data is generated, providing programmatic visibility into, and access to, the underlying infrastructure and enabling better, newer forms of observability and control.
Having access to the kernel for instrumentation provides a reliable and immutable data source for security and for network or application performance management (NPM/APM). This new form of observability and control should be event-driven and programmatic, manifested through deep system-level instrumentation and an interactive command-and-control communications architecture built on open, standards-based interfaces and a distributed message bus. This would allow organizations to easily instrument a network with a lightweight observability fabric that examines network communications as they happen and continuously publishes the necessary metadata for follow-on analytics, visualization, and management applications.
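To make the idea concrete, here is a minimal Python sketch of such an event-driven agent. An in-process queue stands in for the distributed message bus (NATS, Kafka, etc.), and the subject name, node identifier, and record fields are hypothetical illustrations, not an actual product schema:

```python
import json
import queue
import time

# Stand-in for the distributed message bus (NATS, Kafka, ...).
# The subject name "observability.telemetry" is a hypothetical example.
bus = queue.Queue()

def on_network_event(event: dict) -> None:
    """Event-driven handler: enrich a raw kernel-level event with
    metadata and publish it as serialized telemetry."""
    record = {
        "ts": time.time(),                        # observation timestamp
        "node": "node-01",                        # hypothetical node id
        "namespace": event.get("netns", "default"),
        "event": event,                           # raw instrumentation payload
    }
    bus.put(("observability.telemetry", json.dumps(record)))

# Simulated kernel-level event, e.g. a new TCP connection in a pod namespace.
on_network_event({"netns": "pod-a", "proto": "tcp", "dport": 443})

# Downstream analytics consume the enriched, serialized metadata.
subject, payload = bus.get()
assert json.loads(payload)["namespace"] == "pod-a"
```

The key property this sketch illustrates is that work happens only when an event fires, and the enriched record leaves the node immediately as streaming metadata rather than resting in a local log.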
Have You Considered the Containerized Visibility Fabric Approach?
One approach we have invested in is a containerized visibility fabric (CVF). Cloud-native observability with a CVF provides access to the systems, services, and instrumentation hooks down to the node and kernel level, generating telemetry used to correlate events across the physical and virtual cloud infrastructure. The main component is the CVF agent: it introspects, processes, and packages the characteristics of all network communications (packets, specific protocols, metrics) together with enriched metadata.
This network telemetry is published continuously and in real time over an open, distributed message bus, where external analytics, third-party tools, and/or AI/ML workflows can analyze the resulting streaming telemetry and send command-and-control decisions and responses back to the source, providing interactive observability and control of the infrastructure. The agent's design is event-driven, lightweight, and as performant as possible so as not to degrade the performance of the network it is tasked with observing. The resulting telemetry contains indexing data structures (hashes) to support correlation with the output of adjacent sensors. The fleet of agents provides a consistent and unified fabric view with a minimal performance footprint that scales easily by leveraging the resources within a cloud-native environment.
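The indexing hashes mentioned above can be sketched as a direction-agnostic flow key: two sensors observing opposite directions of the same connection derive the same key, so their telemetry can be joined downstream. This is an illustrative assumption about how such an index might work, not the CVF's actual scheme:

```python
import hashlib

def flow_key(src: str, dst: str, sport: int, dport: int, proto: str) -> str:
    """Direction-agnostic hash of a connection's 5-tuple. Sorting the
    endpoints means A->B and B->A traffic map to the same key."""
    a, b = sorted([(src, sport), (dst, dport)])
    material = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return hashlib.sha256(material).hexdigest()[:16]

# Sensor on the client node sees the outbound direction ...
k1 = flow_key("10.0.0.5", "10.0.1.9", 53122, 443, "tcp")
# ... while a sensor on the server node sees the reply direction.
k2 = flow_key("10.0.1.9", "10.0.0.5", 443, 53122, "tcp")
assert k1 == k2  # same flow, same index key, across both sensors
```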
This approach provides a more reliable, scalable, cloud-native way to dynamically inspect, extract, and process detailed telemetry from the cloud-native infrastructure, down to the kernel and data link layer, and serve it up continuously and in real time as serialized metadata over a high-performance streaming messaging system (NATS, Kafka, …). This enables deep visibility, flexibility, and investment protection through the openness of the architecture and support for third-party integrations. Whereas legacy traffic acquisition and monitoring tools give you an interpretation of what is going on in your network based on activity, the containerized visibility approach gives you observability with detailed, real-time telemetry served up right from the source.
Why Is Observability so Important?
There has been a lot written about observability recently. According to Wikipedia, “In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” Simply put, observability is achieved when data is made available from within the system that you wish to monitor; monitoring is the actual task of collecting and displaying this data.
Moreover, the current working definition of observability focuses exclusively on applications, using metrics, traces, and logs for application development and troubleshooting. This narrow definition does not explicitly address the role of the virtual or physical network infrastructure, nor the importance of network resource interaction to the system's overall performance, security, and reliability.
Consequently, extending the control-theory tenet of observability to cloud-based systems must include introspection of the states and behaviors of networks, interfaces, and namespaces. Achieving observability for distributed, cloud-based operating environments therefore requires new forms of richer, deeper, more useful knowledge and context, combined with the ability to correlate events across the entire system, inclusive of the entirety of the virtual and physical networks, to truly observe and understand what is happening within it.
So, observability requires that both the quality and utility of the data (telemetry) obtained from a system be sufficient to understand the system under observation, not merely constrained by what information happens to be available. Conventional monitoring and visibility solutions have traditionally been a good source of information, but they only provide static snapshots (data structures, logs, PCAP files, traces, …) obtained from predefined, available sources or captured from monitoring applications, probes, or network traffic. Furthermore, with these legacy approaches, modifying the format or contents of those existing, predefined forms of telemetry could entail rewriting the entire application; in some cases (specifically with a packet broker), it requires entirely new hardware.
| Visibility / Monitoring | Cloud-native observability |
| --- | --- |
| What information is available to understand the system? | What information do I need to understand the system? |
| Use available metrics, traces, and logs | Purposefully instrument and introspect |
| Data-at-rest | Streaming, continuously and in real time |
| Static | Programmatic / programmable and event-driven |
The Deeper Implications of Cloud Native Observability
As described above, a containerized visibility approach takes cloud-native observability further. In cloud-native, serverless environments, the observational data needs to be continuous and adaptable as the systems evolve. The CVF agents are event-driven sensor programs that operate within the secure confines of protected kernel space, meaning they are secure and resource-efficient, have access to functions and services at the operating system (kernel) level, and can deliver the resulting telemetry directly up to the application layer.
Additionally, because CVF agents are event-driven, loaded and launched on demand to collect and process information when and where it is generated, their capabilities are more efficient, invoked only when and where they are needed. The CVF is not just a sensor; while it can serve as a simple probe, capturing, filtering, and replicating traffic as a form of cloud-native TAP, it is more than that: an observability utility offering scalability and flexibility.
CVF agents make data correlation in cloud-native environments much simpler. The ability to generate unique time-series telemetry when and where it is needed, with the granularity to support event correlation down to the process, container, machine/node, flow, or link/interface, and combinable with other data sources, is an essential and distinctive feature when it comes to scalability. This capability not only supports more powerful system-wide analytics but also enables working with other observability tools, telemetry, AI/ML workflows, or semi-structured data within your overall operations analysis and monitoring practice.
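As a hedged sketch of that correlation step, downstream analytics might group telemetry from several agents by a shared flow key and a time window to build a system-wide view of each flow. The record fields (`flow`, `node`, `ts`, `bytes`) are hypothetical illustrations, not an actual CVF schema:

```python
from collections import defaultdict

# Hypothetical telemetry records emitted by two different agents.
records = [
    {"flow": "ab12", "node": "node-01", "ts": 100.0, "bytes": 1400},
    {"flow": "ab12", "node": "node-02", "ts": 100.2, "bytes": 1400},
    {"flow": "cd34", "node": "node-01", "ts": 101.0, "bytes": 600},
]

def correlate(records, window=1.0):
    """Group records sharing a flow key whose timestamps fall within
    `window` seconds, yielding the set of nodes that saw each flow."""
    by_flow = defaultdict(list)
    for r in records:
        by_flow[r["flow"]].append(r)
    out = {}
    for flow, group in by_flow.items():
        group.sort(key=lambda r: r["ts"])
        if group[-1]["ts"] - group[0]["ts"] <= window:
            out[flow] = sorted({r["node"] for r in group})
    return out

print(correlate(records))
# → {'ab12': ['node-01', 'node-02'], 'cd34': ['node-01']}
```

The same join works against any other keyed data source (logs, traces, AI/ML features), which is what makes a shared, granular key so useful for system-wide analysis.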
Transitioning to the Future: Interoperability and Investment Protection
Beyond the benefits of cloud-native observability, a significant proportion of applications are not greenfield. They are hybrid environments that use a combination of stand-alone and virtualized infrastructure and still require the capabilities and type of telemetry a CVF can deliver. While pure-play cloud-native environments are growing, many organizations have significant investments in analytics and monitoring tools that can only utilize more conventional data-at-rest sources (logs, files, traces, NetFlow, IPFIX, PCAP, DPI, …).
To support those environments, the CVF provides open, standards-based compatibility, with the ability to stream both metadata and legacy formats (PCAP, DPI files, NetFlow, IPFIX, …) into static files and data formats, as well as integrate with existing third-party tools and applications (e.g., Wireshark) that can benefit from the richness, accuracy, and immutability of CVF-produced telemetry. There are also many applications supporting regulatory compliance, legal, and forensic auditing that, by definition, require the resulting traffic capture and telemetry to be collected and secured for later retrieval and analysis. The CVF supports open data formats and interfaces that can be configured to stream captured traffic and telemetry into data repositories, static storage, and data lakes to better meet those requirements.
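As one concrete example of serving a legacy format, telemetry that includes raw packet bytes can be rendered into the classic libpcap file layout that tools such as Wireshark and tcpdump read. This is a generic sketch of the well-known pcap format, not the CVF's own export code:

```python
import struct

def pcap_bytes(packets, snaplen=65535, linktype=1):
    """Serialize raw frames into classic libpcap format.
    `packets` is a list of (ts_sec, ts_usec, data) tuples;
    linktype 1 = Ethernet."""
    # Global header: magic, version 2.4, tz offset, sigfigs, snaplen, linktype.
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, snaplen, linktype)
    for ts_sec, ts_usec, data in packets:
        # Per-record header: timestamp, captured length, original length.
        out += struct.pack("<IIII", ts_sec, ts_usec, len(data), len(data))
        out += data
    return out

capture = pcap_bytes([(1700000000, 0, b"\x00" * 60)])  # one dummy 60-byte frame
assert capture[:4] == b"\xd4\xc3\xb2\xa1"  # little-endian pcap magic number
```

Writing such a file into static storage or a data lake is what lets capture-and-retain requirements (compliance, forensics) be met from the same streaming telemetry pipeline.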
The growth in cloud-native development and deployment increasingly provides the services organizations rely on, and it can also provide the foundation for the next generation of observability tools. As we’ve laid out here, there are myriad resource efficiencies and operational benefits to be gained by implementing a containerized visibility fabric approach to observability, one that provides performant, scalable, real-time observability metrics for APM, NPM, and cybersecurity. When adopting cloud-native, why should your observability tools be stuck in a non-cloud world?
Published at DZone with permission of Peter Dougherty. See the original article here.
Opinions expressed by DZone contributors are their own.