The Streaming Plane
The streaming plane addresses the need for real-time data integration and processing, enabling organizations to harness the power of both the operational and analytical data planes simultaneously.
Zhamak Dehghani nicely described the architectural data planes. In data management, the "data divide" highlights the distinction between two essential components: the operational data plane and the analytical data plane. This concept is particularly relevant today, as organizations strive to extract maximum value from their data assets. Understanding the divide between these two planes is fundamental to devising effective strategies to manage, process, and derive insights from data.
Introduction to the Streaming Plane: Bridging the Operational and Analytical Data Planes
The bridge between the two planes has traditionally been a one-way highway from the operational to the analytical plane. The path in the opposite direction is arduous, awkward, and costly, and relies on solutions such as Reverse ETL (rETL) and Data Activation. These solutions try to extract already cleansed and mastered data from analytical systems that aren’t optimized for large extractions.
In the ever-evolving landscape of data management, a prominent dimension acts as a multi-directional bridge between the operational and analytical data planes: the streaming plane. This concept addresses the need for real-time data integration and processing, enabling organizations to harness the power of both planes simultaneously. The streaming plane plays a pivotal role in achieving a holistic and dynamic data ecosystem, allowing businesses to respond to events in real time while also gaining valuable insights from historical data.
Streaming and real-time systems are positioning themselves in this plane by providing access to both real-time and historical data without the limitations of the other data planes.
Originally, this plane only performed ETL, with data lakes, data warehouses, and lakehouses as destinations. As systems in the streaming plane start to hold tables and materialized views, clients such as external applications and internal dashboards can access data that originates from either plane.
Systems that live in the streaming plane tend to have these characteristics:
They all must source from a streaming platform such as Kafka, Redpanda, Pulsar, or Kinesis.
They can build materialized views.
They can source historical data from an object store, data warehouse, data lake, or lakehouse.
They can transform data.
They can serve data either synchronously or asynchronously.
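As a rough illustration of how these characteristics fit together, the sketch below folds events into a small in-memory materialized view. It is not tied to any particular product: `OrderEvent`, `OrderTotalsView`, and `build_view` are hypothetical names, and plain Python iterables stand in for both the historical store and the streaming platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical event shape; a real topic would define its own schema."""
    customer_id: str
    amount: float


class OrderTotalsView:
    """Incrementally maintained view: total order amount per customer."""

    def __init__(self) -> None:
        self._totals = defaultdict(float)

    def apply(self, event: OrderEvent) -> None:
        # Transform step: fold each incoming event into the aggregated state.
        self._totals[event.customer_id] += event.amount

    def get(self, customer_id: str) -> float:
        # Read path: answer from the current state of the view.
        return self._totals.get(customer_id, 0.0)


def build_view(historical, live) -> OrderTotalsView:
    """Backfill from historical events, then keep applying live events."""
    view = OrderTotalsView()
    for event in historical:  # stand-in for an object store or warehouse scan
        view.apply(event)
    for event in live:        # stand-in for a consumer on a streaming platform
        view.apply(event)
    return view
```

In a real deployment the `live` iterable would be an unbounded consumer loop and the view would live in a stream processor or real-time database; the point here is only that the same view is fed from both historical and streaming sources.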
Streaming plane systems tend to focus on a subset of these characteristics, which creates a cloud of streaming systems spanning from operational to analytical.
These streaming and real-time systems can create tables or materialized views of data from either side of the streaming plane. Depending on the use case, the tabular structures can be row-based, column-based, or embedded stores.
Consumers of data in the streaming plane have two ways of consuming data: synchronously from the API Gateway and asynchronously from topics in a streaming platform. In fact, we are also seeing more transactional systems migrate into the fold, with databases that support columnar formats bringing more analytics to the edge. We will need to wait to see how these newer systems integrate into streaming and real-time analytics.
The streaming plane is assuming the role of Data-as-a-Service (DaaS) or a real-time streaming data mesh, where the data catalog is the system that holds the tabular structures and exposes them synchronously or asynchronously.
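The two consumption paths can be sketched with a single table that answers point queries and also publishes every change. This is a minimal, in-process stand-in: `ViewServer` is a hypothetical name, a direct method call plays the role of the API Gateway request, and a `queue.Queue` plays the role of a topic.

```python
import queue


class ViewServer:
    """Holds a keyed table and exposes it both synchronously and asynchronously."""

    def __init__(self) -> None:
        self._rows = {}
        self._subscribers = []

    def subscribe(self) -> queue.Queue:
        # Asynchronous path: each subscriber gets its own change feed.
        feed = queue.Queue()
        self._subscribers.append(feed)
        return feed

    def upsert(self, key, value) -> None:
        # Update the table, then publish the change to every subscriber.
        self._rows[key] = value
        for feed in self._subscribers:
            feed.put((key, value))

    def query(self, key):
        # Synchronous path: request/response lookup of the latest value.
        return self._rows.get(key)


server = ViewServer()
changes = server.subscribe()
server.upsert("sensor-1", 21.5)

latest = server.query("sensor-1")   # pull: ask for the current value
update = changes.get_nowait()       # push: read the published change
```

A synchronous consumer pays for a lookup on every request, while an asynchronous consumer receives each change once and keeps its own copy; which one fits depends on how fresh the data must be and how many readers there are.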
The streaming plane is where streaming and batching converge to bring real-time analytics with historical context to all data planes. It’s built on the foundation of streaming platforms, stream processors, and connectors that allow for multi-region, multi-cloud, multi-domain, and hybrid architectures.
The tabular structures in the streaming plane expose both synchronous and asynchronous endpoints and provide enough metadata to show where data originates and how it is processed. More importantly, they provide metadata that describes consumability. To name a few examples:
Throughput: Can this data product handle another consumer?
SLAs: What guarantees does this data product provide?
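One way to picture such metadata is as a small descriptor attached to each data product. The sketch below is an assumption about what that descriptor might contain, not a standard schema: the class name, the field names, and the idea of expressing throughput as consumer slots and the SLA as a freshness bound are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataProductMetadata:
    """Hypothetical descriptor for a data product in the streaming plane."""
    name: str
    source_topics: list[str]        # lineage: where the data originates
    transformations: list[str]      # lineage: how the data is processed
    max_consumers: int              # throughput budget, expressed as consumer slots
    active_consumers: int = 0
    freshness_sla_seconds: float = 60.0  # guarantee: maximum staleness of served data

    def can_accept_consumer(self) -> bool:
        # Throughput: can this data product handle another consumer?
        return self.active_consumers < self.max_consumers

    def meets_sla(self, observed_lag_seconds: float) -> bool:
        # SLA: is the observed lag within the advertised freshness guarantee?
        return observed_lag_seconds <= self.freshness_sla_seconds


orders = DataProductMetadata(
    name="order_totals",
    source_topics=["orders"],
    transformations=["sum(amount) group by customer_id"],
    max_consumers=2,
    active_consumers=2,
)
```

Here `orders.can_accept_consumer()` returns `False` because both slots are taken, which is the kind of answer a prospective consumer would want from the catalog before subscribing.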
Most likely, you already have some of the components needed to build a streaming plane. You only need a system that exposes the tabular structures and can serve them at scale.
Make sure your security is set up so that the same authentication and authorization can be used across all of the data product endpoints.
Data Mesh implemented as a streaming plane is achievable. You can read more in this book.
Published at DZone with permission of Hubert Dulay. See the original article here.