The Next Step in Stream Processing: ACID Transactions on Data Streams

DZone 's Guide to

The Next Step in Stream Processing: ACID Transactions on Data Streams

Read on for an overview of what one big data development team is doing to build on the open source Apache Flink big data platform.

· Big Data Zone ·
Free Resource

The stream processing industry has been growing rapidly over the years and it is expected to reach $50 billion in revenue by the end of 2025(1). Since the very early days of stream processing with Apache Flink, we have always held the strong belief that stream processing is a technology that will be the new paradigm in data processing, something that we see coming to fruition as more and more modern enterprises become event-driven, real-time, and software-operated. As a result, stream processing and streaming frameworks have evolved over the years to become more robust and offer increasingly better guarantees for data and computation correctness.

Looking at the history of stream processing we can see three distinguishing steps:

  • The first offering “at least once guarantees.”
  • The second step bringing “exactly once guarantees.”
  • The last one with the introduction of the data Artisans Streaming Ledger, unlocking “ACID guarantees” for data streams.

The Evolution of Stream Processing

Step 1: Distributed Stream Processing for Data Analytics (“At Least Once Guarantees”)

The first distributed stream processors that entered the mainstream targeted analytical applications, and in particular offered a way to analyze data in an imprecise manner, in real-time while the precise analysis was taking place in the background. This was referred to, at that time, as the “lambda architecture” wherein a stream processor offered imprecise results while the data arrived, and a batch processor offered the correct answer in hourly or daily batches. This kind of guarantee is called “at least once processing” in stream processing nomenclature. They are the weakest possible correctness guarantees offered by stream processing systems and were the initial step in the stream processing technology journey.

Step 2: Distributed Stream Processing for Single-Key Applications (“Exactly Once Guarantees”)

Apache Flink pioneered the adoption of true stateful stream processing with exactly once guarantees at scale. This allowed a particular class of applications, both analytical and transactional in nature, to be implemented using stream processing technology with strong correctness guarantees. This fully alleviated the need for the lambda architecture and batch processors for a particular class of applications.

The catch in the above has always been the “particular class of applications”. Today, many stream processors available in the market offer strong consistency guarantees but only for applications that update a single key at a time. This means, for example, that an application that updates the balance of a single bank account can be implemented correctly with stream processing today, but an application that transfers money from one bank account to another is hard to implement with strong consistency guarantees.

Step 3: Distributed Stream Processing for General Applications (“ACID Guarantees”)

With the introduction of Streaming Ledger as part of data Artisans Platform, users of stream processing technology can now build applications that read and update multiple rows and multiple tables with ACID guarantees, the strongest guarantees provided by most (but not all) relational databases.

And Streaming Ledger provides these guarantees while maintaining the full scale-out capabilities of 'exactly once' stream processing, and without affecting the application’s speed, performance, scalability, or availability.

A good (but not 100% accurate) analog in database systems might be to think of 'at least once' guarantees in a Lambda architecture as a form of eventual consistency (eventually the batch system would catch up). 'Exactly once' guarantees, as offered by Flink, are akin to distributed key/value stores that offer consistency for single-key operations, and the guarantees that the Streaming Ledger provides are akin to ACID guarantees that relational databases provide for more generalized transactions that operate on multiple keys and tables.

We believe that this is the next evolutionary step in stream processing that opens the door for a much wider set of applications to be implemented in a correct, scalable, and flexible manner using the streaming architecture.

Going Beyond the 'Exactly Once' Semantics of Stream Processing Frameworks

data Artisans Streaming Ledger builds on Apache Flink and provides the ability to perform serializable transactions from multiple streams across shared tables and multiple rows of each table. This can be easily considered as the data streaming equivalent of multi-row transactions on a key/value store or even across multiple key/value stores. Streaming Ledger uses Flink’s state to store tables so that no additional storage or system configuration is necessary. The building blocks of Streaming Ledger applications consist of tables, transaction event streams, transaction functions, and optional result streams.

For more information download the data Artisans Streaming Ledger whitepaper.

data Artisans Streaming Ledger opens the path to stream processing for a completely new class of applications that previously relied on relational database management systems. Data-intensive, real-time applications such as fraud detection, machine learning, and real-time trade pricing can now be upgraded to the streaming era effortlessly.

(1) Source: Global Streaming Analytics Market Report 2018, Market Insights Report, December 2017

stream processing ,apache flink ,data processing ,big data

Published at DZone with permission of Igal Shilman . See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}