
Kafka Streams

An introductory look at Kafka Streams, and how developers can use this tool to better handle large amounts of data in their applications.

By Gaurav Garg · May 31, 2018 · Tutorial


In my previous article, I explained an example of a Kafka producer and consumer. In this article, I will explain the usage of Kafka Streams.

With Kafka producers and consumers, you can produce and consume records, but you cannot analyze them; for analysis, you would traditionally depend on a separate application such as Storm. To remove that dependency, Kafka provides the Streams API.

Kafka Streams is a client library for processing and analyzing data stored in Kafka. Below are some of its highlights.

  • Simple and lightweight client library.

  • No external dependency on any system other than Apache Kafka.

  • Fault-tolerant local state. Local state is required to perform stateful operations like join, etc.

  • Supports exactly-once processing semantics, i.e. each record is processed only once.

  • Supports event-time-based windowing operations.

  • Provides a high-level DSL and a low-level processor API to define the topology.
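
Before diving into the concepts, here is a minimal sketch of how a Streams application is assembled and started. The topic names, application id, and broker address are illustrative choices of mine, not from the article:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsApp {
    public static void main(String[] args) {
        // Basic configuration: application id and the Kafka cluster to connect to.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-demo");       // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // illustrative broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // To enable the exactly-once semantics mentioned above:
        // props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

        // Define a trivial topology: read from one topic, write to another.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.to("output-topic");

        // Build and start the processing topology.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Cleanly shut down on JVM exit.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}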

Key Concepts of Kafka Streams 

  • In Kafka Streams, a stream represents an unbounded, continuously updating dataset of immutable records, where each record is defined as a key-value pair.

  • A Kafka Streams processing application defines its computational logic through one or more processor topologies, where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).

  • A stream processor is a node in a Streams topology. It receives a record from a topic or its upstream processor, produces one or more records, and writes these records to a Kafka topic or a downstream processor.

  • Source Processor: In a stream topology, a source processor node consumes records from one or more Kafka topics and forwards them to its downstream processor nodes. It does not receive records from other processor nodes, i.e. it sits at the top of the topology.

  • Sink Processor: A sink processor node receives records from an upstream processor node. It does not have any downstream processor nodes and, after processing records, it writes these records back into a Kafka topic.

Kafka Streams provides two ways to define streaming topologies.

  • Kafka Streams DSL: It provides inbuilt functions for data transformations. For example: map, mapValues, filter, selectKey, flatMap, flatMapValues, merge, branch. 

    • map: Transforms each record of the input stream into a new record of the output stream.

    • mapValues: Creates a new stream which keeps all the input stream's records with the same key but changes the value.

    • selectKey: Creates a new stream which keeps all the input stream's records with the same value but changes the key.

    • filter: Creates a new stream which contains all the input stream's records that satisfy a given predicate.

    • flatMap: Transforms each record of the input stream into zero or more records in the output stream. Both the key and the value can be changed in the output records.

    • flatMapValues: Transforms each record of the input stream into zero or more records in the output stream. Every output record keeps the key of its input record, so only the value can be changed.

    • merge: Merges two streams into one.

    • branch: This function takes an array of predicates as input and creates an array of KStreams. Each input record is assigned to the output stream of the first predicate that evaluates to true; for example, if the predicate at index 5 is the first to match, the record goes to the output stream at index 5.

There are many other operations, like join, aggregate, etc.

KStream<String, Transaction> source = builder.stream("sourcetransaction");
KStream<String, Transaction> possibleFraudulentTransactions = source.filter(new Predicate<String, Transaction>() {
    @Override
    public boolean test(String key, Transaction value) {
        // Keep only transactions whose amount exceeds the threshold.
        return value.getAmount() > 10000;
    }
});

In the above example, a stream is created from the topic sourcetransaction, and, for each record, we check whether its amount is greater than 10,000. If it is, the record is kept in the resulting stream; otherwise it is dropped.
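
Beyond filter, the same fluent style applies to the other DSL operations listed above. The sketch below is illustrative only (the topic names and the lambda syntax are mine, not from the original example); it chains mapValues with a groupByKey/count aggregation, using KTable, Produced, and Serdes from the org.apache.kafka.streams packages:

KStream<String, String> text = builder.stream("words-topic");
KTable<String, Long> counts = text
    .mapValues(value -> value.toLowerCase()) // mapValues: change the value, keep the key
    .groupByKey()                            // group records by their existing key
    .count();                                // aggregate: count records per key
counts.toStream().to("word-counts-topic", Produced.with(Serdes.String(), Serdes.Long()));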

  • Processor API: This API enables developers to write their own custom record processors, connect them, and interact with state stores. Below are some common functions of a processor.

    • init: This function is called when Kafka Streams is initializing tasks. This function provides a ProcessorContext.

    • forward: This function forwards the key-value pair to the downstream processors. For example: context.forward(key, value)

    • schedule: Schedules a periodic operation for a processor. This function can be called during the initialization of the processor or while processing. Time can be defined in two ways: stream time, which advances with the timestamps of the records being processed, and wall-clock time, which advances with the system clock. For example, let's say we have defined the interval as 1000 ms.
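
In client versions 1.0 and later, the choice between the two notions of time is expressed with PunctuationType when scheduling; here is a minimal sketch (the 1000 ms interval follows the example above, and the callback body is a placeholder):

// Inside init(): schedule a periodic callback every 1000 ms.
// STREAM_TIME advances with record timestamps; WALL_CLOCK_TIME with the system clock.
context.schedule(1000, PunctuationType.WALL_CLOCK_TIME, new Punctuator() {
    @Override
    public void punctuate(long timestamp) {
        // Periodic work, e.g. emitting buffered results.
    }
});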

public class SourceProcessor implements Processor<String, Transaction> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Request a periodic punctuate() callback every 1000 ms.
        context.schedule(1000);
    }

    @Override
    public void process(String key, Transaction value) {
        if (value.getAmount() > 10000) {
            // Filter records and forward them to the next processor.
            context.forward(key, value);
        }
    }

    @Override
    public void punctuate(long timestamp) {
        // Called periodically, based on the interval passed to schedule().
    }

    @Override
    public void close() {
        // Release any resources acquired in init().
    }
}
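
To run this processor, it has to be wired into a topology. The snippet below is a sketch of how that wiring might look with the low-level Topology API; the node names, the output topic, and the assumption that a serde for Transaction is configured are all mine, not from the article:

Topology topology = new Topology();
// Read records from the input topic into a node named "Source".
topology.addSource("Source", "sourcetransaction");
// Attach the custom processor downstream of the source node.
topology.addProcessor("FraudFilter", SourceProcessor::new, "Source");
// Write whatever the processor forwards to an output topic (name is hypothetical).
topology.addSink("Sink", "possible-fraud", "FraudFilter");

KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();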

Conclusion

Kafka Streams provides an easy way to process continuous data using the Kafka Streams DSL and the Processor API. The DSL offers inbuilt functions to process records and to perform windowing and aggregation operations, while the Processor API gives you more flexibility to write your own record-processing logic.
