Interpreting Kafka's Exactly-Once Semantics

Apache Kafka comes packed with strong delivery guarantees out of the box. We take a look at its exactly-once semantics in this post.

By Rahul Agarwal · Jan. 01, 19 · Tutorial

Until recently, most organizations struggled to achieve the holy grail of message delivery: the exactly-once delivery semantic. Although this has been an out-of-the-box feature since Apache Kafka 0.11, adoption has been slow. Let's take a moment to understand exactly-once semantics: what the big deal is, and how Kafka solves the problem.

Apache Kafka offers the following delivery guarantees. Let's understand what each of them really means:

  • At Most Once Delivery: It guarantees that a particular message is delivered either once or not at all. Messages can be lost, but a message is never delivered more than once.

  • At Least Once Delivery: It guarantees that a particular message is always delivered. It can be delivered multiple times, but no message is ever lost.

  • Exactly Once Delivery: It guarantees that every message is delivered exactly once. Exactly-once does not mean there are no failures or retries; those are inevitable. What matters is that retries do not change the outcome: the result is the same whether a message was processed once on the first attempt or after retries. The configuration sketch after this list shows how common producer settings map to each of these guarantees.
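
To make these mappings concrete, here is a minimal sketch. The property names are standard Kafka producer configs, but the retry values and the transactional id are illustrative assumptions, and exact defaults vary by Kafka version.

import java.util.Properties;

final class DeliveryConfigs {
    // At-most-once: fire and forget; messages may be lost but are never duplicated.
    static Properties atMostOnce() {
        Properties p = new Properties();
        p.put("acks", "0");
        p.put("retries", "0");
        return p;
    }

    // At-least-once: wait for full acknowledgment and retry on failure;
    // nothing is lost, but retries can produce duplicates.
    static Properties atLeastOnce() {
        Properties p = new Properties();
        p.put("acks", "all");
        p.put("retries", Integer.toString(Integer.MAX_VALUE));
        return p;
    }

    // Exactly-once: idempotent writes plus transactions (Kafka 0.11+).
    static Properties exactlyOnce() {
        Properties p = new Properties();
        p.put("enable.idempotence", "true");
        p.put("transactional.id", "my-app-txn-1"); // placeholder id
        return p;
    }
}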

Why Exactly-Once Is Important

There are certain use cases (such as financial applications, IoT applications, and other streaming applications) that cannot afford anything less than exactly-once. You cannot afford to duplicate or lose messages when depositing to or withdrawing from a bank account; the final outcome must be exactly-once.

Why It Is Difficult to Achieve

Assume you have a small Kafka streams application with a few input partitions and a few output partitions. The application is expected to receive data from the input partitions, process it, and write the results to the output partitions, and this is exactly where we want an exactly-once guarantee. In practice, network glitches, system crashes, and other errors can introduce duplicates along the way.

Problem 1: Duplicate or Multiple Writes

Refer to Figure 1a. Message m1 is processed and written to Topic B. The write succeeds (the message lands as m1'), but the acknowledgment is never received; the reason could be, say, a network delay that eventually causes a timeout.

Figure 1a: Duplicate write problem.


Figure 1b: Duplicate write problem due to retry.

Since the application never received the acknowledgment, it does not know that the message was already written successfully, so it retries, which leads to a duplicate write. Refer to Figure 1b: message m1' is written to Topic B a second time. This is the duplicate write issue, and it needs to be fixed.

Problem 2: Reread Input Record

Figure 2a: Reread problem due to the application crashing.

Figure 2b: Reread problem when the crashed application restarts.

Refer to Figure 2a. We have the same scenario as above, but in this case the stream application crashes just before committing the offset. Since the offset was not committed, when the stream application comes back up it rereads message m1 and processes the data again (Figure 2b). This again leads to a duplicate write of message m1 to Topic B.

How Apache Kafka Helps

Apache Kafka solves the above problems and provides exactly-once semantics using the following mechanisms.

Idempotent Producer

Idempotency on the producer side ensures that retried sends do not cause a message to be persisted more than once. With idempotence turned on, each Kafka message carries two extra pieces of metadata: a producer id (PID) and a sequence number (seq). The PID assignment is completely transparent to users and is never exposed by clients.

Figure 3: Idempotent Producer

producerProps.put("enable.idempotence", "true");
producerProps.put("transactional.id", "100");

In the case of a broker or client failure, when a message send is retried, the broker accepts a message only if its producer id and sequence number have not been seen before. The broker automatically deduplicates any message(s) sent by this producer, ensuring idempotency. No additional code changes are required.
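
As a minimal sketch, a complete idempotent producer needs nothing beyond the flag above. The broker address, topic name, and serializers below are assumptions for illustration.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The broker deduplicates retried sends by (producer id, sequence number).
        props.put("enable.idempotence", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Even if this send is retried internally after a lost acknowledgment,
            // the message is persisted to the topic only once.
            producer.send(new ProducerRecord<>("topic-b", "key", "m1"));
        }
    }
}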

Transactions Across Partitions

To ensure that each message gets processed exactly once, transactions can be used. Transactions are all-or-nothing: they ensure that after a message is picked up, it can be transformed and atomically written to multiple topics/partitions along with the offset of the consumed message.

Code Snippet for Atomic Transactions

producer.initTransactions();
try {
    producer.beginTransaction();
    // ... read from input topic
    // ... transform

    producer.send(rec1); // topic A
    producer.send(rec2); // topic B
    producer.send(rec3); // topic C

    // Commit the consumed offsets as part of the same transaction.
    producer.sendOffsetsToTransaction(offsetsToCommit, "group-id");

    producer.commitTransaction();
} catch (Exception e) {
    producer.abortTransaction();
}

Apache Kafka v0.11 introduced two components — the Transaction Coordinator and Transaction Log — which maintain the state of the atomic writes.

The diagram below details the high-level flow of events that enables atomic transactions across partitions:

Figure 4: Transactions across partitions.

  1. initTransactions() registers a transactional.id with the coordinator.
  2. The coordinator bumps up the epoch of the PID, so any previous instance of that PID is considered a zombie and is fenced off. No future writes from these zombies are accepted.
  3. The producer registers a partition with the coordinator before it sends data to that partition for the first time.
  4. The transaction coordinator keeps the state of each transaction it owns in memory, and also writes that state (partition information, in this case) to the transaction log.
  5. The producer sends data to the actual partitions.
  6. The producer initiates a commit and, as a result, the coordinator begins the two-phase commit protocol.
  7. In the first phase, the coordinator updates the transaction log to "prepare_commit".
  8. In the second phase, the coordinator writes transaction commit markers to the topic-partitions that are part of the transaction.
  9. After writing the markers, the transaction coordinator marks the transaction as "committed."
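
Putting these steps together, the sketch below shows a full consume-transform-produce loop. The broker address, topic names, group id, and the uppercase transform are assumptions for illustration; the error handling follows the producer documentation's pattern of treating fencing as fatal and aborting the transaction on other Kafka errors.

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;

public class ExactlyOnceLoop {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");   // assumed broker
        cp.put("group.id", "group-id");
        cp.put("enable.auto.commit", "false");           // offsets are committed inside the transaction
        cp.put("isolation.level", "read_committed");     // see the next section
        cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("enable.idempotence", "true");
        pp.put("transactional.id", "my-app-txn-1");      // placeholder id
        pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
        KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
        consumer.subscribe(Collections.singletonList("input-topic"));
        producer.initTransactions(); // registers transactional.id and fences zombies (steps 1-2)

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            if (records.isEmpty()) continue;
            try {
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> rec : records) {
                    String transformed = rec.value().toUpperCase(); // stand-in transform
                    producer.send(new ProducerRecord<>("output-topic", rec.key(), transformed));
                    offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                new OffsetAndMetadata(rec.offset() + 1));
                }
                // Atomically commit the produced records and the consumed offsets (steps 6-9).
                producer.sendOffsetsToTransaction(offsets, "group-id");
                producer.commitTransaction();
            } catch (ProducerFencedException fatal) {
                // Another instance with the same transactional.id took over; stop this one.
                producer.close();
                consumer.close();
                return;
            } catch (KafkaException e) {
                // Recoverable error: abort. A real application would also rewind the
                // consumer to the last committed offsets before retrying the batch.
                producer.abortTransaction();
            }
        }
    }
}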

Transactional Consumer

If a consumer reads topics written to transactionally, it should use the isolation level read_committed. This ensures that it reads only committed data.

The default value of isolation.level is read_uncommitted.
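
As a minimal sketch, assuming a local broker and string payloads, a read_committed consumer only needs the isolation.level property changed:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

final class ReadCommittedConsumerFactory {
    static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "group-id");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Default is read_uncommitted; read_committed filters out data from aborted transactions.
        props.put("isolation.level", "read_committed");
        return new KafkaConsumer<>(props);
    }
}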

This is just a high-level view of how transactions work in Apache Kafka. I'd recommend exploring the docs if you're interested in taking a deeper dive.

Conclusion

In this post, we talked about various delivery guarantee semantics such as at-least-once, at-most-once, and exactly-once. We also talked about why exactly-once is important, the issues in the way of achieving exactly-once, and how Kafka supports it out-of-the-box with a simple configuration and minimal coding.

References

  1. Transactions in Apache Kafka

  2. Enabling Exactly-Once in Kafka Streams
