Kafka Producer Delivery Semantics
We take a deep dive into the various delivery semantics available for use in Apache Kafka.
This article is a continuation of Part 1 (Kafka Technical Overview) and Part 2 (Kafka Producer Overview) of this series. Let's look at the different delivery semantics and how to achieve them using producer and broker properties.
Delivery Semantics
Based on broker and producer configuration, all three delivery semantics are supported: "at most once," "at least once," and "exactly once."
At Most Once
With 'at most once' delivery semantics, a message is delivered no more than one time; it is acceptable to lose a message rather than deliver it twice. Typical use cases include metrics collection, log collection, and so on. Applications adopting at most once semantics can easily achieve high throughput and low latency.
At Least Once
With 'at least once' delivery semantics, it is acceptable to deliver a message more than once, but no message should be lost. The producer ensures that every message is delivered, even though that may result in duplicates. This is the most commonly preferred semantic. Applications adopting at least once semantics typically have moderate throughput and moderate latency.
Exactly Once
With 'exactly once' delivery semantics, a message must be delivered exactly one time: no loss and no duplication. This is the most difficult delivery semantic to achieve. Applications adopting exactly once semantics may have lower throughput and higher latency than the other two.
Delivery Semantics Summary
To summarize: at most once may lose messages but never duplicates them, at least once never loses messages but may duplicate them, and exactly once neither loses nor duplicates messages.
Producer Delivery Semantics
Different delivery semantics can be achieved in Kafka using the acks property of the producer and the min.insync.replicas property of the broker (considered only when acks = all).
Acks = 0
When the acks property is set to 0, you get at most once delivery semantics. The Kafka producer sends the record to the broker and doesn't wait for any response; messages, once sent, are not retried in this setting. The producer uses a fire-and-forget approach with acks = 0.
Data Loss
In this mode, the chance of data loss is high, as the producer does not confirm that the message was received by the broker. The message may never have reached the broker, or the broker may fail soon after receiving it; either way the message is lost and never resent.
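For illustration, here is a minimal sketch of an at most once producer using the Java client. The broker address and the "metrics" topic are placeholders, not values from this series.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AtMostOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; adjust for your cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks = 0: send and do not wait for any broker response.
        props.put(ProducerConfig.ACKS_CONFIG, "0");
        // No retries: a failed send is simply dropped, so a message is delivered at most once.
        props.put(ProducerConfig.RETRIES_CONFIG, "0");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "metrics" is a hypothetical topic name used only for illustration.
            producer.send(new ProducerRecord<>("metrics", "host-1", "cpu=42"));
        }
    }
}
```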
Acks = 1
When the acks property is set to 1, you can achieve at least once delivery semantics. The Kafka producer sends the record to the broker and waits for a response from the leader. If no acknowledgment is received, the producer retries the send based on the retry configuration. The retries property defaulted to 0 in older clients; make sure it is set to the desired value (for example, MAX_INT).
Data Loss
In this mode, the chance of data loss is moderate: the producer confirms only that the leader partition received the message. Replication to the follower partitions happens after the acknowledgment, so data can still be lost. For example, if the leader broker goes down after sending the acknowledgment but before replication completes, the message is lost and the producer will not resend it.
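As a rough sketch, the producer settings most relevant to this mode are shown below. Only the properties that differ from the acks = 0 example above are listed, and the retry backoff value is simply the client default.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class AtLeastOnceConfig {
    // Only the settings that differ from the acks = 0 sketch above are shown.
    static Properties atLeastOnceProps() {
        Properties props = new Properties();
        // acks = 1: wait for the leader partition to acknowledge the write.
        props.put(ProducerConfig.ACKS_CONFIG, "1");
        // Retry failed sends instead of dropping them.
        props.put(ProducerConfig.RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE));
        // Pause between retry attempts; 100 ms is the client default.
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");
        return props;
    }
}
```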
Acks = All
When the acks property is set to all, you can achieve exactly once delivery semantics. The Kafka producer sends the record to the broker and waits for a response. If no acknowledgment is received, it retries the send based on the configured number of retries. The broker sends the acknowledgment only after replication, based on the min.insync.replicas property.
For example, a topic may have a replication factor of 3 and min.insync.replicas of 2. In this case, the acknowledgment is sent only after the second in-sync replica has the record. To actually achieve exactly once delivery, the producer must also be idempotent (enable.idempotence = true), so that the broker can discard duplicates introduced by retries. acks = all should be used in conjunction with min.insync.replicas.
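Note that min.insync.replicas is a broker/topic setting rather than a producer property. Below is a hypothetical sketch of creating such a topic (3 partitions, replication factor 3, min.insync.replicas = 2) with the Java AdminClient; the topic name and broker address are placeholders.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic matching the example: replication factor 3, min.insync.replicas 2.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```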
Data Loss
In this mode, the chances of data loss are low, as the producer receives the acknowledgment only after the message has been replicated (leader and in-sync follower partitions). Because replication happens before the acknowledgment, the chances of actually losing data are minimal. For example, if the broker goes down before replication and acknowledgment, the producer never receives the acknowledgment and sends the message again to the newly elected leader partition.
Exception
When there are not enough in-sync replicas to satisfy the min.insync.replicas property, the broker returns an exception to the producer instead of an acknowledgment.
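Putting this together, here is a minimal sketch of an acks = all producer using the Java client. A send callback is used so that broker errors (for example, NotEnoughReplicasException) reach the application instead of being lost; the broker address and the "orders" topic are placeholders.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks = all: the leader acknowledges only after the in-sync replicas have the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "created"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // For example NotEnoughReplicasException when fewer than
                            // min.insync.replicas replicas are in sync.
                            System.err.println("Send failed: " + exception);
                        }
                    });
            producer.flush();
        }
    }
}
```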
Safe Producer
In order to create a safe producer that ensures minimal data loss, use the producer and broker properties below; a configuration sketch follows the two lists.
Producer Properties
- acks = all (default 1) – Ensures replication before acknowledgment.
- retries = MAX_INT (default 0) – Retries the send when an exception occurs.
- max.in.flight.requests.per.connection = 5 (default) – Maximum number of unacknowledged requests the producer sends on a single connection before blocking.
Broker Properties
- min.insync.replicas = 2 (at least 2) – Ensures that a minimum number of in-sync replicas (ISR) have the write before the broker acknowledges it.
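Putting the producer-side properties above into code, a configuration sketch might look like the following. The enable.idempotence flag is an addition not in the list; it is included because it allows retries without creating duplicates.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class SafeProducerConfig {
    // Producer-side settings from the list above; min.insync.replicas = 2 is set on the
    // broker or topic, not here. enable.idempotence is an extra setting (not in the list)
    // that lets the broker discard duplicates introduced by retries.
    static Properties safeProducerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.ACKS_CONFIG, "all");                                // replicate before ack
        props.put(ProducerConfig.RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE)); // retry on transient errors
        props.put("max.in.flight.requests.per.connection", "5");                     // unacknowledged requests per connection
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        return props;
    }
}
```

Note that with retries enabled and more than one in-flight request, message order can change on retry unless the producer is idempotent; that is the reason for the enable.idempotence line in this sketch.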
Acks Impact
To summarize the impact of the acks property: acks = 0 gives the lowest latency and highest throughput but the weakest durability, acks = 1 trades some latency and throughput for moderate durability, and acks = all gives the strongest durability at the cost of latency and throughput.
Summary
Configure Kafka producers and brokers to achieve the desired delivery semantics based on the following properties.
- acks
- retries
- max.in.flight.requests.per.connection
- min.insync.replicas
In Part 4 of the series, we'll look into Kafka consumers, consumer groups, and how to achieve different Kafka consumer delivery semantics.