DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Why Queues Don’t Fix Scaling Problems
  • Resilient Kafka Consumers With Reactor Kafka
  • Design Twitter Like Application Using Lambda Architecture
  • Kafka-Streams - Tips on How to Decrease Re-Balancing Impact for Real-Time Event Processing On Highly Loaded Topics

Trending

  • Why SAP S/4HANA Landscape Design Impacts Cloud TCO More Than Compute Costs
  • Code Quality Had 5 Pillars. AI Broke 3 and Created 2 We Can’t Measure
  • Why Good Models Fail After Deployment
  • Observability in Spring Boot 4
  1. DZone
  2. Data Engineering
  3. Big Data
  4. Exactly-Once Processing: Myth vs Reality

Exactly-Once Processing: Myth vs Reality

Exactly-once is a collection of local guarantees, not an end-to-end guarantee, and real systems rely on idempotency and at-least-once guarantees.

By 
Irullappan irulandi user avatar
Irullappan irulandi
·
May. 26, 26 · Analysis
Likes (0)
Comment
Save
Tweet
Share
286 Views

Join the DZone community and get the full member experience.

Join For Free

Exactly-once processing (EOP) is often touted as the gold standard for reliability in distributed systems. The promise of processing each message just once seems perfect, whether you're developing financial systems, real-time analytics pipelines, or event-driven microservices.

But the truth is much more complex. What most systems refer to as "exactly once" is actually an approximation that balances trade-offs, limitations, and assumptions rather than an absolute.

To help you create systems that are accurate, robust, and practical, this article dissects exactly once processing. 

Exactly-once processing

The Myth

The myth is straightforward: 

  • The entire program must be executed exactly once if the broker supports exactly once semantics. 
  • Similarly, it is frequently assumed that the entire pipeline ensures exactly-once processing if a stream processor can restore state from checkpoints. 
  • Another widespread misconception is that, the user-visible result must likewise have happened exactly once if a database transaction commits only once. 

When analyzed throughout a whole, real-world system, none of these claims actually hold.

The Reality

This is a more truthful statement: 

  • A broker can provide exactly-once writes between producers and broker-managed topics. 
  • Within its checkpoint boundary, a stream processor can provide exactly-once state transitions. 
  • One local transaction can be made atomic by a database. 
  • When idempotency and replay-safety are implemented across all boundaries, an end-to-end business workflow is effective only once.

That goes beyond semantics. It is designed for failure.

Three Important Questions

Three distinct questions should be asked in a useful design review: 

  1. Is it possible to send the same input more than once? 
  2. Is it possible to try the same calculation more than once? 
  3. Is it possible for a business effect to occur more than once? 

The approach is typically effective if the first two questions are answered affirmatively, but the third is answered negatively.

Definition

Before debating "exactly once," precisely clarify the meanings at each layer:

Semantic Definition
At-most-once A message may be lost, but once acknowledged it is not retried. No duplicates, possible data loss.
At-least-once A message is retried until acknowledged. No data loss, but duplicates are possible.
Exactly-once within a boundary A subsystem guarantees one logical input results in one durable state transition relative to its recovery model.
Effectively-once Duplicate deliveries may occur, but the final business effect is idempotent, so users observe the action exactly once.


The first hard fact is that exactly once is nearly always local rather than global.

Where The Boundaries Actually Break

 Where the boundaries actually break

Local exactly-once guarantees stop at subsystem boundaries unless the next hop is also replay-safe.

Acknowledgment, retry, timeout, crash recovery, replay, and deduplication are concepts unique to each boundary. It is not a given that a strong assurance at one boundary would automatically compose with the next. 
Boundary Mechanism Guarantee Level Failure Risk
Producer to Broker Producer ID + sequence numbers Idempotent append Limited to broker partitions
Broker to Consumer Consumer group offsets At-least-once delivery Crash leads to replay
Consumer to DB ACID transaction + unique key Idempotent sink write No dedupe causes duplicates
DB to Outbox Same ACID transaction Atomic state + event Relay may republish
Outbox to Downstream Broker Event ID deduplication by consumer At-least-once publish Consumer must handle dedupe
Service to External API Idempotency key per request Effectively-once Timeout creates uncertainty


Every boundary has its own notion of acknowledgment, retry, timeout, crash recovery, replay, and deduplication.

Why Are Duplicates Common Rather Than Unusual

Recovery behavior typically results in duplicates: 

  • After a transmit timeout, a producer attempts again because it cannot distinguish between failure and success. 
  • Before the offset acknowledgment, a customer writes to a sink and crashes. 
  • A processor reruns a portion of the input after restoring from a checkpoint. 
  • Because the publish acknowledgment was not observed, a relay republishes an outbox record. 
  • Even though the remote service has already applied the request, an HTTP client attempts again after the timeout.

None of those actions is strange. They are precisely what well-behaved distributed software accomplishes under uncertainty.

Technical Reality at Each Layer 

Producer Semantics

At the producer layer, exactly once usually means idempotent append behavior to the broker, not magical downstream correctness. For a broker such as Kafka, an idempotent producer can use:

  • Producer id 
  • Sequence numbers 
  • Partition-level ordering 
  • Broker-side dedupe of retried sends  

A minimal producer configuration seems like:

Java
 
enable.idempotence=true
acks=all
retries=Integer.MAX_VALUE
max.in.flight.requests.per.connection=5


This is important. It stops a common type of duplicate appends caused by producer retries. However, it only discusses records that were written to Kafka partitions. It makes no mention of:

  • A consumer writing twice to PostgreSQL 
  • A webhook firing twice 
  • An email being sent twice

Kafka Producer Details: Idempotence Versus Transactions

Kafka provides you with two distinct tools that are frequently mistakenly combined into a single idea: 

  • Idempotent producer semantics 
  • Transactional producers Semantics  

Retried appends to a partition benefit from idempotence. Multiple writes and offset commits can be combined into a single broker-visible unit with the aid of transactions. 

The following are typical transactional producer settings:

Java
 
enable.idempotence=true
transactional.id=orders-service-1
acks=all


And the application flow is conceptually:

Java
 
producer.initTransactions();
producer.beginTransaction();
producer.send(outputRecord1);
producer.send(outputRecord2);
producer.sendOffsetsToTransaction(offsets, consumerGroupMetadata);
producer.commitTransaction();


This resolves the issue: output records and ingested offsets can become atomically visible simultaneously within Kafka if the application consumes from Kafka and produces back to Kafka.

But take note of what's absent:

  • No database write is included 
  • No webhook is included 
  • No Redis mutation is included 
  • No email is included  

Kafka transactions are genuine, but their scope is Kafka-managed state rather than arbitrary external state. 

Broker and Consumer Semantics

When the customer's progress marker is behind the durable write, it has already completed, the broker may redeliver a record. 

The classic race is: 

  1. Consumer reads message 
  2. Consumer writes to sink 
  3. Process crashes before offset commit 
  4. Consumer restarts 
  5. Broker redelivers

The business effect occurs twice if the sink write is not idempotent.

For this reason, "broker exactly once" and "application exactly once" are not interchangeable terms.

Kafka Consumer Details: Offsets Are Not Business State

The consumer offset as evidence that the business activity occurred exactly once is the root cause of many application issues. It isn't. The offset is only evidence of the consumer group's advancement. 

These are distinct facts:

  • The broker knows a record was consumed by a consumer group 
  • The application knows a sink mutation was committed 
  • The business knows the effect should be visible once    

Only when the application specifically aligns those facts will they do so.
Additionally, "isolation.level=read_committed" is frequently misinterpreted on the read side. It stops customers from seeing transactional writes that Kafka producers have aborted. Although that is helpful, not all downstream effects are transformed into exactly one.

It simply means:

  • Committed Kafka transactions are visible 
  • Aborted Kafka transactions are hidden

It does not mean: 

  • Downstream database writes cannot duplicate 
  • External side effects cannot duplicate 
  • Application retries are automatically idempotent

Because of this, consumer accuracy nearly always necessitates one of: 

  • Idempotent sink keys 
  • Inbox dedupe tables 
  • Sink-side upserts 
  • Compare-and-set version checks 
  • Business-key uniqueness constraints

The Honest Way to Document Guarantees

Rather than writing "the system is exactly once," use scoped statements for each boundary in your documentation:

Boundary Guarantee Statement
Producer to Kafka Producer writes to Kafka are idempotent per partition.
Processor State State transitions are exactly-once relative to checkpoint recovery.
Primary Database Writes Writes are idempotent by business key and message ID.
Outbox Publication Publication is at-least-once; downstream consumers must dedupe by event ID.
Webhook / SaaS APIs Integrations rely on idempotency keys and replay-safe receivers.
End-to-End Behavior Business behavior is effectively-once under replay and recovery.


Final Thoughts 

If you make the promise specific and local, exactly once processing is not a fiction.
Within clearly specified constraints, databases, Kafka, Flink, and other subsystems can unquestionably offer exactly-once guarantees.

The misconception is that the local assurances automatically combine to form a single end-to-end promise that spans brokers, processors, databases, outboxes, caches, webhooks, and external APIs.

The robust architecture in real systems is typically:

  • At-least-once delivery 
  • Transactional local state 
  • Idempotent sinks 
  • Outbox or inbox patterns 
  • Idempotency keys for side effects 
  • Explicit documentation of every guarantee boundary  

It is the version that endures node draining, broker failovers, replay after restore, and unclear network timeouts, but it is less commercially viable than stating "exactly once everywhere."

kafka Processing

Opinions expressed by DZone contributors are their own.

Related

  • Why Queues Don’t Fix Scaling Problems
  • Resilient Kafka Consumers With Reactor Kafka
  • Design Twitter Like Application Using Lambda Architecture
  • Kafka-Streams - Tips on How to Decrease Re-Balancing Impact for Real-Time Event Processing On Highly Loaded Topics

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook