The Dual Write Problem: What Looks Safe in Code but Breaks in Production

The dual write problem is one of the most common consistency issues in distributed systems. There are four patterns to resolve this.

By Vineet Bhatkoti · Apr. 29, 26 · Analysis
A system that crashes is easier to fix than one that silently produces wrong results. The dual write problem is exactly that kind of bug.

It is surprisingly common and often misunderstood, even by teams that have encountered it in production. Understanding the dual write problem starts with seeing why the obvious solution fails, and ends with four patterns that address it correctly.

The Dual Write Problem

The dual write problem occurs when a service needs to write to two separate systems as part of a single logical operation. The most common example in modern microservices is writing to a database and publishing an event to a message broker like Kafka.

Consider this Spring service:

Java
 
@Transactional
public void placeOrder(Order order) {
    orderRepository.save(order);
    kafkaTemplate.send("orders", order);
}


This looks safe: the @Transactional annotation is there, the order is saved, and the event is sent. But the transaction boundary is not what it appears to be.

The Transactional Risk

The @Transactional annotation wraps the database operation in a transaction. But Kafka is not part of that transaction. These are two entirely separate systems with no shared coordinator. The transaction commits or rolls back the database write. It has no knowledge of and no control over what Kafka does. 

Here are the two failure scenarios that expose the problem:

Scenario 1: The event is sent, but the database rolls back.

Java
 
@Transactional
public void placeOrder(Order order) {
    orderRepository.save(order);             // DB write succeeds
  
    kafkaTemplate.send("orders", order);     // Kafka confirms the message
    // Suppose the JVM crashes here, before the DB transaction commits
    // The DB rolls back, so the order does not exist
    // The Kafka message is already out, and downstream consumers process the order
}


Downstream consumers receive and process an event for an order that does not exist in the database. The system is now inconsistent, and neither side knows it.

Scenario 2: The database commits, but the event is lost.

Java
 
@Transactional
public void placeOrder(Order order) {
    orderRepository.save(order);             // DB write succeeds
  
    kafkaTemplate.send("orders", order);     
    // Suppose the Kafka broker crashes before the message is durably written
    // The DB commits successfully
    // The event is lost, and downstream consumers never process the order
}


The order exists in the database, but no downstream service ever processes it. Inventory is never updated, the confirmation email is never sent, and the warehouse never picks up the item.

Both scenarios are real production failures. Both are silent. Neither produces an immediate error that would trigger an alert.
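
Both scenarios can be reproduced without a real database or broker. The following is a minimal in-memory sketch, not Spring or Kafka code: FakeDb, FakeBroker, and DualWriteDemo are hypothetical stand-ins, and the "crash" is simulated by rolling back the open transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a transactional database.
class FakeDb {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void save(String order) { pending.add(order); }   // visible only after commit
    void commit() { committed.addAll(pending); pending.clear(); }
    void rollback() { pending.clear(); }
}

// Hypothetical in-memory stand-in for a message broker.
class FakeBroker {
    final List<String> delivered = new ArrayList<>();
    boolean down = false;

    void send(String message) {
        if (down) throw new IllegalStateException("broker unavailable");
        delivered.add(message);
    }
}

class DualWriteDemo {
    // Scenario 1: the event is sent, then the process dies before the commit.
    static void sendThenCrash(FakeDb db, FakeBroker broker, String order) {
        db.save(order);
        broker.send(order);
        db.rollback(); // simulates the crash: the open transaction is rolled back
    }

    // Scenario 2: the send fails silently, but the commit still happens.
    static void commitWithLostEvent(FakeDb db, FakeBroker broker, String order) {
        db.save(order);
        try {
            broker.send(order);
        } catch (IllegalStateException ignored) {
            // the failure never reaches the database transaction
        }
        db.commit();
    }
}
```

After sendThenCrash, the broker holds an event for an order the database never committed. After commitWithLostEvent with the broker down, the database holds an order for which no event exists. Neither call throws.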

XA Transactions and Their Shortcomings

Java supports distributed transactions through XA (two-phase commit) via JTA (Java Transaction API). In theory, XA coordinates a transaction across multiple resources, including databases and message brokers. In practice, it has three fundamental problems:

  • Slow: Two-phase commit requires multiple round-trips between all participants before a transaction can complete, adding significant latency to every operation.
  • Fragile: If the transaction coordinator crashes between the prepare and commit phases, resources are left in an uncertain state that requires manual intervention to resolve.
  • Incompatible with Kafka: Kafka's transactional model is internal to Kafka itself and does not participate in a JTA-coordinated distributed transaction.

XA is not a viable solution for modern event-driven architectures.

The Four Patterns That Address the Problem

1. The Transactional Outbox Pattern

Instead of writing directly to Kafka, write the event to an outbox table in the same database transaction as the primary data. A separate background publisher then reads the outbox table and publishes events to Kafka.

Java
 
@Transactional
public void placeOrder(Order order) {
    orderRepository.save(order);
    outboxRepository.save(new OutboxEvent("orders", order));
    // Both writes are in the same DB transaction
}


A background publisher reads unpublished events from the outbox and sends each one to Kafka:

Java
 
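@Scheduled(fixedDelay = 100)
public void publishOutboxEvents() throws Exception {
    List<OutboxEvent> events = outboxRepository.findUnpublished();
    for (OutboxEvent event : events) {
        // send() is asynchronous: block until the broker acknowledges the record,
        // so an event is never marked as published before Kafka has it
        kafkaTemplate.send(event.getTopic(), event.getPayload()).get();
        outboxRepository.markPublished(event);
    }
}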


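This guarantees at-least-once delivery, not exactly-once delivery: if the publisher crashes after sending an event but before marking it as published, the event is sent again on the next run. Downstream consumers must therefore be idempotent to handle potential duplicates.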
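
One common way to make a consumer idempotent is to track the IDs of events it has already handled. The sketch below assumes every outbox row carries a unique event ID; OutboxMessage and IdempotentConsumer are hypothetical names, and the in-memory set stands in for what would normally be a database table with a unique constraint.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event shape: each message carries a unique ID.
class OutboxMessage {
    final String eventId;
    final String payload;

    OutboxMessage(String eventId, String payload) {
        this.eventId = eventId;
        this.payload = payload;
    }
}

// Consumer that skips events it has already handled.
class IdempotentConsumer {
    private final Set<String> processedIds = new HashSet<>();
    final List<String> handled = new ArrayList<>();

    // Returns true if the event was processed, false if it was a duplicate.
    boolean onEvent(OutboxMessage message) {
        if (!processedIds.add(message.eventId)) {
            return false; // ID already seen: skip the side effect
        }
        handled.add(message.payload);
        return true;
    }
}
```

In a real service, recording the ID and applying the side effect would need to happen in the same database transaction. Otherwise the consumer has a dual write problem of its own.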

2. Change Data Capture (CDC)

Rather than writing to an outbox table manually, CDC monitors the database transaction log directly and forwards changes to Kafka automatically.

The database write is the only operation the application performs. CDC captures the change from the transaction log and publishes it to Kafka asynchronously. The application has no knowledge of Kafka at all.

This is the cleanest separation of concerns but introduces an operational dependency on a CDC pipeline that must be maintained and monitored.
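
The mechanism can be illustrated with a small in-memory model. This is a conceptual sketch only, not how any particular CDC tool is implemented: TransactionLog and LogTailer are hypothetical classes, with a list standing in for the database transaction log and a callback standing in for the Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical append-only log standing in for the database transaction log.
class TransactionLog {
    private final List<String> entries = new ArrayList<>();

    void append(String change) { entries.add(change); }

    // Returns a copy of the entries from the given position to the end of the log.
    List<String> readFrom(int position) {
        return new ArrayList<>(entries.subList(position, entries.size()));
    }
}

// Tails the log and forwards each committed change to a sink (the broker).
class LogTailer {
    private final TransactionLog log;
    private final Consumer<String> sink;
    private int position = 0; // a real pipeline would persist this offset

    LogTailer(TransactionLog log, Consumer<String> sink) {
        this.log = log;
        this.sink = sink;
    }

    // Forwards all changes since the last poll and returns how many were sent.
    int poll() {
        List<String> changes = log.readFrom(position);
        for (String change : changes) {
            sink.accept(change);
            position++; // advance only after the change has been forwarded
        }
        return changes.size();
    }
}
```

The application only ever calls append (by committing to the database). Because the offset advances after forwarding, a crash between the two steps leads to a resend rather than a lost event, which is why CDC is also an at-least-once mechanism.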

3. Event Sourcing

Event sourcing eliminates the dual write problem at its root by making the event the source of truth. Instead of writing the state to a database and publishing an event, only the event is stored. The current state of an entity is derived by replaying its event history.
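
Deriving state by replay can be sketched as a fold over the event history. The event types and fields below are hypothetical and chosen only for illustration; a real event-sourced system would define its own domain events.

```java
import java.util.List;

// Hypothetical order event: a type plus an optional quantity.
class OrderEvent {
    enum Type { ITEM_ADDED, ITEM_REMOVED, ORDER_PLACED }

    final Type type;
    final int quantity;

    OrderEvent(Type type, int quantity) {
        this.type = type;
        this.quantity = quantity;
    }
}

// Current state is never stored; it is rebuilt from the events.
class OrderState {
    int itemCount = 0;
    boolean placed = false;

    // Applies every event in order, starting from an empty state.
    static OrderState replay(List<OrderEvent> history) {
        OrderState state = new OrderState();
        for (OrderEvent event : history) {
            state.apply(event);
        }
        return state;
    }

    private void apply(OrderEvent event) {
        switch (event.type) {
            case ITEM_ADDED:   itemCount += event.quantity; break;
            case ITEM_REMOVED: itemCount -= event.quantity; break;
            case ORDER_PLACED: placed = true; break;
        }
    }
}
```

Replaying the same history always yields the same state, so the event log alone is enough to reconstruct an entity at any point in time.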

A common mistake is to call eventBus.publish directly after appending to the event store:

Java
 
public void placeOrder(Order order) {
    OrderPlacedEvent event = new OrderPlacedEvent(order);
    eventStore.append(event);  // succeeds
    eventBus.publish(event);   // if this fails, the dual write problem is reintroduced
}


If eventBus.publish fails after the event store write succeeds, the dual write problem is reintroduced. The event is stored, but downstream consumers never receive it.

The correct approach is to never publish directly from the application. A separate background publisher reads from the event store and publishes to the message broker:

Java
 
public void placeOrder(Order order) {
    OrderPlacedEvent event = new OrderPlacedEvent(order);
    eventStore.append(event);  // single write, single system
}

// Background publisher reads from event store and publishes
@Scheduled(fixedDelay = 100)
public void publishEvents() {
    List<Event> unpublished = eventStore.findUnpublished();
    unpublished.forEach(event -> {
        eventBus.publish(event);
        eventStore.markPublished(event);
    });
}


There is only one write target in the application layer. The trade-off is a significant increase in architectural complexity and a learning curve for teams unfamiliar with event-sourced systems.

4. The Listen to Yourself Pattern

This pattern inverts the usual flow. Instead of writing to the database first and then publishing an event, the service publishes the event to Kafka first and listens to its own event to update its own state.

Java
 
// Publish the event first
public void placeOrder(Order order) {
    kafkaTemplate.send("orders", order);
}

// Listen to own event and update database state
@KafkaListener(topics = "orders")
public void onOrderPlaced(Order order) {
    orderRepository.save(order);
}


Once the event is confirmed by Kafka, it will not be lost even if the service crashes immediately after. When the service restarts, it will consume the event from Kafka and update the database. The trade-off is that reads immediately after a write may not reflect the latest state, and the listener must be idempotent to handle duplicate deliveries.
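
Both trade-offs can be seen in a small in-memory model. This is a sketch under stated assumptions, not Spring Kafka code: ListenToYourselfService is a hypothetical class in which a queue stands in for the Kafka topic and a map for the orders table.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

// Hypothetical in-memory model of a service that listens to its own events.
class ListenToYourselfService {
    private final Queue<String[]> topic = new ArrayDeque<>();
    private final Map<String, String> ordersTable = new HashMap<>();

    // Write path: only publishes; the database is not touched here.
    void placeOrder(String orderId, String details) {
        topic.add(new String[] {orderId, details});
    }

    // Listener: applies the next event, returning false if the topic is empty.
    boolean consumeNext() {
        String[] event = topic.poll();
        if (event == null) {
            return false;
        }
        // putIfAbsent keeps the first write, so a redelivered event is harmless
        ordersTable.putIfAbsent(event[0], event[1]);
        return true;
    }

    Optional<String> findOrder(String orderId) {
        return Optional.ofNullable(ordersTable.get(orderId));
    }
}
```

A call to findOrder between placeOrder and consumeNext returns an empty result, which is the read-after-write lag described above. Publishing the same order twice leaves a single row in the table.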

Choosing the Right Pattern

Each pattern solves the dual write problem but with different trade-offs. Choosing the right one depends on the complexity the team can sustain and the consistency guarantees the system requires.

Pattern                Complexity   Delivery Guarantee   Best For
Transactional Outbox   Low          At least once        Most microservices
CDC                    Medium       At least once        High-throughput systems
Event Sourcing         High         At least once        Compliance-driven systems
Listen to Yourself     Low          At least once        Simple event-driven flows


For most use cases, the Transactional Outbox Pattern is the right starting point. It is straightforward to implement, works with any database that supports transactions, and requires no additional infrastructure beyond a background publisher.

Conclusion

The dual write problem is one of those issues that sits quietly in a codebase until a precisely timed failure exposes it. The @Transactional annotation provides a false sense of safety when a second external system is involved. Understanding why transactions do not cross system boundaries is fundamental knowledge for anyone building event-driven distributed systems. Knowing the patterns that address this is equally important.

The code that looks safe often is not. The fix is not complex. But it requires knowing the problem exists in the first place.
