Solving Real-Time Event Correlation in Distributed Systems
Learn how to join streaming events in real time using Spring Boot with event-time windows, watermarking, and in-memory state — no Kafka or Flink needed.
Join the DZone community and get the full member experience.
Join For FreeModern digital platforms operate as distributed ecosystems — microservices emitting events, APIs exchanging data, and asynchronous communication becoming the norm. In such environments, correlating events across multiple sources in real time becomes a critical requirement.
Think of payments, orders, customer metadata, IoT sensors, logistics tracking — all flowing continuously.
These events rarely arrive in order, rarely originate from one system, and often must be joined before downstream processing. This leads to a crucial question:
How do we join two streaming event sources (left and right streams) by key and time window using plain Spring Boot, without Kafka Streams, Spark, or Flink?
This blog explains a clean, production-ready way to implement:
- Real-time stream join
- Event-time based correlation
- Watermarking to drop late events
- Sliding/Tumbling window joins
- In-memory state store
- REST API-based ingestion
Why Do We Need a Streaming Join Engine?
In enterprise systems, operations often depend on combining information from multiple event producers.
1. Order + Customer Metadata
- Left stream: Order event from OMS.
- Right stream: Customer country/geo from User Service. This is needed for real-time enrichment before invoicing or fraud detection.
2. Payment Event + Fraud Score Event
- Payment attempt arrives.
- Fraud score arrives from ML model. Join is required within 5–10 seconds to allow real-time approval decisions.
3. IoT Sensor Fusion
- Temperature event
- Humidity event. Must be joined within milliseconds for accurate predictions.
4. Telecom Real-Time Monitoring
- Subscriber movement.
- Tower signal. Join is needed for real-time network-performance metrics.
These joins cannot rely on nightly batch jobs, because decisions must be:
- Real-time
- Event-driven
- Low-latency
- Correct, even if events arrive out of order
Core Challenge: Event-Time Disorder and Late Arrivals
In real systems:
- Network jitter delays events
- Services are produced at different speeds
- Clock skew causes timestamp variations
- Event bursts cause uneven flow
- Events can arrive seconds to minutes late
A naïve "join the last event you saw" strategy will break.
Example
Left Event (Order)
- t = 10:00:05
Right Event (CustomerMeta)
- t = 10:00:02 (arrives late at 10:00:08)
→ Without event-time logic, this join fails.
→ Business logic says it should match.
This is why we need:
- Event-time based windows
- Watermarking
- Late-event handling
Architecture: Event-Time Join Engine With Watermarking

Event Models
Left Event
public class LeftEvent {
private String key;
private long eventTime; // epoch millis
private String orderId;
private double amount;
}
Right Event
public class RightEvent {
private String key;
private long eventTime; // epoch millis
private String country;
}
How the Join Works: Step-by-Step
1. Accept Event via REST
@PostMapping("/left") → engineService.acceptLeft(e);
@PostMapping("/right") → engineService.acceptRight(e);
Events are added to:
- leftStore
- rightStore
2. Update Maximum Event Time
maxEventTime = max(maxEventTime, event.eventTime)
3. Compute Watermark
Allowed lateness = 10 seconds → 10000 ms
watermark = maxEventTime - 10000
Events older than this are discarded.
4. Perform Join
On the arrival of a new Left Event:
scan RightStore[key]
match if |leftTime - rightTime| <= joinWindow
Same for Right Event.
5. Emit Joined Output
A sample join result:
{
"key": "CUST-1002",
"orderId": "ORD-9002",
"amount": 1989.75,
"country": "India",
"joinedAt": 1700000105000
}
6. Cleanup Using Watermark
Any event older than the watermark is removed from memory.
Example API Requests and Outputs
POST /api/left
{
"key": "CUST-1002",
"eventTime": 1700649000202,
"orderId": "ORD-9002",
"amount": 1989.75
}
Response: accepted.

State:
leftStore["CUST-1002"] = [{ORD-9002, 1989.75}]
rightStore["CUST-1002"] = []
POST /api/right
{
"key": "CUST-1002",
"eventTime": 1700649000152,
"country": "India"
}

Joined Output
{
"key": "CUST-1002",
"orderId": "ORD-9002",
"amount": 1989.75,
"country": "India",
"joinedAt": 1700000105000
}
Watermark Example
- Max event time: 1700000105000
- Allowed lateness: 10s
- Watermark: 1700000095000
API endpoint: GET /api/watermark
Response: 1700000095000
REST Controller (Final Version)
@RestController
@RequestMapping("/api")
public class EventController {
private final EngineService engineService;
public EventController(EngineService engineService) {
this.engineService = engineService;
}
@PostMapping("/left")
public ResponseEntity<?> postLeft(@RequestBody LeftEvent e) {
engineService.acceptLeft(e);
return ResponseEntity.ok().body("accepted");
}
@PostMapping("/right")
public ResponseEntity<?> postRight(@RequestBody RightEvent e) {
engineService.acceptRight(e);
return ResponseEntity.ok().body("accepted");
}
@GetMapping("/watermark")
public ResponseEntity<?> watermark() {
return ResponseEntity.ok().body(engineService.getWatermark());
}
}
Endpoints
| Endpoint | description |
|---|---|
|
POST /api/left |
Ingest Left events |
|
POST /api/right |
Ingest Right events |
|
GET /api/watermark |
Shows computed watermark |
Why Not Use a Database JOIN?
Because database joins are:
- Based on stored/batch data
- High latency (seconds to minutes)
- Order-dependent (arrival order matters)
- Cannot handle late events.
- Cannot scale for streaming workloads
Our Spring Boot join engine provides:
- Millisecond joins
- Accurate event-time semantics
- In-memory performance
- Watermark-based cleanup
- No external streaming infrastructure
Practical Real-Time Use Case: E-Commerce Order Enrichment
Business Flow
1. Order Service emits:
- orderId
- amount
- eventTime
- customerId (key)
2. Customer Profile Service emits:
- customerId
- country
- eventTime
Business Need
Before routing an order to:
- Tax engine
- Fraud decisioning
- Payment gateway
- Invoicing pipeline
It must be enriched with the customer’s country.
Join Logic
- Join window: 10 seconds
- Join based on: event-time, not arrival time
Why?
- The customer profile may arrive late.
- Order service may be faster.
- Network latency varies.
- A message queue (Kafka/SQS) might delay only one stream.
Example Timeline
| Time (Actual) | event | arrival time |
|---|---|---|
|
10:00:02 |
RightEvent(country=“IN”) |
arrives at 10:00:05 |
|
10:00:05 |
LeftEvent(order=550) |
arrives at 10:00:05 |
Both timestamps fall within the 10-second window.
→ Join event created.
Real-Time Benefits
- Sub-second join processing. Accurate even when events arrive out of order.
- No Kafka, Flink, Spark required.
- Stateless REST producers → State held centrally.
- Runs on any environment (VM, on-prem, Kubernetes).
- Production-ready with watermark-based cleanup
Where This Is Used in the Real World
- Fraud detection: Join payment events with risk-score events.
- Real-time order enrichment: Join orders with customer metadata (country, geo, segment).
- Logistics/fleet tracking: Join GPS location with route updates.
- IoT sensor fusion: Combine temperature + humidity into a unified sensor reading.
- Telecom analytics: Join subscriber session with tower signal events.
- Online gaming: Join user action events with current session metadata.
Final Summary
You now have a complete:
- Real-time streaming join engine
- Event-time window correlation
- Watermarking and cleanup
- REST-based ingestion
- In-memory state store
- Real-world business use case
- Ready-to-deploy Spring Boot implementation
This is a lightweight, efficient alternative to Kafka Streams/Flink for small-to-medium-scale real-time joins.
Opinions expressed by DZone contributors are their own.
Comments