DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Event Driven Architecture (EDA) - Optimizer or Complicator
  • Event-Driven Architecture for Software Development: Leverage the Strength of Reactive Systems
  • Building an Event-Driven Architecture Using Kafka
  • What Are Events? Always 'Decoupled'

Trending

  • Reproducible Development Environments, One Command Away: Introducing CodingBooth
  • Why Your Test Automation Is Always Behind the Code And the Architecture That Fixes It
  • 5 Failure Patterns That Break AI Chatbots in Production
  • Skills, Java 17, and Theme Accents
  1. DZone
  2. Data Engineering
  3. IoT
  4. Event-Driven Architecture Patterns: Real-World Lessons From IoT Development

Event-Driven Architecture Patterns: Real-World Lessons From IoT Development

Real-world lessons on building event-driven systems for IoT and microservices — and model optimization with practical code examples.

By 
Bhavna Hirani user avatar
Bhavna Hirani
·
Dhruv Kulshrestha user avatar
Dhruv Kulshrestha
·
Nov. 10, 25 · Analysis
Likes (6)
Comment
Save
Tweet
Share
7.7K Views

Join the DZone community and get the full member experience.

Join For Free

Why This Matters for Back-End Developers

I spent six years working with microservices before I truly understood event-driven architecture. Building a real-time IoT system with more than 50 distributed nodes, using asynchronous messaging and meeting latency goals under 100 milliseconds, made it all click for me.

In this article, I’ll share practical patterns from real production experience that you can use for microservices, stream processing, and distributed systems. These ideas are useful whether you’re building IoT solutions, APIs, or data pipelines.

Event-Driven vs. Request-Response: Performance Impact

The Polling Problem

My first implementation polled 20 sensors every second:

Python
 
while True:
   states = [check_sensor(i) for i in range(20)]
   process_and_decide(states)
   time.sleep(1)


Results:

  • 60% constant CPU usage
  • 1-second latency for state changes
  • Tight coupling (adding sensors requires code changes)
  • Network saturation from constant polling

Pub-Sub Solution

Switched to MQTT publish-subscribe:

Python
 
# Sensors publish when state changes
sensor.on_change(lambda state:
   mqtt.publish(f"sensors/{sensor_id}", state))
# Consumers subscribe to relevant topics
mqtt.subscribe("sensors/kitchen/#", handle_event)


Results:

  • 5% idle CPU (15% during processing)
  • <100ms latency
  • 80% less network traffic
  • With this setup, you don’t have to update the code to add new sensors. The system stays flexible and easy to expand.

The main takeaway is that for event-driven workflows in microservices, the publish-subscribe pattern works much better than polling. This approach also applies to tools like Kafka, RabbitMQ, and AWS EventBridge.

Edge Architecture Principles

A local-first design means the main features run on local devices, while the cloud handles backup and analytics. This is important for systems that need to keep working even if the network fails, like retail POS, medical devices, or factory automation.

Making the most of your resources is important. When RAM and CPU are limited, you need to write efficient code. Learning to profile, optimize algorithms, and manage memory can also help you save money in the cloud.

Model compression techniques:

  • Quantization: float32 → int8 (4x reduction, <2% accuracy loss)
  • Pruning: Remove 60% of neural connections (5x reduction, <3% accuracy loss)
  • Using model compression techniques, I was able to shrink my machine learning model from 200 MB down to just 4 MB. Now, I use these methods for every model I put into production.

MQTT: Message Broker Patterns

MQTT has features that inspired modern message brokers. Key patterns relevant to backend development:

Quality of Service Per Message

Choose reliability level per message:

  • QoS 0: Fire-and-forget (metrics, logs)
  • QoS 1: At-least-once delivery (commands that can retry)
  • QoS 2: Exactly-once (financial transactions, critical commands)

With HTTP, every request is treated the same. MQTT lets you choose the right reliability level for each message, and this pattern is now used in NATS and Kafka as well.

Retained Messages for State

The broker keeps the last message for each topic. New subscribers get the current state right away, without needing database queries. This solves the problem of getting the initial state in distributed systems.

Last Will Testament

Clients can set the message broker to publish a message if they disconnect unexpectedly. This allows for automatic failure detection without heartbeat polling. It’s also used in Kubernetes liveness probes and service mesh setups.

Data Fusion: Combining Unreliable Sources

Common backend problem: multiple data sources with varying reliability. How to combine them for accurate insights?

Multi-Signal Aggregation

Single source: 85% accuracy. Combined sources: 95%+ accuracy.

Weighted scoring approach:

Python
 
confidence = sum(signal * weight for signal, weight in [
   (motion_detected, 0.3),
   (temperature_rising, 0.2),
   (lights_on, 0.15),
   (time_in_range, 0.2),
   (calendar_available, 0.15)
])
if confidence > 0.7:
   execute_action()


The same pattern is used in fraud detection, recommendation engines, and anomaly detection. Multiple weak signals create strong predictions.

Drift Detection

Sensors degrade over time. Statistical monitoring catches this:

Python
 
# Track rolling statistics
stats = compute_rolling_stats(sensor_readings, window_days=7)
# Detect distribution shifts
if stats.mean_shift > 2 * historical_std:
   flag_sensor_for_recalibration()
# Cross-validate with correlated sensors
if abs(sensor_a - median([sensor_b, sensor_c, sensor_d])) > threshold:
   reduce_weight(sensor_a)


You can use this approach to find degraded services, unreliable tests, and data quality issues. 

Consistent State Management

Distributed systems without centralized transactions require embracing eventual consistency.

Choreography Pattern

Single "movie_mode" event triggers multiple independent subscribers:

Python
 
# Publisher
event_bus.publish("mode.movie", {"timestamp": now()})
# Multiple subscribers react independently
@subscribe("mode.movie")
def living_room_lights():
   lights.set_brightness(20)
@subscribe("mode.movie")
def audio_system():
   audio.switch_to_tv()
@subscribe("mode.movie")
def blinds():
   blinds.close()


These actions happen at different times and speeds, and some may fail along the way.

Handling Failures

  • Idempotent commands: "Set brightness to 20," not "decrease by 50."
  • State verification: After the command, verify that the actual state matches the expected.
  • Timeouts: Mark failed after 5 seconds; continue with others.
  • Circuit breakers: After 3 consecutive failures, stop sending commands for a cooldown period.

This is how microservices should handle distributed workflows.

Circuit Breaker Implementation

Python
 
class CircuitBreaker:
   def __init__(self, failure_threshold=3, timeout=60):
       self.failure_count = 0
       self.state = "closed"  # closed, open, half_open
       self.last_failure_time = None

   def call(self, func):
       if self.state == "open":
           if time.time() - self.last_failure_time > self.timeout:
               self.state = "half_open"
           else:
               raise CircuitOpenError()
       try:
           result = func()
           self.on_success()
           return result
       except Exception as e:
           self.on_failure()
           raise e
   def on_failure(self):
       self.failure_count += 1
       if self.failure_count >= self.failure_threshold:
           self.state = "open"
           self.last_failure_time = time.time()
   def on_success(self):
       self.failure_count = 0
       self.state = "closed"


This helps stop cascading failures in distributed systems. It’s helpful for external APIs, database connections, and service-to-service calls.

Stream Processing Architecture

Processing 500-1000 events/minute on Raspberry Pi 4:

Events → Router → Aggregator → Predictor → Executor

  • Router: Filters and routes to interested subscribers 
  • Aggregator: Maintains sliding windows, detects patterns
  • Predictor: ML inference with confidence scores 
  • Executor: Implements retry logic and circuit breakers

This setup uses micro-batch processing to handle streaming data. It follows the same architecture patterns as Flink, Spark Streaming, and Kinesis, but scaled down for edge devices.

Observability Essentials

Once you’ve built the architecture, it’s important to keep an eye on system health. Here’s how you can do that:

Good:

Python
 
logger.info("automation_executed", extra={
   "automation_id": "morning_routine",
   "trigger_event": "motion_kitchen",
   "devices": ["lights", "coffee", "cabinet"],
   "success": True,
   "latency_ms": 87,
   "timestamp": datetime.utcnow().isoformat()
})


With this setup, you can run queries like "show failed automations" or "p95 latency by automation type."

Critical Metrics

  • Throughput: Events per minute by type
  • Latency: P50, P95, P99 for end-to-end processing
  • Error rate: Failures per minute by component
  • Circuit breaker state: Count of open circuits
  • Resource utilization: CPU, memory, net

You’ll use these same metrics to monitor microservices.

ML Model Optimization for Production

From a 200 MB model to a 4 MB deployment: 

1. Quantization (float32 → int8) reduces size with slight accuracy drop (200 MB → 50 MB, 95% → 94%).

  • Inference speed: 2x faster

2. Pruning (remove 60% connections)

  • Size: 50 MB → 20 MB
  • Accuracy: 94% → 93%

3. Knowledge distillation

  • Size: 20 MB → 4 MB
  • Accuracy: 93% → 92%

In the end, I reduced the model size by 50 times, lost only 3% accuracy, and made inference three times faster.

You can use these steps for any production ML deployment to save money.

  • Message broker: Mosquitto (MQTT), RabbitMQ, or Kafka 
  • Edge ML: TensorFlow Lite, ONNX Runtime 
  • Time-series DB: InfluxDB, TimescaleDB 
  • Containerization: Docker, Docker Compose 
  • Monitoring: Prometheus, Grafana

Key Takeaways for Production Systems

Event-driven systems work better than polling for real-time needs. Pub-sub architecture also makes your system less connected and improves performance.

Edge computing helps fix latency problems when every millisecond matters. It’s often better to process data close to where it’s created instead of sending it to the cloud.

For most workflows, eventual consistency is enough. Use asynchronous processes, handle retries, and make sure to check the final state.

Circuit breakers help prevent cascading failures. It’s a good idea to use them for any external dependency.

Observability is a must-have. Structured logging and good metrics make it possible to debug distributed systems. Limited environments make you a better developer.

Practical Learning Path

  1. Get Raspberry Pi 4 and install Docker.
  2. Set up Mosquitto MQTT broker.
  3. Build a simple pub-sub system (sensors publishing, consumers subscribing).
  4. Add circuit breakers and retry logic.
  5. Implement structured logging and metrics.
  6. Deploy ML model with TensorFlow Lite.

Hardware cost: ~$100. Time investment: 20-30 hours. Skills gained apply directly to microservices, stream processing, and distributed systems at work.

Architecture Event-driven architecture IoT Event

Opinions expressed by DZone contributors are their own.

Related

  • Event Driven Architecture (EDA) - Optimizer or Complicator
  • Event-Driven Architecture for Software Development: Leverage the Strength of Reactive Systems
  • Building an Event-Driven Architecture Using Kafka
  • What Are Events? Always 'Decoupled'

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook