DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Designing High-Volume Systems Using Event-Driven Architectures
  • Enhancing Stream Data Processing With Snow Pipe, Cortex AI, and Snow Park
  • Exploring the Dynamics of Streaming Databases
  • Choosing the Right Stream Processing System: A Comprehensive Guide

Trending

  • 7 Technology Waves I’ve Seen in 30 Years of Software — Will AI Be the Next Real Transformation?
  • Implementing Secure API Gateways for Microservices Architecture
  • Implementing Observability in Distributed Systems Using OpenTelemetry
  • Getting Started With Agentic Workflows in Java and Quarkus
  1. DZone
  2. Data Engineering
  3. Data
  4. Implementing Write-Through Cache for Real-Time Data Processing: A Scalable Approach

Implementing Write-Through Cache for Real-Time Data Processing: A Scalable Approach

Learn how a write-through local cache boosts performance and consistency in high-volume data streams while reducing database load.

By 
Rohith Narasimhamurthy user avatar
Rohith Narasimhamurthy
·
Aug. 29, 25 · Analysis
Likes (3)
Comment
Save
Tweet
Share
3.3K Views

Join the DZone community and get the full member experience.

Join For Free

Real-time data processing systems often struggle with balancing performance and data consistency when handling high volumes of transactions. This article explores how a write-through local cache can optimize performance.

Introduction to Write-Through Caches

A write-through cache is a caching strategy where data is written to both the cache and the backing store simultaneously. This approach ensures that the cache always contains the most recent data while maintaining consistency with the underlying data store.

In distributed systems processing event streams, write-through caches can significantly reduce latency and database load by localizing data access patterns. Unlike write-back caching (which delays updates to the backing store) or read-through caching (which only populates on read operations), write-through caching provides stronger consistency guarantees while still offering performance benefits.

The Entity-Signal Pattern

Before diving into implementation details, let's understand the data model we're working with:

  • EntityIDs: Unique identifiers for items in the system (e.g., users, devices)
  • Signals: Attributes or events associated with these entities
  • Signal History: A collection of signals for each entity that may need to be accessed frequently

This pattern is common in analytics systems, fraud detection platforms, recommendation engines, and IoT applications, where entity behavior needs to be tracked over time.

Architecture Overview

Our write-through cache implementation focuses on optimizing stream processing workloads where:

  1. Each processing node handles specific entityIDs (partitioned by hash)
  2. Signal history for these entities is frequently accessed
  3. New signals continually arrive via stream processing (e.g., Kinesis)

The (local) cache serves three primary purposes:

  1. Periodically load signals for entityIDs from Redis
  2. Store new signals arriving
  3. Always provide complete and current signal data for any cached entity

Architecture overview

Implementation Details

The implementation consists of two coordinated components:

Implementation details

1. Loading Cache

The loading cache is responsible for periodically refreshing entity data from the backend Redis store:

Java
 
private final LoadingCache<String, Set<Signal>> loadingCache;

// In constructor
this.loadingCache = CacheBuilder.newBuilder()
                               .maximumSize(LOADING_CACHE_MAX_SIZE)
                               .expireAfterWrite(LOADING_CACHE_TIMEOUT, TimeUnit.MINUTES)
                               .removalListener(this)
                               .recordStats()
                               .build(new SignalCacheLoader(
                                       redisDAO,
                                       secretTokenizer,
                                       weblabClient,
                                       metricsManager));


Key configurations include:

  • Maximum size: Prevents unbounded memory growth (LOADING_CACHE_MAX_SIZE = 1000)
  • Expire-after-write: Ensures data is refreshed periodically (LOADING_CACHE_TIMEOUT = 5 minutes)
  • Removal listener: Notifies when entries are evicted

2. Signal Cache

The signal cache stores incoming signals between refresh cycles:

Java
 
// Starts with initial capacity of 100 and grows till the size limit of loading cache
// As loading cache expires its entries, this class acts as a removal listener and removes the entry for Signal
// cache as well.


When new signals need to be stored:

Java
 
/**
 * Index signals to the entityId.
 * @param key entityId.
 * @param value Set of signals.
 */
@Override
public void put(String key, Set<Signal> value) {
    if (key != null && value != null) {
        // Save the signal in local cache
        localCache.put(key, Lists.newArrayList(value));
    }
}


When requesting signals for an entity:

Java
 
@Override
public Set<Signal> getIfPresent(Object key) {
    if (key != null && (key instanceof String)) {
        final Set<Signal> signals = loadingCache.get((String) key);
        if (log.isDebugEnabled()) {
            log.debug("Number of signals retrieved from redis cache loader for [entity: {}] is {}",
                     getResult((String) key), signals.size());
        }

        // Fetch any recent signals for entity which came in after redis load()
        final List<Signal> inMemorySignals = (List<Signal>) localCache.getIfPresent(key);
        if (log.isDebugEnabled()) {
            log.debug("Number of signals retrieved from local in-memory cache for [entity: {}] is {}",
                     getResult((String) key), inMemorySignals.size());
        }
        signals.addAll(inMemorySignals);

        return new HashSet<>(signals);
    }
    return Collections.emptySet();
}


Memory Management Strategy

The cache implementation employs a coordinated memory management strategy:

  1. Loading cache: Configured with maximum size and expiry time. When entries expire or are evicted due to size constraints, removal notifications are triggered.
  2. Signal cache: It starts with a defined initial capacity and grows as needed. Its size is effectively controlled by the loading cache through the removal notification mechanism.
Java
 
/**
 * When loadingCache expires its entry, this listener removes the key from
 * localCache as well.
 * @param removalNotification
 */
@Override
public void onRemoval(RemovalNotification removalNotification) {
    if (log.isDebugEnabled()) {
        log.debug("Invalidating local cache for [entity: {}]",
                secretTokenizer.getHashedResult((String) removalNotification.getKey()));
    }
    localCache.invalidate(removalNotification.getKey());
}


This coordinated approach ensures both caches stay in sync and allows the JVM to reclaim memory when entries are no longer needed.

Business Benefits

1. Reduced Transaction Costs

By localizing data access, this caching strategy dramatically reduces Redis operations:

  • Without cache: Each incoming signal requires a Redis read and write.
  • With cache: Redis reads occur once per expiry interval per entity.
  • Cost reduction: For high-volume systems processing millions of events, this can translate to a 95%+ reduction in Redis operations.

2. Improved Latency

Stream processing latency is significantly reduced:

  • Without cache: 10-20ms per signal (Redis round-trip)
  • With cache: Sub-millisecond response for cached entities
  • Benefit: Lower processing latency leads to more responsive systems

3. Enhanced Resilience

The write-through cache provides resilience against temporary Redis failures:

  • Processing can continue with cached data during short outages
  • Writes can be queued and flushed when connectivity is restored
  • System availability increases while dependency on Redis availability decreases

4. Optimized Cost Structure

Beyond direct transaction cost reduction:

  • Infrastructure savings: Reduced Redis cluster size needed for the same workload
  • Scaling efficiency: More efficient processing means fewer nodes needed for the same throughput
  • Operational simplicity: Less complex Redis scaling and management

Considerations and Best Practices

While implementing this pattern, keep these guidelines in mind:

  1. Cache size tuning: Set the maximum cache size based on entity cardinality and available memory.
  2. Expiry intervals: Balance freshness needs against Redis load.
  3. Monitoring: Implement cache hit/miss metrics to verify efficiency.
  4. Consistency controls: Consider adding version stamps for conflict resolution.
  5. Warm-up strategy: Pre-populate caches for critical entities during startup.

Conclusion

A well-implemented write-through cache can transform the performance and cost profile of entity-based stream processing systems. By intelligently caching entity signals at the processing node level and coordinating with a backing Redis store, organizations can achieve both high performance and data consistency while significantly reducing infrastructure costs.

This pattern is particularly valuable for high-throughput systems where entities exhibit locality (the same entities appear repeatedly in the stream) and where signal history needs to be maintained and frequently accessed during processing.

Whether you're building a fraud detection system, user behavior analytics, or IoT data processor, consider implementing this write-through caching pattern to boost performance while keeping your database costs under control.

Data processing Stream processing Cache (computing)

Opinions expressed by DZone contributors are their own.

Related

  • Designing High-Volume Systems Using Event-Driven Architectures
  • Enhancing Stream Data Processing With Snow Pipe, Cortex AI, and Snow Park
  • Exploring the Dynamics of Streaming Databases
  • Choosing the Right Stream Processing System: A Comprehensive Guide

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook