Preventing Cache Stampedes at Scale

High-concurrency systems — especially retail, travel, ticketing, or any “hot product” scenarios — often face cache stampedes (aka "thundering herd", "dogpiling"). Here's a practical pattern for managing concurrent data requests.

Vikas Mittal

Jan. 29, 26 · Tutorial


A cache stampede starts when a hot cache entry expires: every server instance can miss at the same moment, hit the database, and recompute the same value. That results in:

Unnecessary datastore I/O
Increased latency
CPU spikes
Potential outages

This article outlines a production-ready pattern that combines:

Per-pod probabilistic data filters (Bloom-like structures with timed expiry)
A distributed mutex at the cache layer (single-writer lock)

The goal is simple: Only one instance refreshes an expired key. All others wait or return stale data safely.

The Problem: Cache Stampede in Modern Systems

When many clients request the same key, this is what happens:

          ┌─────────────┐
Clients → │  Service A  │ → miss → DB
Clients → │  Service B  │ → miss → DB
Clients → │  Service C  │ → miss → DB
          └─────────────┘

If the key has expired, every service recomputes the same value. At scale (100k+ RPS), that burst of duplicate work can overwhelm the backend.

The Solution: Filter + Mutex Hybrid Pattern

The pattern uses two layers of protection.

Per-Pod Probabilistic Data Filter

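A small data structure (Bloom filter, time-bucketed filter, or inferential time-decay filter) runs inside each service pod. A classic Bloom filter cannot remove entries, so timed expiry is usually implemented by rotating or time-bucketing the filter.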

The filter:

Remembers keys recently seen as expired
Suppresses duplicate refresh attempts
Expires entries after t seconds
Is extremely space- and time-efficient

    request(key):
        if filter.contains(key):
            → skip refresh
        else:
            filter.add(key)
            → attempt refresh with mutex

Why It Works

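Only the first request per pod attempts a refresh. The rest are not blocked: they skip the refresh and serve stale or fallback data until the filter entry expires. With a Bloom-style filter, a false positive can also suppress a refresh for a key the pod has never seen, which is acceptable only if the data can safely be served stale until the entry expires.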
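The filter described above can be sketched as two rotating buckets. This is a minimal illustration, not the author's implementation: the class name `ExpiringKeyFilter`, the method `add_if_absent`, and the injectable `clock` are assumptions made for the example, and exact Python sets stand in for Bloom filters to keep the code short.

```python
import threading
import time


class ExpiringKeyFilter:
    """Remembers each key for between ttl and 2 * ttl seconds.

    Uses two rotating buckets. Exact sets are used here; Bloom filters
    could be swapped in to cap memory at the cost of false positives.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock              # injectable so tests can control time
        self.current = set()
        self.previous = set()
        self.rotated_at = clock()
        self.mutex = threading.Lock()

    def _rotate(self):
        elapsed = self.clock() - self.rotated_at
        if elapsed >= 2 * self.ttl:
            # Idle for two full windows: everything has expired.
            self.current, self.previous = set(), set()
            self.rotated_at = self.clock()
        elif elapsed >= self.ttl:
            # One window passed: age the current bucket, drop the old one.
            self.previous, self.current = self.current, set()
            self.rotated_at += self.ttl

    def add_if_absent(self, key):
        """Return True only for the first caller within the window."""
        with self.mutex:
            self._rotate()
            if key in self.current or key in self.previous:
                return False
            self.current.add(key)
            return True


recently_expired = ExpiringKeyFilter(ttl=5.0)
if recently_expired.add_if_absent("item:123"):
    pass  # first sighting in this pod: attempt the refresh under the mutex
```

Combining `contains` and `add` into one locked call avoids a race where two threads in the same pod both see the key as absent and both attempt the refresh.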

Distributed Mutex at Cache Layer

Once filtering limits refresh attempts to one per pod, a distributed lock ensures that only one pod globally performs the refresh.

Common implementations:

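SET with NX and a TTL (Redis); a single atomic "SET key value NX PX ttl" is safer than SETNX followed by a separate EXPIRE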
Lease-based locks
Distributed lock managers (Hazelcast, Aerospike, etc.)

    if cache.lock_acquire(key):
        → compute + write
    else:
        → return stale/fallback
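For the Redis option, the acquire and release steps might look like the sketch below. It assumes a redis-py style client object passed in as `client` (its `set(..., nx=True, px=...)` and `eval(script, numkeys, ...)` calls); the `lock:` key prefix, the 5 second default TTL, and the function names `lock_acquire` and `lock_release` (borrowed from the pseudocode) are illustrative choices, not part of the original pattern.

```python
import uuid

# Lua script run atomically by Redis: delete the lock only if it still
# holds our token, so a pod never releases a lock it no longer owns.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def lock_acquire(client, key, ttl_ms=5000):
    """Try to take the refresh lock. Returns a token on success, else None."""
    token = uuid.uuid4().hex
    # SET ... NX PX sets the value and the expiry in one atomic command.
    if client.set(f"lock:{key}", token, nx=True, px=ttl_ms):
        return token
    return None


def lock_release(client, key, token):
    """Release the lock if we still own it. Returns True if it was deleted."""
    return client.eval(RELEASE_SCRIPT, 1, f"lock:{key}", token) == 1
```

The TTL should comfortably exceed the expected refresh time. If the lock expires mid-refresh, a second pod can acquire it and start a duplicate refresh; the token check only ensures that the first pod does not delete the second pod's lock.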

Architecture Diagram

                ┌────────────────────────┐
     Clients →  │    API Gateway / LB    │
                └───────────┬────────────┘
                            │
┌───────────────────────────┴───────────────────────────┐
│                     SERVICE PODS                      │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Pod A         │ │ Pod B         │ │ Pod C         │ │
│ │ Filter A      │ │ Filter B      │ │ Filter C      │ │
│ │ Refresh Logic │ │ Refresh Logic │ │ Refresh Logic │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼─────────┘
          │                 │                 │
          └─── Attempt Refresh? (filtered) ───┘
                            │
                            ▼
                ┌──────────────────────┐
                │  Distributed Cache   │
                │  (Distributed Lock)  │
                └───────────┬──────────┘
                            │
                            ▼
                        ┌────────┐
                        │   DB   │
                        └────────┘

Request Flow Diagram (ASCII Sequence)

Client → Pod A : GET /item/123
Pod A → Cache : miss
Pod A → Filter A : key not present
Pod A → Filter A : add key
Pod A → Cache : acquire lock (success)
Pod A → DB : fetch source data
Pod A → Cache : write updated data
Pod A : return fresh response

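Client → Pod B : GET /item/123 (while Pod A holds the lock)
Pod B → Cache : miss
Pod B → Filter B : key not present
Pod B → Filter B : add key
Pod B → Cache : acquire lock (fails, held by Pod A)
Pod B : return stale response (temporary)

Client → Pod B : GET /item/123 (again, before the refresh lands)
Pod B → Cache : miss
Pod B → Filter B : key present
Pod B : return stale response (temporary)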

Pseudocode

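def handle_request(key):
    # Step 1: Fast path. Serve directly while the cached entry is still valid.
    if cache.has_valid(key):
        return cache.get(key)

    # Step 2: Local filter check. This pod already attempted a refresh for
    # this key recently, so serve stale data instead of trying again.
    if filter.contains(key):
        return return_stale_or_quick_fallback(key)

    # First time this pod sees the expired key
    filter.add(key)

    # Step 3: Try to acquire the global distributed lock
    if cache.lock_acquire(key):
        try:
            value = datastore.read_and_compute(key)
            cache.set(key, value)
            return value
        finally:
            # Always release, even if the refresh raised
            cache.lock_release(key)
    else:
        # Another pod holds the lock and is refreshing this key
        return return_stale_or_quick_fallback(key)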
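To see the flow end to end, the pseudocode can be exercised with in-memory stand-ins. Everything in this sketch is hypothetical scaffolding: `FakeCache`, `Pod`, and `simulate` are names invented for the example, a plain set replaces the expiring per-pod filter, and threads play the role of concurrent requests. It also adds one step the pseudocode does not show, re-checking the cache after the lock is acquired, so that a pod that wins the lock late does not repeat a refresh that has just finished.

```python
import threading
import time


class FakeCache:
    """In-memory stand-in for the shared cache and its lock."""

    def __init__(self):
        self.entries = {}           # key -> (value, expires_at)
        self.locks = set()
        self.mutex = threading.Lock()

    def get_fresh(self, key):
        entry = self.entries.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None

    def get_stale(self, key):
        entry = self.entries.get(key)
        return entry[0] if entry else None

    def set(self, key, value, ttl):
        self.entries[key] = (value, time.monotonic() + ttl)

    def lock_acquire(self, key):
        with self.mutex:
            if key in self.locks:
                return False
            self.locks.add(key)
            return True

    def lock_release(self, key):
        with self.mutex:
            self.locks.discard(key)


class Pod:
    def __init__(self, cache, load):
        self.cache = cache
        self.load = load
        self.seen = set()           # stand-in for the per-pod filter (no expiry here)
        self.mutex = threading.Lock()

    def handle_request(self, key):
        value = self.cache.get_fresh(key)
        if value is not None:
            return value
        with self.mutex:
            first_sighting = key not in self.seen
            self.seen.add(key)
        if first_sighting and self.cache.lock_acquire(key):
            try:
                # Re-check: another pod may have refreshed just before we locked.
                value = self.cache.get_fresh(key)
                if value is None:
                    value = self.load(key)
                    self.cache.set(key, value, ttl=60.0)
                return value
            finally:
                self.cache.lock_release(key)
        return self.cache.get_stale(key)


def simulate(pod_count=3, requests_per_pod=10):
    cache = FakeCache()
    cache.set("item:123", "old", ttl=-1.0)   # already expired, still readable as stale
    db_calls = []

    def load(key):
        db_calls.append(key)
        time.sleep(0.05)                     # pretend the datastore is slow
        return "new"

    pods = [Pod(cache, load) for _ in range(pod_count)]
    results = []
    threads = [
        threading.Thread(target=lambda p=pod: results.append(p.handle_request("item:123")))
        for pod in pods
        for _ in range(requests_per_pod)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(db_calls), results
```

Calling `simulate()` returns the number of datastore loads and the list of responses. In this setup the load runs once, and every other request gets either the stale value or the refreshed one.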

When to Use This Pattern

This pattern is ideal for the following situations:

Retail product detail pages
Inventory/price availability APIs
Search autosuggest
Real-time ranking or personalization caches
Ticketing and airline availability
Any hot-key scenario under high concurrency

But avoid using it in these cases:

Strong consistency is required
Data may not be safely stale
Writes must be serialized strictly with no risk of suppression
