Preventing Cache Stampedes at Scale
High-concurrency systems — especially retail, travel, ticketing, or any “hot product” scenarios — often face cache stampedes (aka "thundering herd", "dogpiling"). Here's a practical pattern for managing concurrent data requests.
When a cache entry expires, every server instance may simultaneously hit the database and recompute the same value. That results in:
- Unnecessary datastore I/O
- Increased latency
- CPU spikes
- Potential outages
This article outlines a production-ready pattern that combines:
- Per-pod probabilistic data filters (Bloom-like structures with timed expiry)
- A distributed mutex at the cache layer (single-writer lock)
The goal is simple: Only one instance refreshes an expired key. All others wait or return stale data safely.
The Problem: Cache Stampede in Modern Systems
When many clients request the same key, this is what happens:
          ┌─────────────┐
Clients → │  Service A  │ → miss → DB
Clients → │  Service B  │ → miss → DB
Clients → │  Service C  │ → miss → DB
          └─────────────┘
If the key has expired, every service recomputes the same value. At scale (100k+ RPS), the resulting burst of duplicate reads can overwhelm the backend.
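The effect can be reproduced in a few lines. The following is a minimal sketch, not a benchmark: it assumes a plain cache-aside read path with no protection, and uses a barrier to model requests that all arrive before the first recompute finishes. The function name and the in-memory dictionary standing in for the cache are illustrative assumptions.

```python
import threading


def simulate_naive_stampede(n_requests: int) -> int:
    """Return how many datastore reads n concurrent requests cause for one expired key."""
    cache: dict[str, str] = {}  # stand-in for the shared cache (assumption)
    db_reads = 0
    counter_lock = threading.Lock()
    # The barrier models requests that all land inside the recompute window:
    # every request observes the miss before any of them writes the value back.
    all_missed = threading.Barrier(n_requests)

    def read_from_db(key: str) -> str:
        nonlocal db_reads
        with counter_lock:
            db_reads += 1
        return f"value-for-{key}"

    def handle(key: str) -> None:
        cached = cache.get(key)
        all_missed.wait()
        if cached is None:
            # Unprotected cache-aside: every request that missed recomputes.
            cache[key] = read_from_db(key)

    threads = [
        threading.Thread(target=handle, args=("item:123",))
        for _ in range(n_requests)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return db_reads


if __name__ == "__main__":
    print(simulate_naive_stampede(50))  # prints 50: one datastore read per request
```

With no coordination, the number of datastore reads grows linearly with the number of requests that arrive during the recompute window.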
The Solution: Filter + Mutex Hybrid Pattern
The pattern uses two layers of protection.
Per-Pod Probabilistic Data Filter
The first layer is a small data structure (a Bloom filter, time-bucketed filter, or inferential time-decay filter) that runs inside each service pod.
The filter:
- Remembers keys recently seen as expired
- Suppresses duplicate refresh attempts
- Expires entries after t seconds
- Is extremely space- and time-efficient
request(key):
    if filter.contains(key):
        → skip refresh
    else:
        filter.add(key)
        → attempt refresh with mutex
Why It Works
Only the first request per pod attempts a refresh. Later requests for the same key on that pod skip the refresh until the filter entry expires; they are suppressed rather than made to wait.
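One way such a filter could be built is sketched below, under stated assumptions: two Bloom-filter generations that rotate every t seconds, so a key is forgotten somewhere between t and 2t seconds after it was added rather than at exactly t. The class name, bit-array size, hash count, and injectable clock are all illustrative choices, not something the pattern prescribes.

```python
import hashlib
import threading
import time
from typing import Callable, Iterator


class ExpiringBloomFilter:
    """Two-generation Bloom filter; keys are forgotten between ttl and 2*ttl seconds."""

    def __init__(self, ttl_seconds: float, num_bits: int = 8192, num_hashes: int = 4,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._clock = clock
        self._current = self._empty()
        self._previous = self._empty()
        self._generation_start = clock()
        self._guard = threading.Lock()  # a pod usually serves requests on many threads

    def _empty(self) -> bytearray:
        return bytearray((self._num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self._num_hashes):
            digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                                     salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self._num_bits

    def _rotate_if_due(self) -> None:
        now = self._clock()
        elapsed = now - self._generation_start
        if elapsed < self._ttl:
            return
        if elapsed >= 2 * self._ttl:
            # Both generations are too old: forget everything.
            self._previous = self._empty()
            self._generation_start = now
        else:
            # Age the current generation; the old previous one is dropped.
            self._previous = self._current
            self._generation_start += self._ttl
        self._current = self._empty()

    @staticmethod
    def _all_set(bits: bytearray, positions: list) -> bool:
        return all(bits[p >> 3] & (1 << (p & 7)) for p in positions)

    def contains(self, key: str) -> bool:
        with self._guard:
            self._rotate_if_due()
            positions = list(self._positions(key))
            return (self._all_set(self._current, positions)
                    or self._all_set(self._previous, positions))

    def add(self, key: str) -> None:
        with self._guard:
            self._rotate_if_due()
            for p in self._positions(key):
                self._current[p >> 3] |= 1 << (p & 7)
```

Because a Bloom filter can report false positives, a pod may occasionally skip a refresh for a key it has not actually seen. That is tolerable here only because another pod, or a later request once the entry has aged out, can still perform the refresh.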
Distributed Mutex at Cache Layer
Once filtering limits refresh attempts to one per pod, a distributed lock ensures that only one pod globally performs the refresh.
Common implementations:
- SETNX + TTL (Redis)
- Lease-based locks
- Distributed lock managers (Hazelcast, Aerospike, etc.)
if cache.lock_acquire(key):
    → compute + write
else:
    → return stale/fallback
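The lock semantics can be sketched without committing to a specific product. The example below is a hedged illustration: InMemoryLockStore is a hypothetical in-process stand-in for the cache, and set_if_absent and delete_if_token_matches are invented method names, not a real client API. With Redis, the acquire step corresponds to SET key token NX PX ttl, and the release step is typically a compare-and-delete executed atomically in a server-side script.

```python
import threading
import time
import uuid
from typing import Callable, Optional


class InMemoryLockStore:
    """Hypothetical stand-in for the cache's lock primitive (not a real client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (token, expires_at)
        self._clock = clock
        self._guard = threading.Lock()

    def set_if_absent(self, key: str, token: str, ttl_seconds: float) -> bool:
        with self._guard:
            now = self._clock()
            held = self._entries.get(key)
            if held is not None and held[1] > now:
                return False  # someone else holds an unexpired lease
            self._entries[key] = (token, now + ttl_seconds)
            return True

    def delete_if_token_matches(self, key: str, token: str) -> bool:
        with self._guard:
            held = self._entries.get(key)
            if held is None or held[0] != token:
                return False  # never delete a lock owned by another holder
            del self._entries[key]
            return True


def lock_acquire(store: InMemoryLockStore, key: str,
                 ttl_seconds: float = 5.0) -> Optional[str]:
    """Try once; return an ownership token on success, None otherwise."""
    token = uuid.uuid4().hex
    if store.set_if_absent(f"lock:{key}", token, ttl_seconds):
        return token
    return None


def lock_release(store: InMemoryLockStore, key: str, token: str) -> bool:
    """Release only if this caller still owns the lock."""
    return store.delete_if_token_matches(f"lock:{key}", token)
```

The TTL matters: if the pod holding the lock crashes mid-refresh, the lease expires and another pod can take over. The token check on release keeps a slow holder, whose lease has already expired, from deleting a lock that now belongs to someone else.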
Architecture Diagram
                ┌────────────────────────┐
      Clients → │    API Gateway / LB    │
                └───────────┬────────────┘
                            │
      ┌─────────────────────┴───────────────────────┐
      │                SERVICE PODS                 │
      │                                             │
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│     Pod A     │   │     Pod B     │   │     Pod C     │
│   Filter A    │   │   Filter B    │   │   Filter C    │
│ Refresh Logic │   │ Refresh Logic │   │ Refresh Logic │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───── Attempt Refresh? (filtered) ─────┘
                            │
                            ▼
                ┌──────────────────────┐
                │  Distributed Cache   │
                │  (Distributed Lock)  │
                └───────────┬──────────┘
                            │
                            ▼
                        ┌────────┐
                        │   DB   │
                        └────────┘
Request Flow Diagram (ASCII Sequence)
Client → Pod A : GET /item/123
Pod A → Cache : miss
Pod A → Filter A : key not present
Pod A → Filter A : add key
Pod A → Cache : acquire lock (success)
Pod A → DB : fetch source data
Pod A → Cache : write updated data
Pod A : return fresh response
Client → Pod B : GET /item/123
Pod B → Cache : miss
Pod B → Filter B : key present
Pod B : return stale response (temporary)
Pseudocode
def handle_request(key):
    # Step 1: Fast path
    if cache.has_valid(key):
        return cache.get(key)

    # Step 2: Local filter check
    if filter.contains(key):
        return return_stale_or_quick_fallback(key)

    # First time this pod sees the expired key
    filter.add(key)

    # Step 3: Try to acquire global distributed lock
    if cache.lock_acquire(key):
        try:
            value = datastore.read_and_compute(key)
            cache.set(key, value)
            return value
        finally:
            cache.lock_release(key)
    else:
        return return_stale_or_quick_fallback(key)
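To see the whole flow run end to end, the sketch below wires the same three steps into a small deterministic simulation. Everything in it is an assumption made for illustration: SharedCache and Pod are hypothetical in-memory stand-ins, the per-pod filter is an exact dictionary rather than a probabilistic structure, and the lock has no TTL. Requests from all three pods are injected while the first refresh is still in progress, which is the situation the pattern is meant to handle.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedCache:
    """In-memory stand-in for the distributed cache and its lock (hypothetical)."""
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic
    values: dict = field(default_factory=dict)  # key -> (value, fresh_until)
    locks: set = field(default_factory=set)     # simplified: no lock TTL here

    def has_valid(self, key):
        entry = self.values.get(key)
        return entry is not None and entry[1] > self.clock()

    def get_even_if_stale(self, key):
        entry = self.values.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        self.values[key] = (value, self.clock() + self.ttl_seconds)

    def lock_acquire(self, key):
        if key in self.locks:
            return False
        self.locks.add(key)
        return True

    def lock_release(self, key):
        self.locks.discard(key)


class Pod:
    def __init__(self, cache, read_and_compute, filter_ttl_seconds, clock):
        self.cache = cache
        self.read_and_compute = read_and_compute
        self.filter_ttl = filter_ttl_seconds
        self.clock = clock
        self.recently_expired = {}  # key -> forget_at; exact stand-in for the filter

    def handle_request(self, key):
        if self.cache.has_valid(key):                      # Step 1: fast path
            return self.cache.get_even_if_stale(key)
        forget_at = self.recently_expired.get(key)
        if forget_at is not None and forget_at > self.clock():
            return self.cache.get_even_if_stale(key)       # Step 2: suppressed locally
        self.recently_expired[key] = self.clock() + self.filter_ttl
        if self.cache.lock_acquire(key):                   # Step 3: single writer
            try:
                value = self.read_and_compute(key)
                self.cache.set(key, value)
                return value
            finally:
                self.cache.lock_release(key)
        return self.cache.get_even_if_stale(key)


def demo():
    now = [0.0]
    clock = lambda: now[0]
    cache = SharedCache(ttl_seconds=30, clock=clock)
    cache.set("item:123", "old")
    now[0] = 31.0  # the cached entry has now expired
    db_reads, pods, concurrent_responses = [], [], []

    def read_and_compute(key):
        db_reads.append(key)
        if len(db_reads) == 1:
            # While the first refresh is running, nine more requests arrive.
            for pod in pods:
                for _ in range(3):
                    concurrent_responses.append(pod.handle_request(key))
        return "new"

    pods.extend(Pod(cache, read_and_compute, 5, clock) for _ in range(3))
    first = pods[0].handle_request("item:123")
    return first, len(db_reads), concurrent_responses


if __name__ == "__main__":
    print(demo())
```

In this run the refreshing pod returns the new value, the datastore is read once, and the nine requests that arrive mid-refresh are all served the stale value instead of triggering their own reads.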
When to Use This Pattern
This pattern is ideal for the following situations:
- Retail product detail pages
- Inventory/price availability APIs
- Search autosuggest
- Real-time ranking or personalization caches
- Ticketing and airline availability
- Any hot-key scenario under high concurrency
But avoid using it in these cases:
- Strong consistency is required
- Data cannot safely be served stale
- Writes must be serialized strictly with no risk of suppression