Preventing Cache Stampedes at Scale

High-concurrency systems, especially in retail, travel, ticketing, or any “hot product” scenario, often face cache stampedes (aka "thundering herd" or "dogpiling"). Here's a practical pattern for managing concurrent data requests.

By Vikas Mittal · Jan. 29, 26 · Tutorial

High-concurrency systems, especially in retail, travel, ticketing, or any “hot product” scenario, often face cache stampedes (also called thundering herd or dogpiling). When a hot cache entry expires, every server instance may miss at the same moment, hit the database, and recompute the same value. That results in:

  • Unnecessary datastore I/O
  • Increased latency
  • CPU spikes
  • Potential outages

This article outlines a production-ready pattern that combines:

  • Per-pod probabilistic data filters (Bloom-like structures with timed expiry)
  • A distributed mutex at the cache layer (single-writer lock)

The goal is simple: Only one instance refreshes an expired key. All others wait or return stale data safely.

The Problem: Cache Stampede in Modern Systems

When many clients request the same key, this is what happens:

           ┌─────────────┐
Clients →  │  Service A  │  → miss → DB
Clients →  │  Service B  │  → miss → DB
Clients →  │  Service C  │  → miss → DB
           └─────────────┘


If the key has expired, every service instance recomputes the same value. At scale (100k+ RPS), that duplicated load can overwhelm your backend.
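To make the failure mode concrete, here is a minimal sketch of an unprotected cache-aside lookup under concurrent load. Everything in it is an assumption for illustration: `simulate_stampede`, the in-process dict standing in for the cache, and the barrier that forces all requests to observe the miss before any of them repopulates the key.

```python
import threading


def simulate_stampede(n_requests=20):
    """Return how many DB reads n concurrent requests cause on one expired key."""
    cache = {}                        # empty: the hot key has just expired
    db_reads = []                     # one entry appended per datastore read
    barrier = threading.Barrier(n_requests)

    def read_from_db(key):
        db_reads.append(key)          # list.append is thread-safe in CPython
        return f"value-for-{key}"

    def naive_get(key):
        value = cache.get(key)
        barrier.wait()                # every request observes the miss first
        if value is None:
            value = read_from_db(key)
            cache[key] = value
        return value

    threads = [threading.Thread(target=naive_get, args=("item:123",))
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(db_reads)


print(simulate_stampede(20))  # 20: every request that missed also hit the DB
```

The barrier exaggerates the timing, but the effect is the same one a real expiry produces: the number of datastore reads grows with the number of concurrent requests rather than staying at one.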

The Solution: Filter + Mutex Hybrid Pattern

The pattern uses two layers of protection.

Per-Pod Probabilistic Data Filter

A small data structure (a Bloom filter, time-bucketed filter, or inferential time-decay filter) runs inside each service pod.

The filter:

  • Remembers keys recently seen as expired
  • Suppresses duplicate refresh attempts
  • Expires entries after t seconds
  • Is extremely space- and time-efficient

request(key):
  if filter.contains(key):
     → skip refresh
  else:
     filter.add(key)
     → attempt refresh with mutex

Why It Works

Only the first request per pod attempts a refresh. Until the filter entry expires, the rest skip the refresh and are served stale or fallback data.
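A quick back-of-the-envelope calculation shows the size of the reduction. The function name and the numbers used here (50 pods, 2,000 requests per second per pod for one hot key, a 1-second filter TTL) are illustrative assumptions, not measurements.

```python
def max_refresh_attempts_per_second(pods, hot_key_rps_per_pod, filter_ttl_seconds=None):
    """Upper bound on refresh attempts per second for one expired hot key."""
    unfiltered = pods * hot_key_rps_per_pod      # every miss attempts a refresh
    if filter_ttl_seconds is None:
        return unfiltered
    # With a filter, each pod attempts at most once per filter TTL.
    return min(unfiltered, pods / filter_ttl_seconds)


print(max_refresh_attempts_per_second(50, 2000))        # 100000
print(max_refresh_attempts_per_second(50, 2000, 1.0))   # 50.0
```

Under these assumptions the filter alone cuts contention on the shared lock from roughly 100,000 attempts per second to at most 50.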

Distributed Mutex at Cache Layer

Once filtering limits refresh attempts to one per pod, a distributed lock ensures that only one pod across the whole fleet performs the refresh.

Common implementations:

  • SETNX + TTL (Redis), set atomically with SET key value NX PX <ttl>
  • Lease-based locks
  • Distributed lock managers (Hazelcast, Aerospike, etc.)

if cache.lock_acquire(key):
    → compute + write
else:
    → return stale/fallback
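The lock semantics matter more than the backing store, so the sketch below uses an in-memory stand-in rather than a real cache client. `InMemoryLockStore` and its method signatures are hypothetical. It mimics two properties a production lock needs: an expiry, so a crashed holder cannot block refreshes forever, and an owner token, so a slow holder cannot release a lock that has since passed to someone else. On Redis, acquire maps to a single `SET key token NX PX <ttl>` command, and release is typically a small script that deletes the key only if its value still matches the token.

```python
import threading
import time
import uuid


class InMemoryLockStore:
    """Hypothetical stand-in for a cache that supports set-if-absent with expiry."""

    def __init__(self, clock=time.monotonic):
        self._locks = {}                  # key -> (owner_token, expires_at)
        self._guard = threading.Lock()
        self._clock = clock

    def lock_acquire(self, key, ttl_seconds=5.0):
        """Return an owner token if the lock was taken, otherwise None."""
        with self._guard:
            held = self._locks.get(key)
            if held is not None and held[1] > self._clock():
                return None               # another caller holds a live lock
            token = uuid.uuid4().hex
            self._locks[key] = (token, self._clock() + ttl_seconds)
            return token

    def lock_release(self, key, token):
        """Delete the lock only if this caller still owns it."""
        with self._guard:
            held = self._locks.get(key)
            if held is not None and held[0] == token:
                del self._locks[key]
                return True
            return False
```

The lock TTL should comfortably exceed the expected recompute time. If it is too short, a second pod can acquire the lock while the first is still working, and the single-writer guarantee is lost.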


Architecture Diagram

                   ┌────────────────────────┐
        Clients →  │   API Gateway / LB     │
                   └───────────┬────────────┘
                               │
         ┌─────────────────────┴───────────────────────┐
         │                   SERVICE PODS               │
         │                                              │
 ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
 │  Pod A         │   │  Pod B         │   │  Pod C        │
 │  Filter A      │   │  Filter B      │   │  Filter C     │
 │  Refresh Logic │   │  Refresh Logic │   │  Refresh Logic│
 └───────┬────────┘   └──────┬─────────┘   └──────┬────────┘
         │                    │                   │
         └───── Attempt Refresh? (filtered) ──────┘
                         │
                         ▼
              ┌──────────────────────┐
              │   Distributed Cache  │
              │   (Distributed Lock) │
              └───────────┬──────────┘
                          │
                          ▼
                     ┌────────┐
                     │  DB    │
                     └────────┘

Request Flow Diagram (ASCII Sequence)

Client → Pod A : GET /item/123
Pod A → Cache : miss
Pod A → Filter A : key not present
Pod A → Filter A : add key
Pod A → Cache : acquire lock (success)
Pod A → DB : fetch source data
Pod A → Cache : write updated data
Pod A → Cache : release lock
Pod A : return fresh response

(meanwhile, before Pod A has written the updated data)

Client → Pod B : GET /item/123
Pod B → Cache : miss
Pod B → Filter B : key present (Pod B has already seen this expired key)
Pod B : return stale response (temporary)

Pseudocode

def handle_request(key):
    # Step 1: Fast path
    if cache.has_valid(key):
        return cache.get(key)

    # Step 2: Local filter check
    if filter.contains(key):
        return return_stale_or_quick_fallback(key)

    # First time this pod sees the expired key
    filter.add(key)

    # Step 3: Try to acquire global distributed lock
    if cache.lock_acquire(key):
        try:
            value = datastore.read_and_compute(key)
            cache.set(key, value)
            return value
        finally:
            cache.lock_release(key)
    else:
        return return_stale_or_quick_fallback(key)
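For readers who want to run the flow end to end, here is a minimal sketch that wires the same three steps together using in-process stand-ins. `SharedCache`, `Pod`, and all of their methods are hypothetical names; the filter is a plain dict with expiry rather than a Bloom filter, and the lock has no TTL, which a real deployment would need. It makes one addition beyond the pseudocode above: after taking the lock, the pod re-checks the cache, because another pod may have finished its refresh between this pod's initial miss and its lock acquisition.

```python
import threading
import time


class SharedCache:
    """Hypothetical in-memory stand-in for the distributed cache and its lock."""

    def __init__(self):
        self._guard = threading.Lock()
        self._data = {}       # key -> (value, expires_at)
        self._locks = set()   # keys currently locked for refresh

    def get_valid(self, key):
        with self._guard:
            entry = self._data.get(key)
            return entry[0] if entry and entry[1] > time.monotonic() else None

    def get_stale(self, key):
        with self._guard:
            entry = self._data.get(key)
            return entry[0] if entry else None

    def set(self, key, value, ttl_seconds):
        with self._guard:
            self._data[key] = (value, time.monotonic() + ttl_seconds)

    def lock_acquire(self, key):
        with self._guard:
            if key in self._locks:
                return False
            self._locks.add(key)
            return True

    def lock_release(self, key):
        with self._guard:
            self._locks.discard(key)


class Pod:
    """One service instance; its filter is a plain dict of key -> expiry."""

    def __init__(self, cache, read_and_compute, filter_ttl=1.0, cache_ttl=30.0):
        self.cache, self.read_and_compute = cache, read_and_compute
        self.filter_ttl, self.cache_ttl = filter_ttl, cache_ttl
        self._seen = {}
        self._guard = threading.Lock()

    def _first_sighting(self, key):
        # Check and add under one lock so only one thread per pod proceeds.
        now = time.monotonic()
        with self._guard:
            if self._seen.get(key, float("-inf")) > now:
                return False
            self._seen[key] = now + self.filter_ttl
            return True

    def handle_request(self, key):
        value = self.cache.get_valid(key)        # Step 1: fast path
        if value is not None:
            return value
        if not self._first_sighting(key):        # Step 2: local filter
            return self.cache.get_stale(key)
        if not self.cache.lock_acquire(key):     # Step 3: global lock
            return self.cache.get_stale(key)
        try:
            value = self.cache.get_valid(key)    # re-check after taking the lock
            if value is None:
                value = self.read_and_compute(key)
                self.cache.set(key, value, self.cache_ttl)
            return value
        finally:
            self.cache.lock_release(key)


db_reads = []

def read_and_compute(key):
    db_reads.append(key)      # record each datastore read
    time.sleep(0.05)          # pretend the read is slow
    return f"fresh:{key}"

cache = SharedCache()
cache.set("item:123", "stale:item:123", ttl_seconds=-1)   # seed an expired entry
pods = [Pod(cache, read_and_compute) for _ in range(3)]
results = []

def client(pod):
    results.append(pod.handle_request("item:123"))

threads = [threading.Thread(target=client, args=(pod,)) for pod in pods for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_reads))  # 1: only one request reached the datastore
```

Thirty concurrent requests across three pods produce a single datastore read. Every other request returns either the seeded stale value or the freshly written one, depending on when it arrived.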

When to Use This Pattern

This pattern is ideal for the following situations:

  • Retail product detail pages
  • Inventory/price availability APIs
  • Search autosuggest
  • Real-time ranking or personalization caches
  • Ticketing and airline availability
  • Any hot-key scenario under high concurrency

But avoid using it in these cases:

  • Strong consistency is required
  • Data cannot safely be served stale
  • Writes must be strictly serialized, with no risk of a refresh being suppressed