Distributed Caching Strategies: Redis, Memcached, and Application-Level Patterns

Distributed Caching Strategies: Redis, Cache Invalidation, and Stampede Prevention

Caching is often the most impactful performance optimization you can make: a well-designed cache can reduce database load by as much as 90% and cut response times from 200ms to 5ms. However, distributed caching introduces complexity around consistency, invalidation, and failure modes that single-server caching doesn’t face. This guide covers the caching patterns that work in production, the ones that don’t, and how to handle the hardest of those problems.

Cache-Aside (Lazy Loading): The Default Pattern

Cache-aside is the most common caching pattern because it’s simple and flexible. The application checks the cache first. On a cache hit, return the cached value. On a miss, load from the database, store in cache, and return. The application controls all caching logic — the cache and database are independent systems.

import json

import redis  # redis_client below is assumed to be a redis.Redis instance

class CacheAside:
    def __init__(self, redis_client, db_client, default_ttl=3600, negative_ttl=60):
        self.cache = redis_client
        self.db = db_client
        self.default_ttl = default_ttl
        self.negative_ttl = negative_ttl  # keep short so new records show up quickly

    def get_user(self, user_id):
        cache_key = f"user:{user_id}"
        null_key = f"{cache_key}:null"

        # Step 1: Check cache
        cached = self.cache.get(cache_key)
        if cached is not None:
            return json.loads(cached)  # Cache hit

        # Step 1b: A negative marker means this ID was recently not found
        if self.cache.exists(null_key):
            return None

        # Step 2: Cache miss, so load from database
        user = self.db.query("SELECT * FROM users WHERE id = %s", user_id)
        if user is None:
            # Cache negative result to prevent repeated DB queries
            self.cache.setex(null_key, self.negative_ttl, "1")
            return None

        # Step 3: Store in cache with TTL
        self.cache.setex(cache_key, self.default_ttl, json.dumps(user))
        return user

    def update_user(self, user_id, data):
        # Update database first, then invalidate cache
        self.db.execute("UPDATE users SET ... WHERE id = %s", user_id)
        # Delete both the value and any negative marker for this ID
        self.cache.delete(f"user:{user_id}", f"user:{user_id}:null")
        # Don't set the new value; let the next read populate it
        # This avoids race conditions between concurrent updates

The key insight with cache-aside is: on write, delete the cache entry rather than updating it. If you update the cache, concurrent writes can leave stale data. Deleting forces the next read to reload from the database, which is always authoritative. Additionally, always set a TTL as a safety net — even if your invalidation logic has bugs, stale data eventually expires.

Notice the negative caching in the miss path. Without it, a flood of requests for a non-existent ID — a common signature of a scraper or a broken client — bypasses the cache entirely and hammers the database on every call. Caching the “not found” result for a short window absorbs that abuse, but keep the TTL deliberately short so that legitimately created records become visible quickly. The trade-off is the classic tension between protecting the database and serving fresh data, and a 30-to-60-second negative TTL usually balances both.

Distributed caching architecture diagram — Cache-aside pattern: application manages the cache, database remains the source of truth

Write-Through and Write-Behind Patterns

Write-through caching writes to both the cache and the database synchronously, as part of the same update. This keeps the cache fresh but adds write latency, since every write waits on both systems. It works well when you read frequently and write infrequently: user profiles, product catalogs, and configuration data.
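As a rough illustration of write-through, here is a minimal sketch. The `DictStore` class is a hypothetical in-memory stand-in for both the cache and the database so the example runs on its own; a real setup would pass a Redis client and a database client with equivalent `get`/`set` behavior. The ordering (database first, then cache) is one reasonable choice, not the only one.

```python
import json

class DictStore:
    """Hypothetical in-memory stand-in for both the cache and the database."""
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

class WriteThroughCache:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def write(self, key, value):
        # Persist to the database first: if this raises, the cache is untouched
        self.db.set(key, value)
        # Then refresh the cache in the same request so the next read is a hit
        self.cache.set(key, json.dumps(value))

    def read(self, key):
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        # Fallback for entries that were evicted or never written through
        value = self.db.get(key)
        if value is not None:
            self.cache.set(key, json.dumps(value))
        return value

store = WriteThroughCache(cache=DictStore(), db=DictStore())
store.write("user:1", {"name": "Ada"})
profile = store.read("user:1")
```

Note that two concurrent writers can still finish their cache writes in a different order than their database writes, so a TTL remains a sensible safety net here as well.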

Write-behind (write-back) caching writes to the cache immediately and asynchronously flushes to the database later. This provides the lowest write latency but risks data loss if the cache fails before flushing. Moreover, implementing write-behind correctly requires careful handling of ordering, batching, and failure recovery.

import asyncio
import json
import time

class WriteBehindCache:
    """Write-behind cache with batched async persistence"""

    def __init__(self, redis_client, db_client, flush_interval=5):
        self.cache = redis_client
        self.db = db_client
        self.flush_interval = flush_interval
        self.pending_writes = "pending_writes"  # Redis sorted set

    def write(self, key, value):
        pipe = self.cache.pipeline()
        # Write to cache immediately
        pipe.set(key, json.dumps(value))
        # Add to pending writes queue (score = timestamp for ordering)
        pipe.zadd(self.pending_writes, {key: time.time()})
        pipe.execute()

    async def flush_to_database(self):
        """Periodically flush pending writes to database"""
        # Note: the Redis and database calls below are synchronous, so they
        # block the event loop while they run; use async clients if that matters
        while True:
            # Get the 100 oldest pending writes
            pending = self.cache.zrangebyscore(
                self.pending_writes, "-inf", "+inf", start=0, num=100
            )

            if pending:
                batch = []
                for key in pending:
                    value = self.cache.get(key)
                    if value is not None:
                        batch.append((key, json.loads(value)))

                # Batch write to database
                self.db.bulk_upsert(batch)

                # Remove from pending queue
                # Caveat: a key rewritten between the GET above and this ZREM
                # loses its pending marker, so the newer value is never flushed
                self.cache.zrem(self.pending_writes, *pending)

            await asyncio.sleep(self.flush_interval)

Write-behind is seductive for write-heavy workloads, but be honest about its durability story. Anything sitting in the pending queue when a node dies is gone unless Redis itself is persisted and replicated, so write-behind is only safe for data you can afford to lose or reconstruct — view counts, last-seen timestamps, or analytics rollups. For money, orders, or anything a regulator cares about, the database must be the synchronous source of truth. A common middle ground is to keep write-through for the authoritative fields and write-behind only for the high-frequency, low-stakes counters that would otherwise dominate your write throughput.

Cache Invalidation: The Hard Problem

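As the saying attributed to Phil Karlton goes, there are only two hard things in Computer Science: cache invalidation and naming things. Cache invalidation is hard because the cache and the database are separate systems with no shared transaction and no single, consistent view of time, so a write and its invalidation can interleave with concurrent reads in any order. Here are the patterns that work in practice: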

TTL-based expiration: The simplest approach. Set a TTL on every cache entry and accept that data may be stale for up to TTL seconds. For many applications, serving data that’s 60 seconds old is perfectly acceptable.

Event-driven invalidation: When the database changes, publish an event that triggers cache deletion. Use database triggers, change data capture (CDC), or application-level events. This is more complex but provides near-real-time cache freshness.

Version-based invalidation: Include a version number in the cache key. When data changes, increment the version. Old cache entries naturally expire while new reads use the new version key. This avoids race conditions between invalidation and population.

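# Version-based cache invalidation (assumes decode_responses=True, so get() returns str)
class VersionedCache:
    def __init__(self, redis_client, default_ttl=3600):
        self.cache = redis_client
        self.default_ttl = default_ttl

    def current_key(self, entity_type, entity_id):
        # A missing counter means version 0; the first INCR moves it to 1
        version = self.cache.get(f"{entity_type}:version:{entity_id}") or "0"
        return f"{entity_type}:{entity_id}:v{version}"

    def get(self, entity_type, entity_id):
        return self.cache.get(self.current_key(entity_type, entity_id))

    def set(self, cache_key, value):
        # Pass the key captured BEFORE the database read, so a late writer
        # fills an orphaned version instead of the current one
        self.cache.setex(cache_key, self.default_ttl, value)

    def invalidate(self, entity_type, entity_id):
        # Increment version: the old cached value is orphaned (expires via TTL)
        self.cache.incr(f"{entity_type}:version:{entity_id}")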

The subtle failure these patterns guard against is the lost-delete race. With naive delete-on-write, a reader can fetch a stale row from the database, get descheduled, and then write that stale value into the cache after a concurrent writer already invalidated it, leaving the cache wrong until the TTL fires. Version-based keys sidestep this, provided the reader captures the version before its database read, because the late writer populates an old version key that nobody will ever read again. Event-driven invalidation has its own gap: events can arrive out of order or be lost during a broker outage, so always pair it with a TTL backstop rather than trusting the event stream to be perfectly reliable.
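To make the "event plus TTL backstop" pairing concrete, here is a minimal sketch. `TTLCache` is a hypothetical in-memory stand-in for Redis with an injectable clock, and the event shape (`entity`, `id`) is an assumption for illustration rather than the format of any particular CDC tool.

```python
import time

class TTLCache:
    """Hypothetical in-memory cache with per-key expiry, standing in for Redis."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (value, expires_at)

    def setex(self, key, ttl, value):
        self.data[key] = (value, self.clock() + ttl)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None or entry[1] <= self.clock():
            self.data.pop(key, None)
            return None
        return entry[0]

    def delete(self, key):
        self.data.pop(key, None)

def handle_change_event(cache, event):
    """Delete the cache entry named by a change event.

    Deleting is idempotent, so duplicate or reordered events are harmless.
    """
    cache.delete(f"{event['entity']}:{event['id']}")

now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(clock=lambda: now[0])

# Case 1: the event arrives (twice), and the entry is gone immediately
cache.setex("user:42", 60, "old")
handle_change_event(cache, {"entity": "user", "id": 42})
handle_change_event(cache, {"entity": "user", "id": 42})  # duplicate delivery
after_event = cache.get("user:42")

# Case 2: the event is lost, and the TTL bounds how long the stale value lives
cache.setex("user:43", 60, "old")
now[0] = 61.0
after_ttl = cache.get("user:43")
```

The design choice worth noting is that the handler only deletes and never writes values, which is what makes redelivery and reordering safe.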

Cache invalidation strategies visualization — Version-based invalidation avoids race conditions by never deleting cache entries

Cache Stampede Prevention

A cache stampede occurs when a popular cache entry expires and hundreds of concurrent requests all miss the cache simultaneously, flooding the database with identical queries. This can cascade into a full database outage. Three techniques prevent stampedes: request locking, early refresh, and TTL jitter. The first two look like this:

import json
import threading
import time

class StampedeProtectedCache:
    def __init__(self, cache, db, lock_timeout=5):
        self.cache = cache
        self.db = db
        self.lock_timeout = lock_timeout

    def _refresh(self, key, loader_fn, ttl):
        """Rebuild the value from the origin and store it with a fresh TTL"""
        value = loader_fn()
        self.cache.setex(key, ttl, json.dumps(value))
        return value

    def get_with_lock(self, key, loader_fn, ttl=3600, max_wait=2.0):
        """Pattern 1: Locking (only one request rebuilds the cache)"""
        lock_key = f"lock:{key}"
        deadline = time.monotonic() + max_wait

        while True:
            value = self.cache.get(key)
            if value is not None:
                return json.loads(value)

            # Try to acquire lock (SET NX with expiry)
            if self.cache.set(lock_key, "1", nx=True, ex=self.lock_timeout):
                # This request rebuilds the cache
                try:
                    return self._refresh(key, loader_fn, ttl)
                finally:
                    # Caveat: if the rebuild outlives lock_timeout, this can
                    # delete a lock that another request has since acquired
                    self.cache.delete(lock_key)

            if time.monotonic() >= deadline:
                # Waited long enough: load directly instead of blocking forever
                return loader_fn()

            # Another request is rebuilding, so wait and retry
            time.sleep(0.1)

    def get_with_early_refresh(self, key, loader_fn, ttl=3600, refresh_at=0.8):
        """Pattern 2: Early refresh once refresh_at of the TTL has elapsed"""
        value = self.cache.get(key)
        if value is None:
            # Cache miss: load synchronously
            return self._refresh(key, loader_fn, ttl)

        remaining_ttl = self.cache.ttl(key)  # negative if no expiry or key is gone
        if 0 <= remaining_ttl <= ttl * (1 - refresh_at):
            # Close to expiry: let a single request refresh in the background
            if self.cache.set(f"refresh:{key}", "1", nx=True, ex=self.lock_timeout):
                threading.Thread(
                    target=self._refresh, args=(key, loader_fn, ttl), daemon=True
                ).start()

        return json.loads(value)  # Return current data while refreshing

The locking pattern is the most reliable but adds latency for waiting requests. Early refresh works well for high-traffic keys: a request that arrives shortly before expiry rebuilds the entry in the background, so the key never actually expires under load. Furthermore, serving slightly stale data during the refresh is usually acceptable and prevents the stampede entirely.

The XFetch Algorithm and TTL Jitter

The polished form of early refresh is the XFetch algorithm, which makes the recomputation probability rise smoothly as expiry approaches rather than triggering at a fixed threshold. Each read draws a random number and recomputes early if a small gap term, scaled by how long the value took to build, exceeds the remaining TTL. Expensive-to-rebuild keys therefore refresh earlier and more eagerly, while cheap keys ride closer to expiry — exactly the bias you want.

import json
import math
import random
import time

def xfetch(cache, key, loader_fn, ttl=3600, beta=1.0):
    # Assumes a client without decode_responses, so hash fields are bytes
    raw = cache.hgetall(key)            # {value, delta, expiry}
    if raw:
        delta  = float(raw[b"delta"])   # seconds the last rebuild took
        expiry = float(raw[b"expiry"])  # absolute unix expiry
        now    = time.time()
        # Probabilistic early expiration: fire before the hard TTL.
        # 1 - random() lies in (0, 1], so log() never receives 0
        gap = -delta * beta * math.log(1.0 - random.random())
        if now + gap < expiry:
            return json.loads(raw[b"value"])  # still "fresh enough"

    # True miss or early expiration: rebuild and record how long it took
    start = time.time()
    value = loader_fn()
    delta = time.time() - start
    pipe = cache.pipeline()             # write the hash and its TTL together
    pipe.hset(key, mapping={
        "value":  json.dumps(value),
        "delta":  delta,
        "expiry": time.time() + ttl,
    })
    pipe.expire(key, ttl)
    pipe.execute()
    return value

A complementary and far simpler trick is TTL jitter: instead of giving every entry the same 3600-second TTL, randomize it within a band such as 3300 to 3900 seconds. Synchronized expiry is a stampede waiting to happen — if you warm a thousand keys in one batch job, they all expire in the same instant and stampede together. Spreading expirations across a window converts one large thundering herd into a steady trickle of misses the database can absorb. Jitter costs you a single line of code and prevents an entire class of correlated-expiry incidents.
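The jitter described above can be sketched in a few lines. The helper name `jittered_ttl` and its `spread` parameter are illustrative choices, not part of any library; the defaults reproduce the 3300-to-3900-second band from the text.

```python
import random

def jittered_ttl(base_ttl=3600, spread=300):
    """Return base_ttl plus or minus up to `spread` seconds, chosen uniformly."""
    return base_ttl + random.randint(-spread, spread)

# Simulate warming a thousand keys in one batch job
ttls = [jittered_ttl() for _ in range(1000)]
```

In a cache-aside write this would replace the fixed TTL, for example `cache.setex(key, jittered_ttl(), payload)`, assuming a redis-py style client.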

Distributed Caching Strategies for Multi-Region Deployments

Once your application spans regions, a single shared cache becomes a latency and availability liability — a read from Frankfurt should not cross the Atlantic to reach a cache in Virginia. The common answer is a cache per region backed by a regional database replica, which keeps reads local but reintroduces the consistency question across regions. Invalidation events must now fan out to every region, and because that fan-out is asynchronous, you should design for brief windows where regions disagree.

A pragmatic rule is to scope strong consistency to within a region and accept eventual consistency between regions, choosing TTLs short enough that cross-region drift self-heals within an acceptable window. For genuinely global, write-heavy data where this drift is unacceptable, the better fix is often at the database layer rather than the cache — a distributed SQL engine that handles replication for you, as discussed in the broader literature on globally distributed datastores. Reserve the cache for what it does best: absorbing read load close to the user.
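As a rough sketch of invalidation fan-out under these assumptions, the function below deletes a key in every regional cache and reports which regions could not be reached; those regions stay stale until their TTL expires. `RegionCache`, the region names, and the choice of `ConnectionError` as the failure signal are all hypothetical stand-ins for real regional clients.

```python
class RegionCache:
    """Hypothetical stand-in for one region's cache client."""
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.data = {}

    def delete(self, key):
        if not self.reachable:
            raise ConnectionError("region unreachable")
        self.data.pop(key, None)

def fan_out_invalidate(regional_caches, key):
    """Delete `key` in every region; return the regions that were unreachable."""
    failed = []
    for region, cache in regional_caches.items():
        try:
            cache.delete(key)
        except ConnectionError:
            # This region keeps serving the old value until its TTL expires
            failed.append(region)
    return failed

caches = {
    "eu-central": RegionCache(),
    "us-east": RegionCache(reachable=False),
}
for cache in caches.values():
    cache.data["user:42"] = "cached"

unreachable = fan_out_invalidate(caches, "user:42")
```

Returning the failed regions, rather than raising, lets the caller decide whether to retry, log, or simply rely on the TTL to heal the drift.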

Redis Cluster: Scaling Beyond One Node

A single Redis node can handle on the order of 100,000 operations per second on typical hardware, but when you need more throughput or more memory, Redis Cluster distributes data across multiple nodes using hash slots. Each key is assigned to one of 16,384 hash slots, and slots are distributed across nodes. Consequently, a 3-node cluster provides roughly 3x the throughput and 3x the memory capacity, provided keys and traffic are spread evenly across slots.

Key considerations for Redis Cluster: multi-key commands (MGET, SUNION) and MULTI/EXEC transactions only work when all keys hash to the same slot; landing on the same node is not enough. Use hash tags (e.g., {user:123}:profile, {user:123}:sessions) to co-locate related keys on the same slot. Commands whose keys span different hash slots are rejected with a CROSSSLOT error.
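To see why hash tags co-locate keys, it helps to compute the slot by hand. The sketch below follows the rule in the Redis Cluster specification: take CRC16 (the XMODEM variant) of the key, or of the text between the first `{` and the next `}` if that text is non-empty, modulo 16,384. The function names are illustrative; in practice `CLUSTER KEYSLOT` or your client library does this for you.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, as Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed, honoring a {...} hash tag."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag such as "{}" is ignored and the whole key is hashed
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 hash slots."""
    return crc16_xmodem(hash_tag(key.encode())) % 16384

same_slot = key_slot("{user:123}:profile") == key_slot("{user:123}:sessions")
```

Because both keys hash only the `user:123` tag, `same_slot` is true and the two keys can be used together in a single MGET or transaction.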

Hash tags are a double-edged sword worth flagging. They give you atomic multi-key operations, but if you tag too aggressively — say, by tenant — every key for a large tenant lands on one node and creates a hot slot that no amount of horizontal scaling relieves. The cluster looks balanced by node count yet one shard saturates while the rest idle. Aim hash tags at the smallest grouping that your atomic operations actually require, and monitor per-slot key counts so a single hot tenant doesn't quietly become your bottleneck. For deeper Redis-versus-fork trade-offs, the companion piece on Redis 8 vs Valkey covers the licensing and clustering differences in detail.

When NOT to Add a Distributed Cache — Trade-offs

A cache is not a default; it is a deliberate trade of consistency and operational surface for latency. If your dataset fits comfortably in memory on a single application node, an in-process cache like Caffeine or a simple LRU map will beat a network round-trip to Redis every time, with none of the cluster operations to manage. Reach for a distributed cache only when you need a shared view across many processes or when the working set genuinely exceeds a single host's memory.
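For the in-process case, a minimal LRU sketch in Python might look like the following. The class name `LocalLRU` and its `loader_fn` callback are illustrative; this version is not thread-safe and has no TTL, so treat it as a starting point rather than a substitute for a library such as Caffeine.

```python
from collections import OrderedDict

class LocalLRU:
    """Tiny in-process LRU cache (single-threaded sketch, no expiry)."""
    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self.data = OrderedDict()

    def get(self, key, loader_fn):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = loader_fn(key)
        self.data[key] = value
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value

calls = []

def load(key):
    calls.append(key)  # record every trip to the "origin"
    return key.upper()

lru = LocalLRU(max_entries=2)
lru.get("a", load)
lru.get("b", load)
lru.get("a", load)   # hit: no new load, and "a" becomes most recent
lru.get("c", load)   # evicts "b", the least recently used key
lru.get("b", load)   # miss again, so "b" is reloaded
```

There is no network hop and no serialization on the hit path, which is where the latency advantage over a remote cache comes from.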

Be equally wary of caching data that is cheap to compute or already fast to fetch — adding a cache there buys you invalidation bugs and a new failure mode in exchange for negligible savings. And remember the cardinal rule: the cache must be optional. If a Redis outage takes your application down, you have not built a cache, you have built a fragile second database. Test the cold-cache path under load before you ship, because the day your cache flushes is exactly the day you will discover whether the origin can survive without it.
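One way to keep the cache optional is a fail-open read path: any cache error is logged and the request falls through to the origin. The sketch below uses a hypothetical `BrokenCache` to simulate an outage; with redis-py you would likely catch `redis.exceptions.RedisError` instead of the broad `Exception` used here.

```python
import json
import logging

log = logging.getLogger("cache")

class BrokenCache:
    """Hypothetical stand-in for a cache client whose server is down."""
    def get(self, key):
        raise ConnectionError("cache unavailable")

    def setex(self, key, ttl, value):
        raise ConnectionError("cache unavailable")

def get_fail_open(cache, key, loader_fn, ttl=300):
    """Read through the cache, but never let a cache failure fail the request."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)
    except Exception as exc:
        log.warning("cache read failed for %s: %s", key, exc)

    value = loader_fn()  # the origin is the only hard dependency

    try:
        cache.setex(key, ttl, json.dumps(value))
    except Exception as exc:
        log.warning("cache write failed for %s: %s", key, exc)
    return value

result = get_fail_open(BrokenCache(), "user:1", lambda: {"id": 1})
```

Failing open protects availability but sends all traffic to the origin during an outage, which is why the cold-cache load test matters.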

Redis cluster data distribution — Redis Cluster distributes 16,384 hash slots across nodes for horizontal scaling

In conclusion, well-designed distributed caching strategies are essential for performance but demand careful attention to invalidation, consistency, and failure modes. Start with cache-aside and TTL-based expiration, which covers the large majority of use cases. Add stampede protection for high-traffic keys and event-driven invalidation when you need stronger consistency. The cache should always be treated as ephemeral: your application must work (slowly) if the cache disappears entirely.