caching-strategies - SKILL.md Agent Skill

name: caching-strategies description: Implements caching strategies (cache-aside, write-through, write-behind, multi-tier architecture, stampede prevention) for high-performance data access layers with consistency guarantees. license: MIT compatibility: opencode metadata: version: "1.0.0" domain: coding triggers: cache stampede, LRU eviction, write-through, TTL-based, cache invalidation, multi-tier cache, thundering herd, how do i speed up my app archetypes:

tactical
generation anti_triggers:
brainstorming
vague ideation
code golf
over-engineering response_profile: verbosity: low directive_strength: high abstraction_level: operational role: implementation scope: implementation output-format: code content-types:
code
guidance
do-dont
examples related-skills: system-design-fundamentals,data-intensive-systems,performance-optimization

Caching Strategy Architect

Designs and implements high-performance caching layers that balance read speed, write consistency, and memory efficiency. The model evaluates access patterns, data volatility, and failure domains to select the right combination of cache patterns — cache-aside for read-heavy workloads, write-through or write-behind for consistency-sensitive paths, multi-tier architectures for latency-critical systems, and stampede prevention for hot-key resilience.

TL;DR Checklist

Profile read/write ratio before choosing a cache pattern
Select TTL based on data staleness tolerance, not arbitrary defaults
Implement LRU or LFU eviction with bounded memory footprint
Add mutex-based locking or probabilistic early expiration to prevent cache stampede
Separate L1 (in-process) from L2 (remote) caches by latency tier
Define explicit invalidation strategy per data type (TTL vs. event-driven vs. write-through)

Core Workflow

Profile Access Patterns: Measure read/write ratio, data staleness tolerance, and hot-key distribution across your dataset. Identify which keys exceed a threshold QPS (e.g., > 100 req/s). Checkpoint: Confirm the dominant access pattern is read-heavy (read/write > 5:1) before defaulting to cache-aside. Write-heavy workloads may need write-through or no cache at all.
Select Primary Cache Pattern — Map profiled patterns to a strategy:
- Read-heavy, stale-tolerant → Cache-Aside (Lazy Loading)
- Strong consistency required → Write-Through
- High write throughput acceptable with async durability → Write-Behind
- Mixed workload with latency tiers → Multi-Tier (L1 + L2 + L3)
Design Eviction and TTL Strategy — Choose eviction policy based on access distribution:
- Zipfian / hot-key dominated → LRU or LFU
- Uniform access, bounded lifetime → TTL-based with sliding window Checkpoint: Set cache capacity as a function of available memory. Reserve 20% headroom for growth — never use unbounded caches.
Implement Stampede Prevention — For any key accessed above 10 QPS, add either mutex-based serialization or probabilistic early expiration to prevent the thundering herd on cache misses. Checkpoint: Verify that only one background thread regenerates a given hot key while all others wait or serve stale data.
Wire Invalidation and Consistency — Decide per-data-type: TTL-only for ephemeral data, explicit invalidation (pub/sub or event-based) for strong-consistency domains like user profiles. Checkpoint: Test that a write followed by a read within your consistency window returns the updated value (or explainable staleness).
Validate Under Load — Run integration tests simulating worst-case stampede (1000 concurrent requests for one cold key) and measure p99 latency, cache hit rate, and memory usage. Checkpoint: Hit rate should be stable above 85% under sustained load; no single key should exhaust thread pool or connection pool resources.

Implementation Patterns

Pattern 1: Cache-Aside (Lazy Loading) with Stampede Prevention

The application checks the cache first; on miss, it loads from the data source and populates the cache. The BAD example demonstrates a classic thundering herd vulnerability where concurrent misses all hit the database simultaneously.

import threading
import time
from typing import Optional, Any


class CacheAsideWithProtection:
    """Thread-safe cache-aside pattern with mutex-based stampede prevention.

    On cache miss, only one thread loads data from the source while others
    wait and receive either the freshly loaded value or a stale fallback.
    """

    def __init__(self, ttl_seconds: float = 300.0, max_size: int = 10_000):
        self._cache: dict[str, tuple[Any, float]] = {}
        self._lock = threading.RLock()
        self._locks: dict[str, threading.Lock] = {}
        self._ttl = ttl_seconds
        self._max_size = max_size

    def get(self, key: str, loader) -> Any:
        """Return cached value or load via loader(key) with stampede protection.

        Args:
            key: The cache key to look up.
            loader: Callable that accepts a key and returns the data from source.

        Returns:
            The loaded data value.

        Raises:
            ValueError: If the loader is not callable.
        """
        if not callable(loader):
            raise ValueError("loader must be a callable accepting a single key argument")

        # Fast path: check local cache under shared lock
        with self._lock:
            entry = self._cache.get(key)
            if entry is not None:
                value, expiry = entry
                if time.monotonic() < expiry:
                    return value
                else:
                    del self._cache[key]

        # Slow path: ensure only one thread loads per key
        per_key_lock = self._get_or_create_lock(key)
        with per_key_lock:
            # Double-check after acquiring per-key lock — another thread may have loaded
            with self._lock:
                entry = self._cache.get(key)
                if entry is not None and time.monotonic() < entry[1]:
                    return entry[0]

            value = loader(key)
            self._evict_if_needed()
            with self._lock:
                self._cache[key] = (value, time.monotonic() + self._ttl)

        return value

    def _get_or_create_lock(self, key: str) -> threading.Lock:
        """Get or create a per-key lock for stampede prevention."""
        with self._lock:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]

    def _evict_if_needed(self) -> None:
        """Evict oldest entries when cache exceeds capacity."""
        while len(self._cache) > self._max_size:
            evicted_key, _ = next(iter(self._cache.items()))
            del self._cache[evicted_key]

    def invalidate(self, key: str) -> None:
        """Remove a single key from the cache immediately."""
        with self._lock:
            self._cache.pop(key, None)

    def clear(self) -> None:
        """Wipe entire cache."""
        with self._lock:
            self._cache.clear()

BAD vs. GOOD — Stampede Vulnerability:

# ❌ BAD: No stampede prevention — 1000 concurrent misses = 1000 database hits
class BadCacheAside:
    def __init__(self, ttl_seconds: float = 300.0):
        self._cache: dict[str, tuple[Any, float]] = {}
        self._ttl = ttl_seconds

    def get(self, key: str, loader) -> Any:
        """Thundering herd on every cache miss for hot keys."""  
        # No mutex — multiple threads all call loader(key) simultaneously
        if key in self._cache:
            value, expiry = self._cache[key]
            if time.monotonic() < expiry:
                return value

        # Every concurrent thread misses and calls the expensive loader
        value = loader(key)  # DB hit — N times for one key
        self._cache[key] = (value, time.monotonic() + self._ttl)
        return value


# ✅ GOOD: Per-key mutex serializes loads; other threads wait or use stale data
# See CacheAsideWithProtection.get() above

Pattern 2: Write-Through with Synchronous Dual Writes

Write-Through ensures the cache and backing store are always in sync by writing to both simultaneously. The BAD example shows a common mistake where write-through becomes a performance bottleneck because the application blocks on both writes sequentially without atomicity guarantees.

import logging
from typing import Any

logger = logging.getLogger(__name__)


class WriteThroughCache:
    """Write-through cache that keeps backing store in sync on every mutation.

    Every write propagates synchronously to both the in-memory cache and
    the persistent backing store, guaranteeing strong consistency at the
    cost of higher write latency.
    """

    def __init__(self,
        backing_store: Any,
        ttl_seconds: float = 600.0,
        max_size: int = 50_000,
    ):
        self._backing_store = backing_store
        self._cache: dict[str, tuple[Any, float]] = {}
        self._lock = threading.RLock()
        self._ttl = ttl_seconds
        self._max_size = max_size

    def put(self, key: str, value: Any) -> None:
        """Write to both cache and backing store atomically (within single-thread scope).

        Args:
            key: The cache/store key.
            value: The data to persist.
        """
        # Write to backing store first — if this fails, never update cache
        try:
            self._backing_store.put(key, value)
        except Exception as exc:
            logger.error("Write-through failed for key=%s: %s", key, exc)
            raise

        # Only update in-memory cache after backing store confirms durability
        with self._lock:
            self._evict_if_needed()
            self._cache[key] = (value, time.monotonic() + self._ttl)

    def get(self, key: str) -> Optional[Any]:
        """Return cached value if present and not expired."""
        with self._lock:
            entry = self._cache.get(key)
            if entry is None:
                return None
            value, expiry = entry
            if time.monotonic() >= expiry:
                del self._cache[key]
                return None
            return value

    def invalidate(self, key: str) -> None:
        """Remove from both cache and backing store."""
        with self._lock:
            self._cache.pop(key, None)
        self._backing_store.delete(key)

    def _evict_if_needed(self) -> None:
        while len(self._cache) > self._max_size:
            evicted_key, _ = next(iter(self._cache.items()))
            del self._cache[evicted_key]


class FakeBackingStore:
    """Mock backing store for testing write-through behavior."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

BAD vs. GOOD — Write-Through Consistency:

# ❌ BAD: Writes to cache first, backing store second — inconsistent on failure
class BadWriteThrough:
    def put(self, key: str, value: Any) -> None:
        self._cache[key] = value  # Cache updated...
        try:
            self.db.put(key, value)  # ...but DB write may fail
        except Exception:
            pass  # Swallowed — cache says data exists, DB does not


# ✅ GOOD: Backing store writes first; in-memory only updates after confirmation
# See WriteThroughCache.put() above — fails fast if backing store rejects the write

Pattern 3: Write-Behind (Async Cache-Aside with Batched Flush)

Write-Behind acknowledges the write to the caller immediately, buffers mutations, and flushes them asynchronously in batches. The BAD example demonstrates a memory leak scenario where buffered writes accumulate without any flushing strategy.

import asyncio
from collections import deque
from typing import Any


class WriteBehindCache:
    """Write-behind cache with asynchronous batched flush to backing store.

    Writes are acknowledged immediately and buffered. A background task
    flushes accumulated mutations to the backing store in configurable
    batches or at a maximum time interval — whichever comes first.
    """

    def __init__(self,
        backing_store: Any,
        flush_interval_seconds: float = 1.0,
        batch_size: int = 100,
    ):
        self._backing_store = backing_store
        self._buffer: deque[tuple[str, Any]] = deque(maxlen=50_000)
        self._flush_interval = flush_interval_seconds
        self._batch_size = batch_size

    async def put(self, key: str, value: Any) -> None:
        """Buffer a write; caller returns immediately without waiting for persistence."""
        self._buffer.append((key, value))
        if len(self._buffer) >= self._batch_size:
            await self._flush()

    async def flush(self) -> None:
        """Flush all buffered writes to the backing store atomically in one batch."""
        if not self._buffer:
            return

        batch = list(self._buffer)
        self._buffer.clear()

        try:
            # Batch write — single round trip to backing store
            for key, value in batch:
                self._backing_store.put(key, value)
        except Exception as exc:
            logger.error("Write-behind flush failed, re-queueing %d items", len(batch))
            self._buffer.extendleft(reversed(batch))
            raise

    def get_buffered_count(self) -> int:
        return len(self._buffer)

    async def drain(self) -> None:
        """Drain all remaining buffered writes (call during shutdown)."""
        while self._buffer:
            await self.flush()

BAD vs. GOOD — Write-Behind Safety:

# ❌ BAD: No flush strategy — buffer grows unbounded until OOM
class BadWriteBehind:
    def __init__(self):
        self._buffer = []  # Unbounded list, never flushed

    def put(self, key: str, value: Any) -> None:
        self._buffer.append((key, value))
        # No flush logic — data is lost on process restart


# ✅ GOOD: Bounded buffer with batch size threshold and shutdown drain
# See WriteBehindCache above — deque(maxlen=50_000) caps memory; flush() drains explicitly

Pattern 4: LRU Eviction Policy (Bounded In-Process Cache)

Least Recently Used eviction discards the oldest accessed entries when capacity is reached. Below, we show both a simple dictionary-based approach and a proper doubly-linked-list implementation that guarantees O(1) operations. BAD vs. GOOD — LRU Implementation:

# ❌ BAD: O(n) removal on eviction — scans entire dict to find least recently used
class BadLRUCache:
    def __init__(self, capacity: int = 100):
        self._cache: dict[str, Any] = {}
        self._capacity = capacity

    def get(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        return None

    def put(self, key: str, value: Any) -> None:
        self._cache[key] = value
        # O(n): finds and removes arbitrary old item — no real LRU logic
        if len(self._cache) > self._capacity:
            evicted_key = next(iter(self._cache))
            del self._cache[evicted_key]


# ✅ GOOD: Doubly-linked list + hash map for O(1) get, put, and eviction
class LRUCache:
    """Least Recently Used cache with O(1) operations via doubly-linked list."""

    class _Node:
        __slots__ = ("key", "value", "prev", "next")

        def __init__(self, key: str, value: Any) -> None:
            self.key = key
            self.value = value
            self.prev: LRUCache._Node | None = None
            self.next: LRUCache._Node | None = None

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("LRU capacity must be positive")

        self._capacity = capacity
        self._cache: dict[str, LRUCache._Node] = {}

        # Sentinel head and tail to eliminate boundary checks
        self._head = self._Node("", "")  # Most recently used side
        self._tail = self._Node("", "")  # Least recently used side
        self._head.next = self._tail
        self._tail.prev = self._head
        self._size = 0

    def get(self, key: str) -> Optional[Any]:
        """Retrieve value and move node to MRU position in O(1)."""
        if key not in self._cache:
            return None
        node = self._cache[key]
        self._move_to_head(node)
        return node.value

    def put(self, key: str, value: Any) -> None:
        """Insert or update key, evicting LRU node if at capacity."""
        if key in self._cache:
            node = self._cache[key]
            node.value = value
            self._move_to_head(node)
        else:
            node = self._Node(key, value)
            self._cache[key] = node
            self._add_to_head(node)
            self._size += 1

            if self._size > self._capacity:
                evicted = self._remove_tail()
                del self._cache[evicted.key]
                self._size -= 1

    def _add_to_head(self, node: _Node) -> None:
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

    def _move_to_head(self, node: _Node) -> None:
        self._remove_node(node)
        self._add_to_head(node)

    def _remove_node(self, node: _Node) -> None:
        prev_node = node.prev
        next_node = node.next
        if prev_node:
            prev_node.next = next_node
        if next_node:
            next_node.prev = prev_node

    def _remove_tail(self) -> _Node:
        node = self._tail.prev
        self._remove_node(node)
        return node

Pattern 5: TTL-Based Eviction with Sliding Expiration

TTL (Time-to-Live) expires entries after a configurable duration. The BAD example uses lazy expiration only on read, which means stale data can persist indefinitely if the key is never accessed again. The GOOD example adds proactive background cleanup alongside lazy checks.

class TTLCache:
    """Cache with TTL-based expiration and optional proactive cleanup.

    Each entry stores an absolute expiry timestamp. On read, expired entries
    are lazily evicted. A separate periodic sweeper can clean stale entries
    proactively to free memory regardless of access patterns.
    """

    def __init__(self, default_ttl_seconds: float = 60.0) -> None:
        self._cache: dict[str, tuple[Any, float]] = {}
        self._lock = threading.RLock()
        self._default_ttl = default_ttl_seconds
        self._sweep_interval = default_ttl_seconds * 2

    def set(
        self,
        key: str,
        value: Any,
        ttl_seconds: float | None = None,
    ) -> None:
        """Store a value with its TTL expiry timestamp."""
        ttl = ttl_seconds if ttl_seconds is not None else self._default_ttl
        expiry = time.monotonic() + ttl
        with self._lock:
            self._cache[key] = (value, expiry)

    def get(self, key: str) -> Optional[Any]:
        """Return value if present and not expired; lazily evict on miss."""
        with self._lock:
            entry = self._cache.get(key)
            if entry is None:
                return None
            value, expiry = entry
            if time.monotonic() >= expiry:
                del self._cache[key]
                return None
            return value

    def sweep(self) -> int:
        """Remove all expired entries. Returns count of evicted items."""
        now = time.monotonic()
        with self._lock:
            expired_keys = [
                k for k, (_, expiry) in self._cache.items() if now >= expiry
            ]
            for key in expired_keys:
                del self._cache[key]
            return len(expired_keys)

    def remaining_ttl(self, key: str) -> Optional[float]:
        """Return seconds until this key expires, or None if not found/expired."""
        with self._lock:
            entry = self._cache.get(key)
            if entry is None:
                return None
            _, expiry = entry
            remaining = expiry - time.monotonic()
            if remaining <= 0:
                del self._cache[key]
                return None
            return remaining

BAD vs. GOOD — TTL Consistency:

# ❌ BAD: Lazy eviction only; allows stale keys to persist indefinitely
class BadTTLCache:
    def __init__(self, default_ttl: float = 60):
        self._cache = {}  # Store without expiry

    def set(self, key: str, value: Any):
        self._cache[key] = value  # No expiry; entries remain forever

    def get(self, key: str) -> Optional[Any]:
        return self._cache.get(key)  # Eviction happens on access only


# ✅ GOOD: Combines lazy eviction with proactive sweeps — safety net against stale entries
# See TTLCache above for dual expiry strategies.

Constraints

MUST DO

Bound every cache instance with a maximum capacity — never use unbounded in-memory stores
Implement stampede prevention (mutex-based locking or probabilistic early expiration) for any key exceeding 50 QPS
Use TTL based on data staleness tolerance, not arbitrary defaults — profile your domain's consistency requirements
Separate L1 (process-local) and L2 (remote) caches by latency tier with clear read/write semantics
Implement proactive sweep/cleanup in addition to lazy eviction to reclaim memory from keys no longer accessed
Log cache hit/miss ratios and p99 read latency as operational metrics for capacity planning
Fail fast on backing store errors during write-through — do not silently corrupt cache state
Use per-key locking (not a global lock) for stampede prevention to maintain concurrency at scale

MUST NOT DO

Never serve stale data beyond the configured TTL without explicit caller consent — freshness is part of the contract
Do not use cache-aside as the sole strategy for write-heavy workloads (write ratio > 1:3 read:write)
Never skip backing store durability on write-through — cache must never be ahead of persistent storage
Do not rely solely on lazy TTL expiration — stale keys accumulate in memory without proactive sweep
Never place a global mutex around the entire cache get() path — it serializes all reads and defeats the purpose
Do not hardcode TTL values; parameterize per data type (e.g., session data = 30 min, product catalog = 6 hours)

Output Template

When designing or reviewing a caching strategy, produce:

Access Pattern Summary — Read/write ratio, estimated QPS per key tier, and hot-key count
Selected Strategy — Primary cache pattern (cache-aside / write-through / write-behind) with justification based on the access profile
Eviction Policy — LRU / LFU / TTL with capacity sizing rationale and memory estimate
Stampede Prevention Plan — Locking strategy or early expiration approach for keys above 50 QPS
Tier Architecture — L1/L2/L3 breakdown with latency targets per tier
Invalidation Strategy — TTL values, event-driven invalidation triggers, and consistency model (eventual vs. strong)
Failure Mode Analysis — What happens when L2 is unreachable; fallback behavior; circuit breaker thresholds

Live References

Authoritative documentation links for this domain. The model follows markdown links at load time to resolve external references and inline content.

Redis Documentation — Official Redis documentation covering data structures, eviction policies (LRU, LFU, TTL), clustering, and persistence
Memcached Protocol Specification — Memcached's binary protocol specification for caching system interoperability
Caching Best Practices (Google PageSpeed) — Google's web performance documentation on browser and CDN caching strategies
Thundering Herd Problem (Wikipedia) — Wikipedia article on the thundering herd problem in caching systems and prevention strategies
Distributed Caching Architectures (AWS) — AWS documentation on distributed caching patterns using ElastiCache for Redis/Memcached