backend-caching - SKILL.md Agent Skill

name: backend-caching
description: >
  Use this skill when the user says 'cache', 'Redis', 'Memcached', 'CDN', 'cache-aside', 'read-through', 'write-through', 'write-behind', 'cache invalidation', 'TTL', 'cache stampede', 'thundering herd', 'cache warming', 'LRU', 'LFU', 'cache hit ratio', 'cache strategy', or when designing a caching layer.
  This skill enforces consistent caching strategies: layer selection, read/write patterns, invalidation, stampede prevention, and monitoring. Applies to any backend stack.
  Do NOT use for: message queue design, database schema design, or frontend state caching.
version: "2.0.0"
author: "j4flmao"
license: "MIT"
compatibility:
  claude-code: true
  cursor: true
  codex: true
  windsurf: true
tags: [backend, caching, phase-2, universal]

Backend Caching

Purpose

Design consistent, production-grade caching layers. Every cache must follow the same conventions for strategy selection, data flow, invalidation, stampede prevention, TTL management, and monitoring.

Agent Protocol

Trigger

Exact user phrases: "cache", "Redis", "Memcached", "CDN", "cache-aside", "read-through", "write-through", "write-behind", "cache invalidation", "TTL", "cache stampede", "thundering herd", "cache warming", "LRU", "LFU", "cache hit ratio", "cache strategy", "design a caching layer", "add caching".

Input Context

The data being cached (DB query result, computed value, API response, static asset).
The read/write ratio.
The consistency requirement (eventual vs strong).
The hosting topology (single node, clustered, multi-region).

Output Artifact

Caching strategy specs as text. No file unless requested.

Response Format

Layer: {application | distributed | CDN}
Store: {Redis | Memcached | CDN | in-memory}
Strategy: {cache-aside | read-through | write-through | write-behind}
Key format: {namespace}:{entity}:{id}
TTL: {duration}
Invalidation: {manual | TTL-based | event-driven}
Stampede protection: {yes/no — method}

No preamble. No postamble. No explanations. No filler/hedging/transitions.

Completion Criteria

Cache strategy selected with justification
Key naming convention defined with namespaces
TTL set for every cache entry (no infinite TTL unless immutable data)
Stale data tolerance documented
Invalidation strategy defined (TTL and/or event-driven)
Cache stampede prevention in place for high-traffic keys
Monitoring plan (hit ratio, latency, memory) defined

Architecture Decision Trees

Cache Layer Selection

What is the primary goal?
├── Reduce latency for hot data (<1ms reads)
│   ├── Single-node app? → In-memory cache (LRU map, async cache)
│   └── Multi-node app? → Distributed cache (Redis cluster)
├── Reduce database load
│   ├── Read-heavy workload (>80% reads)?
│   │   ├── Yes → Cache-aside with distributed cache
│   │   └── No → Write-behind or read-through
│   └── Expensive queries (>100ms)?
│       ├── Yes → Cache result with medium TTL
│       └── No → Simple cache-aside is sufficient
├── Serve static/semi-static content globally
│   ├── Global audience? → CDN (CloudFront, Cloudflare, Fastly)
│   └── Regional audience? → CDN or reverse proxy cache
└── Handle API response caching
    ├── Public data? → CDN + gateway caching
    └── User-specific data? → Private cache (Cache-Control: private)

Consistency Model Decision Tree

Can the application tolerate stale data?
├── Yes → Is stale-while-revalidate acceptable?
│   ├── Yes → Cache-aside with background refresh
│   └── No → Cache-aside with short TTL (seconds)
├── Only briefly (eventual consistency is fine)
│   └── Read-through or write-behind
└── No, strong consistency required
    ├── Is read volume much higher than write?
    │   ├── Yes → Write-through cache (write DB, then update cache synchronously)
    │   └── No → Delete cache entry after each DB write (invalidate on write)
    └── Is cache acting as source of truth?
        └── Never use cache as primary store

Cache Stampede Prevention

Is there a single hot key that gets many concurrent requests?
├── Yes → Does the key expire and cause thundering herd?
│   ├── Yes → Which prevention method?
│   │   ├── Mutex locking (NX key) → First request loads, others wait
│   │   ├── Probabilistic early expiration → Refresh before expiry
│   │   ├── Stale-while-revalidate → Serve stale, refresh async
│   │   └── Background refresh → Dedicated worker refreshes hot keys
│   └── No → Regular cache-aside is sufficient
└── No → Standard TTL-based caching is fine

Invalidation Strategy Selection

Can the system tolerate stale data for up to TTL duration?
├── Yes → TTL-based invalidation (simplest)
├── No → Are cache and DB in the same transaction boundary?
│   ├── Yes → Write-through (DB + cache in transaction)
│   └── No → Event-driven invalidation (publish eviction event)
└── Mixed → TTL for most data, event-driven for critical data

Workflow

Step 1: Choose Cache Layer

Layer	Latency	Durability	Shared	Best For
In-memory (local)	~0.1ms	Lost on restart	Per-instance	Hot data, session, computed values
Distributed (Redis)	~1-5ms	Configurable	All instances	Shared data, counters, rate limits
CDN	~10-50ms	Durable at edge	Global	Static assets, public API responses
Database query cache	~0.5ms	DB-backed	All instances	Frequent identical queries
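The in-memory layer is typically an LRU map. Below is a minimal sketch, assuming a single Node.js process and per-entry TTLs in milliseconds. `LruCache` is a hypothetical name, not a library API. It relies on `Map` preserving insertion order, so the first key is always the least recently used one.

```typescript
// Minimal in-process LRU cache with per-entry TTL (sketch; single process only).
class LruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private maxEntries: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    // Re-insert to mark as most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    // Evict least recently used entries (the oldest in insertion order).
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```

For example, `new LruCache<User>(10_000, 60_000)` would keep at most 10,000 entries, each for up to one minute. Each instance holds its own copy, which is why the table lists this layer as per-instance.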

Step 2: Choose Strategy

Cache-aside (lazy loading) — default for most applications:

Read:
  1. Check cache (key lookup)
  2. Cache hit → return data
  3. Cache miss → read from DB
  4. Store in cache with TTL
  5. Return data

Write:
  1. Write to DB
  2. Delete cache entry for that key

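// CacheStore is an assumed minimal interface over Redis, Memcached, or an in-memory map
interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, opts: { ttl: number }): Promise<void>;
  del(key: string): Promise<void>;
}

class CacheAside<T> {
  constructor(
    private cache: CacheStore,
    private ttl: number
  ) {}

  async get(key: string, fetchFn: () => Promise<T>): Promise<T> {
    const cached = await this.cache.get(key);
    if (cached !== null) return JSON.parse(cached) as T; // cache hit

    // Cache miss: load from the DB via fetchFn, then populate the cache
    const data = await fetchFn();
    await this.cache.set(key, JSON.stringify(data), { ttl: this.ttl });
    return data;
  }

  // Call after the DB write commits (write DB first, then delete)
  async invalidate(key: string): Promise<void> {
    await this.cache.del(key);
  }
}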

Read-through — cache library handles DB loading transparently:

Cache library intercepts reads, loads from DB on miss
Recommended: Write DB first, then delete cache (cache will be populated on next read)
Best for: key-value lookups with consistent access patterns
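With read-through, callers never see the loader: it is registered once, when the cache is constructed. Below is a minimal sketch, assuming an in-memory `Map` stands in for the cache store and a hypothetical `loader` function reads from the DB; real read-through libraries expose their own APIs. Concurrent misses for the same key share one in-flight load, so the DB sees a single query per key.

```typescript
// Read-through cache: the cache owns the loader; callers only call get().
class ReadThroughCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>();
  private inFlight = new Map<string, Promise<T>>();

  constructor(
    private loader: (key: string) => Promise<T>, // hypothetical DB loader
    private ttlMs: number
  ) {}

  async get(key: string): Promise<T> {
    const entry = this.store.get(key);
    if (entry && Date.now() < entry.expiresAt) return entry.value;

    // Deduplicate concurrent misses so the DB sees one load per key.
    let pending = this.inFlight.get(key);
    if (!pending) {
      pending = this.loader(key)
        .then(value => {
          this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
          return value;
        })
        .finally(() => this.inFlight.delete(key));
      this.inFlight.set(key, pending);
    }
    return pending;
  }

  // Call after the DB write succeeds; the next get() reloads from the DB.
  invalidate(key: string): void {
    this.store.delete(key);
  }
}
```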

Write-through: write DB, then cache, synchronously in the same request (two sequential writes, not one atomic operation):

Write:
  1. Write to DB
  2. Write to cache (or update in place)

Best for: strong consistency requirements
Risk: higher write latency
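Below is a minimal write-through sketch. `Db` and `KvCache` are hypothetical interfaces standing in for real clients. The DB write must succeed before the cache is touched, so a failed DB write never leaves phantom data in the cache.

```typescript
// Hypothetical interfaces standing in for a real DB client and cache client.
interface Db<T> {
  save(id: string, value: T): Promise<void>;
}
interface KvCache {
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Write-through: DB first (source of truth), then the cache, in the same request.
async function writeThrough<T>(
  db: Db<T>,
  cache: KvCache,
  key: string,
  id: string,
  value: T,
  ttlSeconds: number
): Promise<void> {
  await db.save(id, value);                                 // 1. write to DB; throws → cache untouched
  await cache.set(key, JSON.stringify(value), ttlSeconds);  // 2. write to cache
}
```

If the cache write fails after the DB commit, the previous cached value can linger until its TTL expires. Deleting the key in that case, or keeping TTLs short, limits the damage.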

Write-behind (write-back) — write cache first, async write DB:

Write:
  1. Write to cache (immediate acknowledgment)
  2. Async write to DB (deferred, batched)

Best for: write-heavy workloads that can tolerate losing recent writes
Risk: data loss on cache failure
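Below is a minimal write-behind sketch. It assumes a hypothetical `saveBatch` function that persists many rows in one DB call, and an in-memory `Map` standing in for the cache. Writes are acknowledged immediately and flushed on an interval. Repeated writes to the same key coalesce into one DB write, which is where the reduced DB load comes from. Anything not yet flushed is lost if the process dies, which is exactly the risk noted above.

```typescript
class WriteBehindCache<T> {
  private cache = new Map<string, T>(); // stands in for Redis / in-memory cache
  private dirty = new Set<string>();    // keys not yet persisted
  private timer: ReturnType<typeof setInterval>;

  constructor(
    private saveBatch: (rows: Map<string, T>) => Promise<void>, // hypothetical batched DB write
    flushIntervalMs: number
  ) {
    this.timer = setInterval(() => {
      this.flush().catch(err => console.error('write-behind flush failed:', err));
    }, flushIntervalMs);
  }

  write(key: string, value: T): void {
    this.cache.set(key, value); // 1. immediate acknowledgment
    this.dirty.add(key);        //    repeated writes to a key coalesce
  }

  read(key: string): T | undefined {
    return this.cache.get(key);
  }

  async flush(): Promise<void> {
    if (this.dirty.size === 0) return;
    const batch = new Map<string, T>();
    for (const key of this.dirty) batch.set(key, this.cache.get(key) as T);
    this.dirty.clear();
    try {
      await this.saveBatch(batch); // 2. deferred, batched DB write
    } catch (err) {
      for (const key of batch.keys()) this.dirty.add(key); // retry on next flush
      throw err;
    }
  }

  // Flush remaining writes on shutdown to narrow the data-loss window.
  async close(): Promise<void> {
    clearInterval(this.timer);
    await this.flush();
  }
}
```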

Step 3: Key Naming Convention

Namespace:Entity:ID[:Subfield]

Examples:
  user:abc123                    → User object
  user:abc123:profile            → User profile sub-object
  product:xyz456:inventory       → Product inventory
  page:/docs/getting-started     → Rendered page
  rate:limit:user:abc123         → Rate limit counter
  session:abc123                 → Session data
  lock:payment:order_456         → Distributed lock

Key design rules:

Always include a namespace prefix to avoid collisions
Use colon-delimited hierarchy for logical grouping
Include version in key when schema changes: user:v2:abc123
Keep keys well under 1KB: the Redis docs warn that very long keys waste memory and slow lookups; hash long identifiers instead
Use consistent key generation function

function cacheKey(namespace: string, entity: string, id: string, subfield?: string): string {
  const parts = [namespace, entity, id];
  if (subfield) parts.push(subfield);
  return parts.join(':');
}

Step 4: TTL Management

Data Type	TTL	Rationale
Immutable reference data	24h+ or infinite	Never changes, invalidate on event
Slowly-changing (product catalog)	1-6 hours	Typically updated via admin
User profile (non-critical)	5-15 minutes	Short tolerance for staleness
Session data	Session duration + grace	Must survive user activity gap
Rate limit counters	Window duration	Auto-cleaned by TTL
API responses	30-300 seconds	Depends on API freshness requirements
Search results	1-5 minutes	Fast-changing but cacheable
Computed/aggregated data	1-60 minutes	Expensive to compute, low change frequency

TTL randomization: add ±10% jitter to prevent mass expiry stampede:

function ttlWithJitter(baseTtl: number, jitterPercent = 10): number {
  const jitter = baseTtl * (jitterPercent / 100) * (Math.random() * 2 - 1);
  return Math.round(baseTtl + jitter);
}

Step 5: Cache Stampede Prevention

Option A — Mutex locking:

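// Assumes `redis` is an ioredis client; the lock can only be released by its owner
import { randomUUID } from 'node:crypto';

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

// Compare-and-delete, so a request never releases a lock another request now holds
const RELEASE_LOCK = `if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end`;

async function getWithMutex<T>(key: string, fetchFn: () => Promise<T>, ttlSeconds: number, maxWaits = 100): Promise<T> {
  for (let attempt = 0; attempt < maxWaits; attempt++) {
    const cached = await redis.get(key);
    if (cached !== null) return JSON.parse(cached) as T;

    // Acquire distributed lock: SET lock:<key> <token> PX 5000 NX (5s lock)
    const lockKey = `lock:${key}`;
    const token = randomUUID();
    const acquired = await redis.set(lockKey, token, 'PX', 5000, 'NX');
    if (acquired === 'OK') {
      try {
        const data = await fetchFn();
        await redis.set(key, JSON.stringify(data), 'EX', ttlSeconds);
        return data;
      } finally {
        await redis.eval(RELEASE_LOCK, 1, lockKey, token);
      }
    }

    // Another request holds the lock: wait for it to populate the cache, then re-check
    await sleep(50);
  }
  // Lock holder is slow or failed: load directly instead of waiting forever
  return fetchFn();
}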

Option B — Probabilistic early expiration (XFetch):

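// Simplified XFetch-style check: the chance of an early refresh rises
// from about e^-beta when the entry is fresh to 1 at expiry
function shouldRecompute(ttl: number, age: number, beta = 4): boolean {
  const remaining = ttl - age;
  const probability = Math.exp(-beta * (remaining / ttl));
  return Math.random() < probability;
}

// Usage in cache read (cache.getWithAge is an assumed helper returning
// { value, age } with age in the same unit as ttl, or null on a miss)
async function getWithEarlyExpiry<T>(key: string, fetchFn: () => Promise<T>, ttl: number): Promise<T> {
  const entry = await cache.getWithAge(key);
  if (!entry) {
    // Miss: load synchronously and populate the cache
    const data = await fetchFn();
    await cache.set(key, JSON.stringify(data), { ttl });
    return data;
  }

  if (shouldRecompute(ttl, entry.age)) {
    // Background refresh: don't block the response
    fetchFn()
      .then(fresh => cache.set(key, JSON.stringify(fresh), { ttl }))
      .catch(err => console.error(`Early refresh failed for ${key}:`, err));
  }

  return JSON.parse(entry.value) as T;
}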
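The `shouldRecompute` check above only approximates XFetch. The published XFetch rule recomputes early when `now - delta * beta * ln(rand()) >= expiry`. Here, `delta` is how long the last recomputation took and `beta` (1 by default) tunes eagerness. Expensive values therefore get refreshed earlier. Below is a sketch, assuming the cache stores `delta` and `expiry` (epoch ms) alongside the value. The `XFetchEntry` shape is hypothetical, and `now`/`rand` are parameters only to make the check testable.

```typescript
interface XFetchEntry<T> {
  value: T;
  delta: number;  // ms the last recomputation took
  expiry: number; // absolute expiry time, ms since epoch
}

// ln(rand) <= 0, so -delta * beta * ln(rand) is a positive, exponentially
// distributed head start that grows with the recompute cost.
function xfetchShouldRecompute(
  entry: XFetchEntry<unknown>,
  beta = 1,
  now = Date.now(),
  rand = Math.random()
): boolean {
  return now - entry.delta * beta * Math.log(rand) >= entry.expiry;
}

// Recompute and record how long it took, so delta stays current.
async function xfetchRecompute<T>(fetchFn: () => Promise<T>, ttlMs: number): Promise<XFetchEntry<T>> {
  const start = Date.now();
  const value = await fetchFn();
  const end = Date.now();
  return { value, delta: end - start, expiry: end + ttlMs };
}
```

On read, if the entry is missing or `xfetchShouldRecompute` returns true, call `xfetchRecompute` and store the result. Otherwise serve `entry.value`.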

Option C — Stale-while-revalidate:

async function getStaleWhileRevalidate<T>(key: string, fetchFn: () => Promise<T>, ttl: number, swrTtl: number): Promise<T> {
  // ttl and swrTtl are in milliseconds; cache.getWithMetadata is an assumed helper
  // returning { value, createdAt } with createdAt as epoch milliseconds
  const entry = await cache.getWithMetadata(key);
  if (!entry) {
    const fresh = await fetchFn();
    await cache.set(key, JSON.stringify(fresh), { ttl: ttl + swrTtl });
    return fresh;
  }

  const age = Date.now() - entry.createdAt;
  if (age < ttl) return JSON.parse(entry.value) as T; // Fresh enough

  if (age < ttl + swrTtl) {
    // Stale but within the SWR window: serve stale, refresh async
    fetchFn()
      .then(fresh => cache.set(key, JSON.stringify(fresh), { ttl: ttl + swrTtl }))
      .catch(err => console.error(`Revalidation failed for ${key}:`, err));
    return JSON.parse(entry.value) as T;
  }

  // Too stale: fetch fresh synchronously
  const fresh = await fetchFn();
  await cache.set(key, JSON.stringify(fresh), { ttl: ttl + swrTtl });
  return fresh;
}

Option D — Background refresh:

class BackgroundRefresher {
  private timers = new Map<string, NodeJS.Timeout>();

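  scheduleRefresh(key: string, ttl: number, fetchFn: () => Promise<void>): void {
    this.stopRefresh(key); // clear any existing timer so rescheduling a key doesn't leak it
    // Refresh at 80% of TTL (ttl in seconds)
    const refreshMs = ttl * 0.8 * 1000;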
    const timer = setInterval(() => {
      fetchFn().catch(err => console.error(`Cache refresh failed for ${key}:`, err));
    }, refreshMs);
    this.timers.set(key, timer);
  }

  stopRefresh(key: string): void {
    const timer = this.timers.get(key);
    if (timer) {
      clearInterval(timer);
      this.timers.delete(key);
    }
  }
}

Step 6: Invalidation Strategies

Strategy	Mechanism	Latency	Complexity	Best For
TTL-based	Automatic expiry	TTL duration	None	Any cacheable data
Event-driven	Pub/sub invalidation event	Near-real-time	Moderate	Data with known change events
Write-through	Update cache on write	Write-time	Low	Strong consistency
Manual	Admin API/CLI purge	On-demand	Low	Schema migrations, data fixes
Batch purge	Pattern-based deletion	Seconds	Moderate	Related data changes

Invalidation order (critical for correctness):

1. Write to database
2. Delete cache entry
3. Done

Never delete the cache entry before the DB write. A concurrent read can miss in that gap, load the old row from the DB, and repopulate the cache with data that stays stale until the TTL expires.
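Below is a sketch of the write path in the order above, assuming hypothetical `updateRow` and `deleteKey` functions. If the DB write throws, the cache is never touched. If the cache delete fails after a successful commit, the entry stays stale until its TTL, so the delete is retried a few times rather than skipped.

```typescript
async function updateThenInvalidate(
  updateRow: () => Promise<void>,            // hypothetical DB write
  deleteKey: (key: string) => Promise<void>, // hypothetical cache delete
  key: string,
  attempts = 3
): Promise<void> {
  await updateRow(); // 1. write to database; if this throws, the cache is untouched

  // 2. delete the cache entry; retry because a missed delete leaves stale data until TTL
  for (let i = 1; i <= attempts; i++) {
    try {
      await deleteKey(key);
      return;
    } catch (err) {
      if (i === attempts) console.error(`cache delete failed for ${key}; stale until TTL`, err);
    }
  }
}
```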

Event-driven invalidation pattern:

interface CacheInvalidationEvent {
  key?: string;           // Single key to delete
  pattern?: string;       // Pattern for batch invalidation: "user:abc123:*"
  reason: string;
  timestamp: number;
}

// Publisher (on data change); PubSub is an assumed messaging interface
class CacheInvalidator {
  constructor(private pubSub: PubSub) {}

  async invalidateKey(key: string): Promise<void> {
    await this.pubSub.publish('cache:invalidate', { key, timestamp: Date.now(), reason: 'data_updated' });
  }

  async invalidatePattern(pattern: string): Promise<void> {
    await this.pubSub.publish('cache:invalidate', { pattern, timestamp: Date.now(), reason: 'batch_update' });
  }
}

// Subscriber (cache layer); assumes `cache` is an ioredis client
async function onInvalidationEvent(event: CacheInvalidationEvent): Promise<void> {
  if (event.key) {
    await cache.del(event.key);
  } else if (event.pattern) {
    // SCAN incrementally; KEYS blocks the server while it walks the whole keyspace
    const stream = cache.scanStream({ match: event.pattern, count: 100 });
    for await (const keys of stream as AsyncIterable<string[]>) {
      if (keys.length > 0) await cache.del(...keys);
    }
  }
}

Production Considerations

Cache Sizing

Entry size	Entries	Raw data size	Redis memory (provisioned)
1KB/entry	1M	1GB	2GB (with overhead)
10KB/entry	100K	1GB	2.5GB
100KB/entry	10K	1GB	3GB
Session (512B)	1M	512MB	1GB

Rule of thumb: provision 2-3x the expected data size for Redis overhead.
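The table rows follow from entries × entry size × an overhead factor. Below is a sketch of that arithmetic, using the 2-3x rule of thumb above as the assumed overhead. Real overhead depends on key length, data type, and encoding, so it is worth checking `INFO MEMORY` on a loaded instance before committing to a size.

```typescript
// Estimate Redis memory to provision: raw data size times an overhead factor.
function estimateRedisMemoryBytes(entryBytes: number, entries: number, overheadFactor = 2): number {
  if (overheadFactor < 1) throw new RangeError('overhead factor must be >= 1');
  return entryBytes * entries * overheadFactor;
}

const GiB = 1024 ** 3;
// 1KB entries x 1M entries with 2x overhead (first table row)
console.log((estimateRedisMemoryBytes(1024, 1_000_000) / GiB).toFixed(2)); // prints 1.91
```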

Connection Pooling

// ioredis cluster client: commands are multiplexed over one connection per node
import { Redis } from 'ioredis';

const cluster = new Redis.Cluster([
  { host: 'redis-0.internal', port: 6379 },
  { host: 'redis-1.internal', port: 6379 },
  { host: 'redis-2.internal', port: 6379 },
], {
  enableAutoPipelining: true, // cluster-level option: batches concurrent commands
  maxRedirections: 16,
  enableReadyCheck: true,
  retryDelayOnFailover: 100,
  retryDelayOnClusterDown: 100,
  clusterRetryStrategy: (times) => Math.min(times * 100, 3000),
  redisOptions: {
    maxRetriesPerRequest: 3,
    retryStrategy: (times) => Math.min(times * 50, 2000),
    lazyConnect: true,
  },
});

Serialization

Use fast serialization: MessagePack, Protocol Buffers for high-throughput
Compress values > 1KB (Snappy, LZ4, Gzip, or Brotli)
Consider the RedisJSON module for partial updates of JSON values (e.g. JSON.SET on a single path)
Avoid storing large objects (>1MB) in cache — store reference instead

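// Compressed caching with Brotli from Node's zlib (callback API, promisified)
import { brotliCompress, brotliDecompress } from 'node:zlib';
import { promisify } from 'node:util';

const brotliCompressAsync = promisify(brotliCompress);
const brotliDecompressAsync = promisify(brotliDecompress);

// Values are binary, so the read must return a Buffer (e.g. ioredis getBuffer), not a string
async function getCompressed<T>(key: string, fetchFn: () => Promise<T>, ttl: number): Promise<T> {
  const raw: Buffer | null = await cache.getBuffer(key);
  if (raw !== null) {
    const decompressed = await decompress(raw);
    return JSON.parse(decompressed) as T;
  }

  const data = await fetchFn();
  const serialized = JSON.stringify(data);
  const compressed = await compress(serialized);
  await cache.set(key, compressed, { ttl });
  return data;
}

async function compress(data: string): Promise<Buffer> {
  return brotliCompressAsync(Buffer.from(data, 'utf-8'));
}

async function decompress(data: Buffer): Promise<string> {
  const output = await brotliDecompressAsync(data);
  return output.toString('utf-8');
}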

Anti-Patterns

Anti-Pattern 1: Cache as Primary Data Store

Problem: Redis/Memcached evicts data under memory pressure. Restart clears all. Fix: Always have DB fallback. Cache is a performance layer, not a storage layer.
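Below is a sketch of the DB fallback, assuming hypothetical `cacheGet` and `loadFromDb` functions. A cache error or a slow cache (the timeout is assumed at 50 ms) is treated as a miss. The request then degrades to the database instead of failing.

```typescript
// Resolve to null if the promise rejects or does not settle within ms.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise(resolve => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      v => { clearTimeout(timer); resolve(v); },
      () => { clearTimeout(timer); resolve(null); } // cache error: treat as miss
    );
  });
}

async function getWithFallback<T>(
  key: string,
  cacheGet: (key: string) => Promise<string | null>, // hypothetical cache read
  loadFromDb: () => Promise<T>,                      // hypothetical DB read
  timeoutMs = 50
): Promise<T> {
  const cached = await withTimeout(cacheGet(key), timeoutMs);
  if (cached !== null) return JSON.parse(cached) as T;
  return loadFromDb(); // miss, error, or timeout: go to the source of truth
}
```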

Anti-Pattern 2: Infinite TTL

Problem: Stale data served forever. Schema changes break cached data. Fix: Always set TTL. Maximum 24h for mutable data.

Anti-Pattern 3: Write Cache Before DB

Problem: Cache write succeeds, DB write fails. Cache has phantom data. Fix: Always write DB first, then invalidate or update cache.

Anti-Pattern 4: Same TTL for All Keys

Problem: Thundering herd on mass expiry. All keys expire at once. Fix: Add ±10% jitter to TTL values.

Anti-Pattern 5: Caching Everything

Problem: Low hit ratio on rarely accessed data wastes memory. Fix: Cache only frequently accessed data. Monitor hit ratio (<80% means wrong data cached).

Anti-Pattern 6: Not Caching Null Results

Problem: Repeated DB misses on non-existent keys cause unnecessary load. Fix: Cache null results with short TTL (30-60s).
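One way to cache null results is to store a sentinel value, so that a cached "not found" can be told apart from a cache miss. Below is a sketch over a hypothetical string cache. It assumes TTLs in seconds and a 60 s negative TTL, per the guidance above. `NOT_FOUND` is an arbitrary marker chosen here.

```typescript
const NOT_FOUND = '__not_found__'; // sentinel stored for missing rows

interface StringCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getOrNull<T>(
  cache: StringCache,
  key: string,
  loadFromDb: () => Promise<T | null>, // hypothetical DB lookup, null if absent
  ttlSeconds: number,
  negativeTtlSeconds = 60
): Promise<T | null> {
  const cached = await cache.get(key);
  if (cached === NOT_FOUND) return null;              // cached "not found": skip the DB
  if (cached !== null) return JSON.parse(cached) as T;

  const row = await loadFromDb();
  if (row === null) {
    await cache.set(key, NOT_FOUND, negativeTtlSeconds); // short TTL for negatives
    return null;
  }
  await cache.set(key, JSON.stringify(row), ttlSeconds);
  return row;
}
```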

Anti-Pattern 7: Cache in Request Path Only

Problem: Cache is populated only on read, so after a deploy or restart it starts cold and the first requests all fall through to the DB. Fix: Pre-warm cache after deployment, or use write-through for predictable data.
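Below is a pre-warming sketch for after deployment. It assumes a hypothetical list of hot IDs (e.g. from access logs) and hypothetical `load`/`store` functions. It loads in bounded batches to avoid hammering the DB. It also applies ±10% TTL jitter so the warmed keys do not all expire together.

```typescript
async function warmCache<T>(
  ids: string[],                                                      // hypothetical hot-key list
  load: (id: string) => Promise<T>,                                   // hypothetical DB read
  store: (id: string, value: T, ttlSeconds: number) => Promise<void>, // hypothetical cache write
  baseTtlSeconds: number,
  batchSize = 50
): Promise<number> {
  let warmed = 0;
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    // Load one bounded batch concurrently; a single failure does not abort the warm-up.
    const results = await Promise.allSettled(batch.map(async id => {
      const value = await load(id);
      const jitter = baseTtlSeconds * 0.1 * (Math.random() * 2 - 1); // ±10%
      await store(id, value, Math.round(baseTtlSeconds + jitter));
    }));
    warmed += results.filter(r => r.status === 'fulfilled').length;
  }
  return warmed; // number of keys successfully warmed
}
```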

Anti-Pattern 8: Over-Caching Complex Queries

Problem: Caching entire complex query results with long TTL. Data changes invalidate everything. Fix: Cache individual entities, compose at read time. Or use short TTL for query results.

Security Considerations

Redis Security

Require authentication (requirepass)
Disable CONFIG command (rename-command CONFIG "")
Run Redis as non-root user
Bind to internal network only (not 0.0.0.0)
Use TLS for Redis connections in production
Enable Redis ACLs for multi-tenant setups

Cache Poisoning

Validate and sanitize data before caching
Never cache raw user input as keys
Use key hashing for user-provided identifiers
Sign cached values for integrity verification
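Hashing user-provided identifiers and signing cached values can both be done with Node's built-in `crypto` module. Below is a sketch, assuming a secret `signingKey` loaded from configuration and the `namespace:entity:id` key format from Step 3. SHA-256 keeps keys short and free of `:` delimiters. The HMAC lets readers reject values that were not written by a holder of the key.

```typescript
import { createHash, createHmac, timingSafeEqual } from 'node:crypto';

// Hash untrusted input so it cannot inject ':' segments or oversized keys.
function safeKey(namespace: string, entity: string, userInput: string): string {
  const digest = createHash('sha256').update(userInput, 'utf8').digest('hex');
  return `${namespace}:${entity}:${digest}`;
}

// Store "<hex hmac>.<payload>"; verify before trusting a cached value.
function sign(payload: string, signingKey: string): string {
  const mac = createHmac('sha256', signingKey).update(payload).digest('hex');
  return `${mac}.${payload}`;
}

function verify(stored: string, signingKey: string): string | null {
  const dot = stored.indexOf('.'); // hex never contains '.', so the first '.' is the separator
  if (dot < 0) return null;
  const mac = Buffer.from(stored.slice(0, dot), 'hex');
  const payload = stored.slice(dot + 1);
  const expected = createHmac('sha256', signingKey).update(payload).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (mac.length !== expected.length || !timingSafeEqual(mac, expected)) return null;
  return payload;
}
```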

Data Leakage

Never cache PII/PHI without encryption
Use encrypted Redis (TLS + at-rest encryption)
Clear cache on data deletion (GDPR right to erasure)
Set maxmemory-policy to allkeys-lru for automatic eviction

Comparative Analysis

Cache Strategies

Aspect	Cache-Aside	Read-Through	Write-Through	Write-Behind
Read consistency	Eventual	Eventual	Strong	Eventual
Write latency	Low	Low	Higher (2 writes)	Very low
Read latency	Miss = DB round trip	Miss = DB load through cache	Usually a hit	Usually a hit
Complexity	Low	Moderate	Moderate	High
Data loss risk	None (DB source)	None (DB source)	None (DB written first)	High (cache failure)
DB write load	Normal	Normal	Normal	Reduced (batching)
Best for	General purpose	Key-value lookups	Strong consistency	Write-heavy workloads

Redis vs Memcached vs CDN

Aspect	Redis	Memcached	CDN
Data types	Rich (string, list, set, sorted set, hash, stream, JSON)	Simple (string only)	Bytes
Persistence	Configurable (RDB/AOF)	None	Durable
Clustering	Native clustering	Client-side sharding	Global
Lua scripting	Yes	No	No
Pub/Sub	Yes	No	No
Max value size	512MB	1MB by default (configurable)	Varies (typically 10MB-2GB)
Use case	General purpose, caching, rate limits, queues, sessions	Simple key-value caching	Static assets, API responses at edge

Performance Considerations

Redis Performance

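Command execution is single-threaded: one slow command blocks all others (Redis 6+ can use I/O threads for networking only). Pipeline commands for throughput.
Throughput is on the order of ~100K GET/SET ops/sec on a single node without pipelining; measure on your own hardware with redis-benchmark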
Use pipelining for batch operations (reduces RTT)
Enable pipelining in ioredis: redis.pipeline().set('a', '1').get('a').exec()
Use SCAN instead of KEYS for production (KEYS blocks)
Monitor slowlog: SLOWLOG GET 10

Memory Optimization

Use hash data structure for objects: HSET user:123 name "John" age 30 (HMSET is deprecated)
Enable compression for values > 1KB
Set appropriate maxmemory and eviction policy
Use compact encodings for small collections (listpack, which replaced ziplist in Redis 7, and intset)
Monitor memory fragmentation: INFO MEMORY
Redis 7.4+ has better memory efficiency with new serialization

Monitoring Key Metrics

Metric	Warning	Critical	Action
Hit ratio	<90%	<80%	Review caching strategy
Evictions	>0	>100/sec	Increase memory or optimize
Memory usage	>80%	>90%	Scale up or optimize
Connected clients	>5000	>10000	Check for connection leaks; reuse pooled clients
Latency p99	>5ms	>10ms	Check slow commands
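Hit ratio is `keyspace_hits / (keyspace_hits + keyspace_misses)` from Redis `INFO stats`. Below is a sketch that parses that section. With ioredis, the text would come from `await redis.info('stats')`. The counters are cumulative since server start (or the last `CONFIG RESETSTAT`). For alerting, compare the delta between two samples rather than the lifetime ratio.

```typescript
// Parse "field:value" lines from an INFO section into numbers.
function parseInfo(section: string): Map<string, number> {
  const fields = new Map<string, number>();
  for (const line of section.split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx <= 0 || line.startsWith('#')) continue; // skip headers and blank lines
    const value = Number(line.slice(idx + 1));
    if (!Number.isNaN(value)) fields.set(line.slice(0, idx), value);
  }
  return fields;
}

// Hit ratio over the interval between two INFO stats samples (null if no traffic).
function hitRatio(before: string, after: string): number | null {
  const a = parseInfo(before);
  const b = parseInfo(after);
  const hits = (b.get('keyspace_hits') ?? 0) - (a.get('keyspace_hits') ?? 0);
  const misses = (b.get('keyspace_misses') ?? 0) - (a.get('keyspace_misses') ?? 0);
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}
```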

Rules

Always set a TTL. Never use infinite TTL except for known immutable data with manual invalidation.
Never put large objects (>1MB) in cache. Compress or split.
Cache keys must include a namespace prefix to avoid collisions.
Always monitor cache hit ratio. Below 80% means cache is ineffective.
Never use cache as a primary data store. Caches lose data.
Always have a fallback when cache is unavailable (degrade gracefully to DB).
Use connection pooling for Redis/Memcached. One connection per request is wasteful.
Cache null results (with short TTL) to prevent repeated DB misses on missing data.
Write to DB first, then invalidate/update cache. Never the reverse.
Add ±10% jitter to TTL values to prevent thundering herd.
Implement stampede protection for hot keys on high-traffic endpoints.
Monitor evictions: if keys are evicted before TTL, cache is too small.

References

references/cache-invalidation.md — Cache Invalidation
references/cache-monitoring.md — Cache Monitoring
references/cache-strategies.md — Cache Strategies
references/cache-testing.md — Cache Testing
references/cdn-caching.md — CDN Caching
references/redis-patterns.md — Redis Patterns
references/caching-fundamentals.md — Caching Fundamentals
references/caching-advanced.md — Caching Advanced Patterns
references/caching-stampede-prevention.md — Cache Stampede Prevention

Handoff

No artifact produced unless requested. Next skill: backend-rate-limiting — if the cache layer needs protection against traffic spikes. Carry forward: cache key conventions, strategy, TTL policies, invalidation plan.