Cache stampede: build and tame the herd

Hands-on project — build a stampede-prone cache-aside service, reproduce the herd, layer the mitigations, and prove each step with before/after numbers.

Level: Senior · Estimated time: 240 min

Reading about stampedes is not the same as watching your own DB fall over and then bringing it back. Build a small cache-aside service, drive a hot key into a stampede on purpose, and add the unit’s mitigations one layer at a time — measuring the herd before and after each.

Goal

Turn the unit’s mental model into a reproducible loop: reproduce the burst, instrument the fingerprint, then layer single-flight, a distributed lock, stale-while-revalidate (SWR), TTL jitter, and negative caching, proving with metrics that each layer reduces the herd it is meant to reduce.
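The first step of that loop, reproducing the burst, can be tried in-process before any Redis is involved. What follows is a minimal sketch under stated assumptions, not the project's reference solution: a plain dict stands in for Redis, and `SlowOrigin`, `NaiveCache`, and `burst` are hypothetical names introduced only for illustration.

```python
import threading
import time


class SlowOrigin:
    """Stand-in for the deliberately slow origin; counts how often it is hit."""

    def __init__(self, delay_s=0.05):
        self.delay_s = delay_s
        self.calls = 0
        self._lock = threading.Lock()

    def load(self, key):
        with self._lock:
            self.calls += 1
        time.sleep(self.delay_s)  # simulate a slow query
        return f"value-for-{key}"


class NaiveCache:
    """Cache-aside with no stampede protection (a dict stands in for Redis)."""

    def __init__(self, origin, ttl_s):
        self.origin = origin
        self.ttl_s = ttl_s
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # hit
        value = self.origin.load(key)  # every concurrent miss lands here
        self._store[key] = (value, time.monotonic() + self.ttl_s)
        return value


def burst(cache, key, n_threads):
    """Release n_threads readers at the same instant against one key."""
    barrier = threading.Barrier(n_threads)

    def reader():
        barrier.wait()
        cache.get(key)

    threads = [threading.Thread(target=reader) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


origin = SlowOrigin()
cache = NaiveCache(origin, ttl_s=60)

burst(cache, "hot", 20)            # cold key: the herd reaches the origin
cold_calls = origin.calls
burst(cache, "hot", 20)            # warm key: served from the cache
warm_calls = origin.calls - cold_calls

print(f"origin calls on cold burst: {cold_calls}, on warm burst: {warm_calls}")
```

The cold burst should report roughly one origin call per reader, which is the number every later mitigation is supposed to push toward one.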

Project

Objective

Build a cache-aside HTTP service backed by Redis and a deliberately slow origin, reproduce a cache stampede on a hot key, then layer the unit's mitigations until a TTL-boundary burst reaches the origin as a single rebuild — proving every step with before/after measurements, not estimates.
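The first mitigation on the way to "a single rebuild" is per-process request coalescing. The sketch below is one possible shape for it in Python, assuming a threaded server; `SingleFlight`, `_Call`, and `slow_rebuild` are hypothetical names, and the 0.2 s sleep is an arbitrary stand-in for the slow origin.

```python
import threading
import time


class _Call:
    """Shared state for one in-flight rebuild."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution per process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Call

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()  # follower: wait for the leader's result
        if call.error is not None:
            raise call.error
        return call.result


counter = {"rebuilds": 0}
counter_lock = threading.Lock()


def slow_rebuild():
    with counter_lock:
        counter["rebuilds"] += 1
    time.sleep(0.2)  # stand-in for the slow origin query
    return "fresh"


sf = SingleFlight()
n = 20
results = [None] * n
barrier = threading.Barrier(n)


def reader(i):
    barrier.wait()
    results[i] = sf.do("hot", slow_rebuild)


threads = [threading.Thread(target=reader, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"rebuilds: {counter['rebuilds']}, distinct results: {set(results)}")
```

Note the scope: the in-flight map lives in one process, so a fleet of N instances would still send up to N rebuilds to the origin. That gap is what the cross-node lock or SWR background refresh is meant to close.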

Requirements

Acceptance criteria

A before/after table per mitigation: origin queries per TTL boundary, request p99 latency, and cache miss rate — all measured under the identical load test, not estimated.
With the full stack enabled, a TTL boundary under sustained hot-key load produces at most a single origin rebuild, and the sawtooth origin-query fingerprint is gone from the metrics.
A demonstration that single-flight alone leaves one rebuild per instance, and only adding the cross-node lock (or SWR background refresh) collapses it to one fleet-wide — proving you understand the scope of each layer.
A short write-up naming, for each layer, exactly which herd it bounded (per-process, per-fleet, wait-time, multi-key, negative) and why that layer was needed on top of the previous one.
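One way to keep the before/after table honest is to compute every row with the same function from raw load-test samples. The helper below is a sketch under assumptions: `p99` uses the nearest-rank definition, `summarize` and its field names are hypothetical, and the inputs in the usage lines are placeholder values chosen to make the arithmetic obvious, not measurements.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def summarize(label, latencies_ms, misses, origin_queries, ttl_boundaries):
    """Build one table row from the raw counters of a single load-test run."""
    return {
        "layer": label,
        "origin_queries_per_boundary": origin_queries / ttl_boundaries,
        "p99_ms": p99(latencies_ms),
        "miss_rate": misses / len(latencies_ms),
    }


# Placeholder inputs only: 100 requests with latencies 1..100 ms,
# 5 misses, 40 origin queries spread over 2 TTL boundaries.
row = summarize(
    "baseline",
    latencies_ms=list(range(1, 101)),
    misses=5,
    origin_queries=40,
    ttl_boundaries=2,
)
print(row)
```

Running the identical load profile for each layer and feeding its counters through one function like this keeps the rows comparable.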

Senior stretch

Implement XFetch probabilistic early expiration on the hot key and show it refreshes before the boundary with ~1 early rebuild per window and zero misses at expiry; then show it underperforms on a cold key read once per TTL.
Add a fencing-token (or monotonic-version) check on the rebuild write and craft a test where the lock's expiry (the Redis EX value) is shorter than a slow rebuild, proving the guard rejects the stale duplicate write.
Reproduce a metastable failure: add client retries with short backoff, push the origin to saturation, and show it stays pinned after load stops; then break the loop with a 503-on-overload gate and measure recovery time.
Add the minimum-viable dashboard with alerts (miss-rate spike, sawtooth origin rate, lock-wait above rebuild p99) and a one-page on-call runbook: triage from the panels, the mitigation ladder, and a pre-warm-before-cutover gate.
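For the XFetch item above, the core decision is a one-line probabilistic test. The sketch below follows the rule as it is commonly described (refresh when `now - delta * beta * ln(u) >= expiry`, with `u` uniform on (0, 1]); `should_refresh_early` and its parameter names are hypothetical, and `delta` is assumed to be the measured rebuild duration in seconds.

```python
import math
import random


def should_refresh_early(now, expiry, delta, beta=1.0, rng=random):
    """XFetch-style test: True when this reader should rebuild before expiry.

    delta is the observed rebuild duration in seconds; beta > 1 makes early
    refresh more likely, beta < 1 less likely.
    """
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
    return now - delta * beta * math.log(u) >= expiry


rng = random.Random(7)  # seeded only to make the demo repeatable
expiry, delta, trials = 60.0, 0.5, 1000

far = sum(should_refresh_early(0.0, expiry, delta, rng=rng) for _ in range(trials))
near = sum(should_refresh_early(59.9, expiry, delta, rng=rng) for _ in range(trials))
at_expiry = sum(should_refresh_early(60.0, expiry, delta, rng=rng) for _ in range(trials))

print(f"early refreshes out of {trials}: far={far}, near={near}, at_expiry={at_expiry}")
```

The test only runs when a read arrives, which is why a key read once per TTL gets little benefit: there is usually no read inside the short window before expiry where the probability becomes meaningful.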

Recap

This is the loop you will run on any real cache tier: reproduce the burst before you trust a fix, instrument the fingerprint, then add mitigations in scope order — single-flight for the per-process herd, a lock for the per-fleet herd, SWR for the wait, jitter for synchronized multi-key expiry, negative caching for the miss storm — and verify each with before/after numbers under identical load. Doing it once on a toy service is what makes the production version, and the 3am incident, muscle memory.
