Caching
Cache stampede: build and tame the herd
Reading about stampedes is not the same as watching your own DB fall over and then bringing it back. Build a small cache-aside service, drive a hot key into a stampede on purpose, and add the unit’s mitigations one layer at a time — measuring the herd before and after each.
Turn the unit’s mental model into a reproducible loop: reproduce the burst, instrument the fingerprint, then layer single-flight, a distributed lock, SWR, TTL jitter, and negative caching — proving with metrics that each layer reduces the herd it is supposed to.
Build a cache-aside HTTP service backed by Redis and a deliberately slow origin, reproduce a cache stampede on a hot key, then layer the unit's mitigations until a TTL-boundary burst reaches the origin as a single rebuild — proving every step with before/after measurements, not estimates.
- A before/after table per mitigation: origin queries per TTL boundary, request p99 latency, and cache miss rate — all measured under the identical load test, not estimated.
- With the full stack enabled, a TTL boundary under sustained hot-key load produces at most a single origin rebuild, and the sawtooth origin-query fingerprint is gone from the metrics.
- A demonstration that single-flight alone leaves one rebuild per instance, and only adding the cross-node lock (or SWR background refresh) collapses it to one fleet-wide — proving you understand the scope of each layer.
- A short write-up naming, for each layer, exactly which herd it bounded (per-process, per-fleet, wait-time, multi-key, negative) and why that layer was needed on top of the previous one.
- Implement XFetch probabilistic early expiration on the hot key and show it refreshes before the boundary with ~1 early rebuild per window and zero misses at expiry; then show it underperforms on a cold key read once per TTL.
- Add a fencing-token (or monotonic-version) check on the rebuild write and craft a test where the lock EX is shorter than a slow rebuild, proving the guard rejects the stale duplicate write.
- Reproduce a metastable failure: add client retries with short backoff, push the origin to saturation, and show it stays pinned after load stops; then break the loop with a 503-on-overload gate and measure recovery time.
- Add the minimum-viable dashboard with alerts (miss-rate spike, sawtooth origin rate, lock-wait above rebuild p99) and a one-page on-call runbook: triage from the panels, the mitigation ladder, and a pre-warm-before-cutover gate.
This is the loop you will run on any real cache tier: reproduce the burst before you trust a fix, instrument the fingerprint, then add mitigations in scope order — single-flight for the per-process herd, a lock for the per-fleet herd, SWR for the wait, jitter for synchronized multi-key expiry, negative caching for the miss storm — and verify each with before/after numbers under identical load. Doing it once on a toy service is what makes the production version, and the 3am incident, muscle memory.