Caching
Dogpile: multiple-choice review
Six questions that cut across the whole unit. Each one mirrors a decision you make mid-incident when one hot key takes the origin down: not a definition to recite, but a tradeoff to weigh across the lock, the lease, and the recompute window.
Confirm you can connect the expiry-instant collision, single-flight coalescing, distributed locking with TTL and lease renewal, and probabilistic early expiry — the synthesis the lesson built toward.
A single hot key (200ms recompute) takes the DB down every time its TTL expires under ~5000 req/s. Which fix most directly collapses the herd, and what is the one detail that makes it safe?
A local single-flight (in-process mutex) cut the herd but the DB still sees ~20 concurrent recomputes per expiry across the fleet. Why, and what closes the gap?
You add a distributed recompute lock but give it no TTL, so it never expires on its own. What failure would a senior engineer expect?
The recompute usually takes 2s but occasionally 25s. You set the distributed lock TTL to 5s. What goes wrong, and what is the principle?
Rather than guess a lock TTL, you have the holder renew a short lease (a heartbeat that re-extends the lock every second while it works). What does this buy, and what new failure must you handle?
A teammate argues a lock is always the wrong tool because it serializes readers, and wants probabilistic early expiry (XFetch) everywhere instead. When is each actually the right call?
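For reference, the probabilistic early expiry named in the last question can be sketched in a few lines. This is a minimal illustration, not a production client: it assumes the commonly cited XFetch rule (refresh when `now - delta * beta * ln(rand())` reaches the expiry), and the names `should_recompute_early` and `fetch`, the dict-based cache, and the entry layout are all hypothetical.

```python
import math
import random
import time


def should_recompute_early(expiry, delta, beta=1.0, now=None, rng=random.random):
    """Return True if this reader should refresh the key before it expires.

    expiry: absolute expiry time of the cached value, in seconds
    delta:  how long the last recompute took, in seconds
    beta:   above 1 refreshes earlier, below 1 later (assumed default 1.0)
    """
    if now is None:
        now = time.time()
    # rng() is in [0, 1); use 1 - rng() so the argument to log is in (0, 1].
    u = 1.0 - rng()
    # -log(u) is exponentially distributed, so the head start is usually
    # on the order of delta * beta and only occasionally much larger.
    head_start = -delta * beta * math.log(u)
    return now + head_start >= expiry


def fetch(cache, key, ttl, recompute, beta=1.0, now=None, rng=random.random):
    """Read through a dict-backed cache (hypothetical), refreshing early."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None or should_recompute_early(
        entry["expiry"], entry["delta"], beta, now, rng
    ):
        start = time.perf_counter()
        value = recompute()
        delta = time.perf_counter() - start  # measured recompute cost
        cache[key] = {"value": value, "delta": delta, "expiry": now + ttl}
        return value
    return entry["value"]
```

Because each reader rolls independently, and the chance of an early refresh rises as expiry approaches and as `delta` grows, one reader tends to refresh the key while the others keep serving the still-valid value.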
The through-line is one decision: the dogpile is the collision at a single hot key’s expiry instant, and every fix targets a different point in that window. Single-flight collapses N concurrent misses into one recompute, but a local mutex only coalesces per process, so a fleet of P processes can still send up to P recomputes to the origin at once. A distributed lock needs a TTL so a crashed holder cannot deadlock all readers, and that TTL must outlast the worst-case recompute (or be a short lease the holder renews) so the lock does not expire mid-recompute and admit a second holder. A lease still needs fencing against a paused holder that wakes up and writes after its lease has lapsed. XFetch dissolves the collision by making the hot key recompute early and alone, before it ever expires under load. Jitter staggers a batch of keys and does nothing for one hot key. Most dogpile postmortems are about the fix deadlocking, serializing, or double-writing, not the original miss.
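The fencing requirement in the summary above is easier to see in code. The sketch below is an in-memory stand-in, not a real lock service: `LeaseStore` and its methods are hypothetical names, the lease and the guarded value live in one object only for brevity, and time is passed in explicitly so the paused-holder case can be replayed deterministically.

```python
import time


class LeaseStore:
    """In-memory stand-in for a lock service plus the cache entry it guards."""

    def __init__(self):
        self._holder = None        # (token, lease_deadline) or None
        self._next_token = 0       # monotonically increasing fencing token
        self._highest_written = 0  # largest token a write has been accepted from
        self.value = None

    def acquire(self, lease_seconds, now=None):
        """Grant the lease if it is free or lapsed; return a token or None."""
        now = time.monotonic() if now is None else now
        if self._holder is not None and self._holder[1] > now:
            return None  # someone else holds a live lease
        self._next_token += 1
        self._holder = (self._next_token, now + lease_seconds)
        return self._next_token

    def renew(self, token, lease_seconds, now=None):
        """Heartbeat: extend the lease only for the current, unexpired holder."""
        now = time.monotonic() if now is None else now
        holder = self._holder
        if holder is None or holder[0] != token or holder[1] <= now:
            return False
        self._holder = (token, now + lease_seconds)
        return True

    def write(self, token, value):
        """Reject any write whose token is older than one already accepted."""
        if token < self._highest_written:
            return False
        self._highest_written = token
        self.value = value
        return True


store = LeaseStore()
a = store.acquire(1.0, now=0.0)               # holder A gets the lease
renewed = store.renew(a, 1.0, now=0.9)        # A heartbeats in time
b = store.acquire(1.0, now=5.0)               # A paused, lease lapsed, B takes over
late_renew = store.renew(a, 1.0, now=5.1)     # A wakes up: renewal is refused
fresh_ok = store.write(b, "fresh")            # B publishes its result
stale_ok = store.write(a, "stale")            # A's late write is fenced off
```

The check in `write` is what prevents the double-write: the lease alone cannot stop A, because A does not know it was paused. Only the side that accepts the write can compare tokens and refuse the older one.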