Caching
Cache layers: place the cache where it earns its place
Reading about the latency ladder is not the same as deciding where a cache belongs under real traffic. Build a small read path that spans several layers, measure each one, and prove with numbers which cache earns its place — and which is just latency and risk.
Turn the unit’s mental model into a measurement loop: instrument the layer stack, measure origin latency and per-layer hit ratio, place the cache where the three gates (slow origin, high hit ratio, staleness-tolerant) are met, and verify with before/after numbers — then deliberately introduce and detect a wrong-layer or double-caching bug.
Take a small read-heavy service with at least two distinct endpoints — one expensive (a multi-table join or aggregate) and one cheap (a warm primary-key lookup) — and decide, with measurements, where (if anywhere) each one should be cached across the layer stack: app cache (Redis), reverse proxy, and CDN. Prove every placement decision with before/after numbers, then fix one injected failure mode.
- A before/after table per endpoint: origin latency, cached average latency, p99, per-layer hit ratio, and DB load — measured under identical load, not estimated.
- Evidence that caching the fast PK endpoint in Redis raised its latency (the wrong-layer regression), with the numbers and a one-line explanation tying it to the latency ladder.
- A per-layer trace (X-Cache / CF-Cache-Status / Age) for one slow-endpoint request showing the hit/miss path down the stack and which layer served it.
- The injected failure reproduced with evidence (a stale read after a write, or a cross-user leak), then shown fixed under the same test, with a one-paragraph write-up of the single owner and single invalidation path you established.
- Add stale-while-revalidate to the slow endpoint's cache headers and show p99 drops and origin revalidations trickle instead of spiking at TTL expiry.
- Add a one-page runbook: triage a suspected stale-cache report by walking the layers (Age, CF-Cache-Status, X-Cache, proxy logs, Redis hit/miss, DB), with the order and what each header tells you.
- Compute the break-even hit ratio for the slow endpoint (the ratio below which the cache is net-negative) and run a load mix that pushes the real ratio below it, demonstrating the cache becoming a net loss.
- Add a metric that flags any cached fact stored at two layers with independent TTLs (a double-caching guard) and wire it into your dashboard.
This is the loop you run whenever someone says “add a cache”: measure the origin’s real latency, check the three gates (slow origin, high hit ratio, staleness-tolerant), place the cache at the layer the data’s volatility allows, and prove the win with before/after numbers under identical load. Caching the fast PK lookup proves the wrong-layer regression by experiment; the injected double-cache or cross-user leak proves why one owner and one invalidation path per fact is non-negotiable. Doing it once on a toy stack makes the production decision muscle memory.