Caching
Cache layers: multiple-choice review
Six questions that cut across the whole unit. Each one mirrors a decision you make in a real system — not a definition to recite, but a tradeoff to weigh the moment someone says “just add a cache.”
Confirm you can connect the latency ladder, cache placement, the hit-ratio break-even, and the wrong-layer and double-caching failure modes. This is the synthesis the lesson built toward.
A team fronts a Postgres primary-key lookup (warm, served from the buffer pool in ~0.3 ms) with an in-region Redis cache (~1 ms round trip). p99 read latency rises. Why?
A read endpoint runs a 3-table join taking 80 ms, called 50×/sec, on data that changes a few times a day. Where does the cache belong?
A cache with a 1 ms lookup and a 35% hit ratio fronts an 80 ms origin. Average latency is 0.35×1 + 0.65×(1+80) = 53 ms, which is better than 80 ms. So why is a 35% hit ratio still a red flag?
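The arithmetic in the question above can be reproduced with a minimal sketch. It assumes the same simple model the question uses: a hit pays only the cache lookup, and a miss pays the lookup plus the origin call. The function name `expected_latency_ms` and its parameters are illustrative, not part of the lesson.

```python
def expected_latency_ms(hit_ratio: float, lookup_ms: float, origin_ms: float) -> float:
    """Average read latency under an assumed model:
    hit  -> lookup_ms
    miss -> lookup_ms + origin_ms
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_cost = lookup_ms
    miss_cost = lookup_ms + origin_ms
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


# Numbers from the question: 35% hit ratio, 1 ms lookup, 80 ms origin.
avg = expected_latency_ms(0.35, 1.0, 80.0)
print(round(avg, 2))  # 53.0
```

The function returns an average only. It says nothing about how the latency is distributed across individual requests.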
Why is the OS page cache the most underrated layer when deciding whether to add Redis?
A value is cached in Redis (invalidated on write) AND its rendered response is cached at the CDN with a 5-minute TTL. After a write, users still see old data for minutes. What is the failure and the structural fix?
An engineer caches a per-user “recommended for you” fragment at the CDN to cut origin load. Some users start seeing other users' recommendations. Root cause?
The through-line: caching is a latency ladder (CPU L1 cache, RAM, page cache, Redis, CDN), and a cache only helps when it fronts a genuinely slower origin, the hit ratio is high, and the data tolerates staleness. The page cache and DB buffer pool often make the origin RAM-fast already, so measure real origin latency before reaching for Redis. The same failures keep recurring, even among senior engineers: fronting a fast origin, justifying a cache on a weak hit ratio, caching personalized data at a URL-keyed CDN, and double-caching one fact at two layers with independent TTLs. Cache as close to the consumer as the data’s volatility allows; give every fact one owner and one invalidation path.
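The "genuinely slower origin" condition can be made concrete with a small sketch of a latency-only break-even. It assumes a hit costs the cache lookup and a miss costs lookup plus origin, so the average is lookup + (1 - h) × origin, which beats the origin alone only when h > lookup / origin. The function name `break_even_hit_ratio` is hypothetical, and the lesson's own break-even may weigh more than latency.

```python
def break_even_hit_ratio(lookup_ms: float, origin_ms: float) -> float:
    """Hit ratio above which the cache lowers *average* latency.

    Assumed model: hit = lookup_ms, miss = lookup_ms + origin_ms.
    A result >= 1.0 means no achievable hit ratio makes the cache faster.
    """
    if lookup_ms < 0 or origin_ms <= 0:
        raise ValueError("lookup_ms must be >= 0 and origin_ms must be > 0")
    return lookup_ms / origin_ms


# Warm primary-key lookup (~0.3 ms) fronted by a ~1 ms Redis round trip.
fast_origin = break_even_hit_ratio(1.0, 0.3)
# 80 ms join fronted by the same 1 ms lookup.
slow_origin = break_even_hit_ratio(1.0, 80.0)

print(f"{fast_origin:.2f}")  # 3.33 -> above 1.0, the cache can never win
print(f"{slow_origin:.4f}")  # 0.0125
```

This covers average latency only. Staleness tolerance and invalidation ownership are separate conditions, and this calculation does not capture them.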