Caching
Cache layers: free-recall review
Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the latency ladder and the failure modes stick.
Reconstruct the unit’s spine — the latency ladder, hit-ratio break-even, the OS page cache, where to cache, and the wrong-layer and double-caching failures — without looking back at the lesson.
- 01 Walk the caching latency ladder from fastest to slowest with rough numbers, and explain why the orders of magnitude matter more than the exact figures.
- 02 Why can adding a cache make latency worse, and what is the average-latency formula that proves it?
- 03 What is the OS page cache, and why does it change the 'should I add Redis?' decision?
- 04 Given a candidate read path, how do you decide whether to cache and at which layer?
- 05 What is double-caching, why does it produce stale-on-stale bugs, and what makes them hard to debug?
- 06 What does a low hit ratio actually tell you, and why is 'increase the hit ratio' not always the goal?
If you could reconstruct each answer from memory, you hold the unit’s spine. Caching is a latency ladder where orders of magnitude decide whether a cache helps. A miss pays for the cache lookup plus the origin call, so a low hit ratio or a fast origin makes the cache worse than none. The OS page cache and the database buffer pool often make the origin RAM-fast already. You cache only when the origin is slow, the hit ratio is high, and staleness is tolerable, and you do it at the layer the data’s volatility allows. The recurring senior-level failures are wrong-layer caching and double-caching one fact at two layers; both are fixed by giving each cached fact one owner and one invalidation path.
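To check your answer to the break-even prompt, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not the lesson's own code: it assumes a look-aside cache where a hit pays only the cache lookup and a miss pays the cache lookup plus the origin call. The function names and the millisecond figures are hypothetical and chosen only for illustration.

```python
def average_latency(hit_ratio: float, cache_ms: float, origin_ms: float) -> float:
    """Expected read latency with a look-aside cache in front of the origin.

    Assumption: a hit costs cache_ms; a miss costs cache_ms + origin_ms.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_cost = cache_ms
    miss_cost = cache_ms + origin_ms
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


def break_even_hit_ratio(cache_ms: float, origin_ms: float) -> float:
    """Hit ratio above which the cached path beats going straight to the origin.

    Derived from cache_ms + (1 - h) * origin_ms < origin_ms, i.e. h > cache_ms / origin_ms.
    A result above 1.0 means no achievable hit ratio makes the cache worthwhile.
    """
    if origin_ms <= 0:
        raise ValueError("origin_ms must be positive")
    return cache_ms / origin_ms


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not measurements).
    # Slow origin: the cache pays for itself at a very low hit ratio.
    print(break_even_hit_ratio(cache_ms=1.0, origin_ms=50.0))
    print(average_latency(hit_ratio=0.9, cache_ms=1.0, origin_ms=50.0))

    # Fast origin (for example, already served from the page cache):
    # the break-even exceeds 1.0, so even a 99% hit ratio is slower than no cache.
    print(break_even_hit_ratio(cache_ms=1.0, origin_ms=0.5))
    print(average_latency(hit_ratio=0.99, cache_ms=1.0, origin_ms=0.5))
```

With the slow origin, the 90% hit-ratio case averages roughly 6 ms against 50 ms uncached. With the fast origin, the cached path stays slower than the 0.5 ms origin at every hit ratio, which is the sense in which a cache can be worse than none.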