Crux Read real Anthropic SDK request bodies and usage blocks, predict where the cache breakpoint lands and whether it hits, and pick the highest-leverage fix.
The request body decides what gets cached; the usage block tells you whether it worked. Read both, predict the behaviour, and choose the fix a senior engineer makes first — before touching the TTL.
Goal
Practise the loop you run on every caching incident: read where the cache_control breakpoint sits, read the usage fields to confirm a hit or a silent miss, and reach for the ordering fix before the tuning knob.
The breakpoint is placed correctly on the last stable block, yet the cache never hits. Why?
Heads-up The timestamp sits after the breakpoint, so it does not affect the cached prefix at all — that is exactly correct placement. Volatile content belongs after the breakpoint; it is not the cause of the miss.
Heads-up A breakpoint caches the entire prefix up to and including its block, so a breakpoint on the last stable block caches both blocks. Placing it on the doc is correct, not the cause of the miss.
Heads-up A single breakpoint on the second block caches the whole leading run, both blocks included. Multiple text blocks in system are normal and fully cacheable.
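The rule in the notes above can be sketched in a few lines: a breakpoint covers every block up to and including the one that carries it. The `cache_control` field with `{"type": "ephemeral"}` follows the request shape this lesson discusses; the block contents and the `cached_prefix` helper are hypothetical and exist only for illustration. The sketch looks at `system` alone, whereas a real prefix also includes any tools placed ahead of it.

```python
# Hypothetical system blocks: two stable blocks, then one volatile block.
system_blocks = [
    {"type": "text", "text": "You are a support assistant."},
    {
        "type": "text",
        "text": "<reference doc goes here>",
        # Breakpoint on the last stable block.
        "cache_control": {"type": "ephemeral"},
    },
    # Volatile content sits after the breakpoint, outside the cached prefix.
    {"type": "text", "text": "Current time: 2025-01-01T00:00:00Z"},
]


def cached_prefix(blocks):
    """Return the blocks covered by the last breakpoint (inclusive), or [] if none."""
    last = -1
    for i, block in enumerate(blocks):
        if "cache_control" in block:
            last = i
    return blocks[: last + 1]


covered = cached_prefix(system_blocks)
print(len(covered))  # 2
```

Both stable blocks fall inside the covered prefix and the timestamp block does not, which is the placement the notes describe as correct.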
This usage block recurs on every request in a steady, high-frequency workload. What does it tell you?
Heads-up cache_creation means a write, not a read. A healthy steady workload shows cache_read_input_tokens high and cache_creation near zero after warm-up. Persistent writes with zero reads is the failure signature.
Heads-up input_tokens (41) is just the uncached tail. The 30,218 cache_creation tokens are billed at 1.25x as a write; nothing was read at 0.1x.
Heads-up This recurs on every request in steady traffic, so it is not a one-time warm-up. Reads stay at zero because the prefix changes each call.
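A rough way to automate the reading described above, assuming the usage block is available as a plain dict with the `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields. The helper names and the labels they return are hypothetical, and the "skip the first request as warm-up" rule is a simplifying assumption.

```python
def classify_usage(usage):
    """Label one usage block as 'hit', 'write', or 'uncached'."""
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read > 0 and read >= created:
        return "hit"
    if created > 0:
        return "write"
    return "uncached"


def looks_poisoned(usages):
    """True when every request after the first writes the cache and none reads it."""
    steady = usages[1:]  # assume the first request is the expected warm-up write
    return bool(steady) and all(classify_usage(u) == "write" for u in steady)


# The recurring block from the question: a large write, zero reads, a tiny uncached tail.
sample = {
    "input_tokens": 41,
    "cache_creation_input_tokens": 30218,
    "cache_read_input_tokens": 0,
}
print(classify_usage(sample))        # write
print(looks_poisoned([sample] * 5))  # True
```

A healthy trace would show one `write` followed by `hit` entries, so `looks_poisoned` would return `False` for it.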
The 1-hour TTL is set and the system prompt is stable, but hit rate is erratic across deploys. What is the highest-leverage fix?
Heads-up TTL controls survival time, not prefix stability. A reordered tool block at the front breaks the match regardless of TTL; no window length rescues a prefix that changes.
Heads-up Moving the breakpoint does not stop the tools from reordering. The fix is deterministic tool ordering; only then does breakpoint placement matter.
Heads-up Switching tiers changes the write multiplier, not the erratic-hit root cause. With a non-deterministic tool order you are paying writes constantly anyway.
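One way to make tool ordering deterministic, as a sketch: sort the tool definitions by name before building the request, and fingerprint the prefix so drift across deploys shows up in logs. The helper names are hypothetical, and the hash is a local diagnostic only; it is not a claim about how the API itself matches prefixes.

```python
import hashlib
import json


def canonical_tools(tools):
    """Return tools sorted by name so the order is the same on every deploy."""
    return sorted(tools, key=lambda t: t["name"])


def prefix_fingerprint(tools, system_blocks):
    """Hash a canonical serialization of the cacheable prefix to detect drift."""
    payload = json.dumps(
        {"tools": canonical_tools(tools), "system": system_blocks},
        sort_keys=True,          # stable key order for the diagnostic hash
        separators=(",", ":"),   # no incidental whitespace
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical tool definitions arriving in a different order on two deploys.
search = {"name": "search", "description": "Search docs", "input_schema": {"type": "object"}}
lookup = {"name": "lookup", "description": "Look up an order", "input_schema": {"type": "object"}}
system = [{"type": "text", "text": "You are a support assistant."}]

same = prefix_fingerprint([search, lookup], system) == prefix_fingerprint([lookup, search], system)
print(same)  # True
```

Sending `canonical_tools(tools)` in the request and logging the fingerprint per deploy makes an unexpected prefix change visible before it shows up as a drop in hit rate.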
Snippet 4 — break-even arithmetic
# Sonnet 4.6: base input $3.00/MTok, cache write (5m) $3.75/MTok, cache read $0.30/MTok
# 20k-token stable prefix, re-read N times within the TTL window
write_cost = 20_000/1e6 * 3.75            # one write
read_cost  = 20_000/1e6 * 0.30 * N        # N reads
uncached   = 20_000/1e6 * 3.00 * (N+1)    # same N+1 requests, no cache
With this pricing, after how many reads does caching the 20k prefix become cheaper than not caching at all?
Heads-up TTL changes the write multiplier (1.25x vs 2x), not whether reads beat full rate. At $0.30 vs $3.00 per MTok, the 5-minute tier priced in this snippet breaks even on the first read; at the 2x write multiplier the break-even moves only to the second read.
Heads-up The write premium is only 0.25x the base input rate ($3.75 vs $3.00 per MTok), and it is paid once. Each read saves $2.70/MTok ($3.00 − $0.30), so the first read more than covers it.
Heads-up Output tokens are unaffected by caching and identical either way; they drop out of the comparison. The break-even is set entirely by the input write vs read math.
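The break-even can be checked directly, as a small sketch using only the prices and prefix size quoted in the snippet. The function names are hypothetical, and output tokens are left out because they are identical with and without caching.

```python
BASE_INPUT = 3.00      # $/MTok, base input price from the snippet
CACHE_WRITE = 3.75     # $/MTok, 5-minute cache write
CACHE_READ = 0.30      # $/MTok, cache read
PREFIX_TOKENS = 20_000


def cached_cost(n_reads):
    """One cache write followed by n_reads cache reads of the prefix."""
    return PREFIX_TOKENS / 1e6 * (CACHE_WRITE + CACHE_READ * n_reads)


def uncached_cost(n_reads):
    """The same n_reads + 1 requests with the prefix billed at full rate each time."""
    return PREFIX_TOKENS / 1e6 * BASE_INPUT * (n_reads + 1)


def break_even_reads(limit=100):
    """Smallest number of reads at which caching is strictly cheaper, or None."""
    for n in range(limit + 1):
        if cached_cost(n) < uncached_cost(n):
            return n
    return None


print(break_even_reads())  # 1
```

With zero reads the write premium makes caching slightly more expensive ($0.075 vs $0.06); a single read flips the comparison ($0.081 vs $0.12).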
Recap
Every caching question is answered by reading the request body and the usage block. The breakpoint caches the whole prefix up to and including its block, so it belongs on the last stable block. A well-placed breakpoint is still worthless if the blocks in front of it (tools first, then system) are not byte-identical, which is why non-deterministic tool ordering and re-serialised whitespace are top poisoners. The usage fields are the only truth: high cache_creation with cache_read near zero on steady traffic is a poisoned prefix, not success. And the break-even arithmetic is brutal in caching’s favour: at the 5-minute write rate, a re-read prefix beats full rate after a single read, so the real work is keeping the prefix stable, not tuning the TTL.