LLM cost budgets: free-recall review

Free-recall prompts across the LLM cost-budgets unit. Answer each in your own words first, then reveal the model answer and compare.

AI Senior ◷ 14 min

Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the cost model stick.

Goal

Reconstruct the unit’s spine — token asymmetry, where context accumulates, routing economics, prompt caching, and the in-process kill switch — without looking back at the lesson.

Recall before you leave

01. Why is output the expensive half of an LLM bill, and what concrete levers attack it?
02. Because the model is stateless, the client re-sends context every turn. Name the three things that inflate the re-sent payload and how each grows.
03. When does model routing (cheap-first cascade) actually save money, and when does it backfire?
04. Explain prompt caching: what gets discounted, by how much, and how do you structure a prompt to maximise the benefit?
05. Why does an uncapped agent loop burn money superlinearly, and why can't a monthly provider cap stop it?
06. List the LLM cost controls in priority order, from the cheapest first-line control to the last resort, and say what each one bounds.

Recap

If you could reconstruct each answer from memory, you hold the unit’s spine. Output costs ~5x input, so cap it. The system prompt, history, and RAG context are all re-sent every turn (fixed, linear, and multiplicative growth respectively). Routing saves money only at a low escalation rate. Caching the stable prefix drops its cost to 0.1x and pays off on the first hit. And because a runaway loop burns money superlinearly while a monthly cap reacts on a timescale of days, the real brake is an in-process budget plus a kill switch on cost velocity. When you face a real cost incident, you’ll reach for the arithmetic before you reach for the cap.
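
To make the recap's arithmetic concrete, here is a minimal Python sketch under stated assumptions. The function and class names (`loop_input_tokens`, `prefix_cost_units`, `BudgetGuard`, `BudgetExceeded`) are hypothetical and not part of any provider SDK, the 1.25x cache-write premium is an assumed figure rather than something the unit states, and the token counts and dollar limits are illustrative only.

```python
import time
from collections import deque
from typing import Optional

# Illustrative assumptions, not provider prices.
CACHE_READ_MULT = 0.1    # cached-prefix multiplier quoted in the recap
CACHE_WRITE_MULT = 1.25  # ASSUMED one-time premium for writing the prefix to cache


def loop_input_tokens(turns: int, system: int, per_turn: int) -> int:
    """Total input tokens when every turn re-sends the system prompt and history."""
    # Turn i (1-based) sends the system prompt plus i turns of accumulated history,
    # so the total grows quadratically with the number of turns.
    return sum(system + per_turn * i for i in range(1, turns + 1))


def prefix_cost_units(calls: int, cached: bool) -> float:
    """Cost of sending one stable prefix `calls` times, in units of one uncached send."""
    if not cached or calls == 0:
        return float(calls)
    # The first call pays the write premium; every later call is a discounted read.
    return CACHE_WRITE_MULT + CACHE_READ_MULT * (calls - 1)


class BudgetExceeded(RuntimeError):
    """Raised when the in-process budget or the spend-rate limit is crossed."""


class BudgetGuard:
    """In-process brake: a hard total budget plus a cost-velocity limit."""

    def __init__(self, max_total_usd: float, max_usd_per_window: float,
                 window_s: float = 60.0) -> None:
        self.max_total_usd = max_total_usd
        self.max_usd_per_window = max_usd_per_window
        self.window_s = window_s
        self.total_usd = 0.0
        self._recent = deque()  # (timestamp, cost) pairs still inside the window

    def record(self, cost_usd: float, now: Optional[float] = None) -> None:
        """Record one call's cost and raise BudgetExceeded if a limit is crossed."""
        now = time.monotonic() if now is None else now
        self.total_usd += cost_usd
        self._recent.append((now, cost_usd))
        # Drop entries that have aged out of the velocity window.
        while self._recent and now - self._recent[0][0] > self.window_s:
            self._recent.popleft()
        window_spend = sum(cost for _, cost in self._recent)
        if self.total_usd > self.max_total_usd:
            raise BudgetExceeded(f"total spend {self.total_usd:.2f} USD over budget")
        if window_spend > self.max_usd_per_window:
            raise BudgetExceeded(f"spent {window_spend:.2f} USD within {self.window_s:.0f}s")
```

With these assumed numbers, `loop_input_tokens(10, 1000, 500)` is 37,500 while `loop_input_tokens(20, 1000, 500)` is 125,000, so doubling the turns more than triples the input tokens. `prefix_cost_units(2, cached=True)` comes to about 1.35 against 2.0 uncached, which is the first-hit payoff. In an agent loop, calling `guard.record(cost)` after each model call and letting `BudgetExceeded` propagate stops the loop within one call of crossing either limit, rather than days later.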
