
LLM cost budgets: multiple-choice review

Multiple-choice synthesis across the LLM cost-budgets unit — token asymmetry, re-sent context, model routing, prompt caching, and in-process budget guardrails.

AI Senior ◷ 13 min

Six questions that cut across the whole unit. Each mirrors a call you make on a real cost incident — not a definition to recite, but a tradeoff to weigh while the meter is running.

Goal

Confirm you can connect token pricing, re-sent context, routing, caching, and in-process budgets into one decision — the synthesis the overview lesson built toward.

Quiz

A support chatbot on Sonnet 4.6 ($3/M in, $15/M out) sends a 200-token question and gets a 1,500-token answer, mostly chain-of-thought the user never sees. Where is the spend, and what is the first lever?
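To see where the spend sits before picking a lever, the arithmetic for this one exchange can be sketched as below. The prices and token counts come from the question; the helper name `request_cost` is made up for illustration.

```python
# Per-request cost split for the exchange described in the question.
# Prices are USD per million tokens, as quoted in the question text.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost, output_cost


input_cost, output_cost = request_cost(input_tokens=200, output_tokens=1_500)
output_share = output_cost / (input_cost + output_cost)

print(f"input:  ${input_cost:.4f}")
print(f"output: ${output_cost:.4f}")
print(f"output share of spend: {output_share:.0%}")
```

With these numbers the output side is roughly 97% of the request cost, which is why the unit treats an output cap as the first thing to check.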

Quiz

A 50-turn chat re-sends a 4,000-token system prompt every turn, and the input bill is climbing. Which fix has the highest leverage?
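A rough sketch of what the re-sent system prompt costs over the conversation, with and without a cached prefix. The question gives no prices, so the base input price and both cache multipliers here are assumptions chosen for illustration; check your provider's current pricing before relying on them.

```python
# Input cost of the system prompt alone across a 50-turn chat.
# ASSUMPTIONS (not from the lesson): base price and cache multipliers.
BASE_INPUT_PRICE_PER_M = 3.00  # USD per million input tokens (assumed)
CACHE_WRITE_MULTIPLIER = 1.25  # assumed surcharge for the first, cache-writing turn
CACHE_READ_MULTIPLIER = 0.10   # assumed discount for later, cache-reading turns

SYSTEM_PROMPT_TOKENS = 4_000
TURNS = 50


def uncached_cost(tokens: int, turns: int) -> float:
    """Every turn pays full input price for the whole system prompt."""
    return tokens * turns / 1_000_000 * BASE_INPUT_PRICE_PER_M


def cached_cost(tokens: int, turns: int) -> float:
    """One cache write on turn 1, cache reads on every later turn.

    Assumes the cache entry never expires mid-conversation.
    """
    full_price = tokens / 1_000_000 * BASE_INPUT_PRICE_PER_M
    write = full_price * CACHE_WRITE_MULTIPLIER
    reads = full_price * CACHE_READ_MULTIPLIER * (turns - 1)
    return write + reads


plain = uncached_cost(SYSTEM_PROMPT_TOKENS, TURNS)
cached = cached_cost(SYSTEM_PROMPT_TOKENS, TURNS)
print(f"uncached: ${plain:.4f}")
print(f"cached:   ${cached:.4f}")
print(f"saving:   {1 - cached / plain:.0%}")
```

Under these assumed multipliers the system prompt costs about $0.60 per conversation uncached and about $0.07 cached. The exact ratio depends on the real cache pricing and on whether the cache stays warm between turns.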

Quiz

A team routes the easy 80% of requests to Haiku ($1/M in, $5/M out) and escalates failures to Opus ($5/M in, $25/M out). After launch the bill barely moved. What is the most likely cause?
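One way to reason about this scenario is to model the blended cost per request as a function of the escalation rate. The sketch below uses the prices from the question; the token counts per request and the function names are assumptions for illustration, and it assumes an escalated request pays for both the failed cheap call and the retry on the stronger model.

```python
# Blended cost per request under cheap-first routing with escalation.
# Prices are (input, output) in USD per million tokens, from the question.
HAIKU = (1.00, 5.00)
OPUS = (5.00, 25.00)
IN_TOKENS, OUT_TOKENS = 1_000, 500  # assumed request size


def call_cost(prices: tuple[float, float]) -> float:
    """Cost in USD of one call at the assumed request size."""
    in_price, out_price = prices
    return (IN_TOKENS * in_price + OUT_TOKENS * out_price) / 1_000_000


def blended_cost(cheap_share: float, escalation_rate: float) -> float:
    """Average cost per request when cheap_share of traffic tries Haiku first."""
    cheap = call_cost(HAIKU)
    strong = call_cost(OPUS)
    # An escalated request pays for the failed cheap call plus the retry.
    cheap_path = cheap + escalation_rate * strong
    return cheap_share * cheap_path + (1 - cheap_share) * strong


baseline = call_cost(OPUS)  # everything on Opus, no routing
for rate in (0.0, 0.5, 1.0):
    ratio = blended_cost(cheap_share=0.8, escalation_rate=rate) / baseline
    print(f"escalation rate {rate:.0%}: {ratio:.0%} of the all-Opus bill")
```

With these assumptions the routed bill is 36% of the all-Opus baseline when nothing escalates, 76% when half of the cheap calls escalate, and 116% when all of them do. That makes the escalation rate the first metric to check when routing fails to move the bill.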

Quiz

An autonomous agent loop with no iteration cap runs overnight and bills $4,300. Why didn't the $1,000/month spend cap stop it?
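For contrast with a provider-side monthly cap, an in-process budget can be sketched as a small guard that the loop consults before every model call. Everything here is hypothetical: `BudgetGuard`, `BudgetExceeded`, and `fake_model_call` are illustrative names, the prices are assumed, and the model call is simulated so the example runs on its own.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the loop must stop for cost reasons."""


class BudgetGuard:
    """Tracks spend and iterations inside the process running the loop."""

    def __init__(self, max_usd: float, max_iterations: int,
                 in_price_per_m: float = 3.0, out_price_per_m: float = 15.0) -> None:
        self.max_usd = max_usd
        self.max_iterations = max_iterations
        self.in_price_per_m = in_price_per_m    # assumed price, USD per million
        self.out_price_per_m = out_price_per_m  # assumed price, USD per million
        self.spent_usd = 0.0
        self.iterations = 0

    def check(self) -> None:
        """Call before each model request; raises if a limit is reached."""
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded("iteration cap reached")
        if self.spent_usd >= self.max_usd:
            raise BudgetExceeded("dollar budget reached")

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Call after each model request with the reported token usage."""
        self.iterations += 1
        self.spent_usd += (input_tokens * self.in_price_per_m
                           + output_tokens * self.out_price_per_m) / 1_000_000


def fake_model_call(step: int) -> tuple[int, int]:
    """Stand-in for a real LLM call: context grows by 1,000 tokens per step."""
    return 4_000 + 1_000 * step, 500


guard = BudgetGuard(max_usd=0.50, max_iterations=100)
stop_reason = None
step = 0
while True:
    try:
        guard.check()
    except BudgetExceeded as exc:
        stop_reason = str(exc)
        break
    input_tokens, output_tokens = fake_model_call(step)
    guard.record(input_tokens, output_tokens)
    step += 1

print(f"stopped after {guard.iterations} iterations, "
      f"${guard.spent_usd:.2f} spent: {stop_reason}")
```

Because the guard checks before each call rather than estimating the next call's cost, it can overshoot the budget by at most one request. The point of the design is that the check runs in the same process and on the same timescale as the loop itself.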

Quiz

Why does an uncapped agent loop cost superlinearly in the number of iterations, not just linearly?
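The growth can be made concrete with a small calculation. If each iteration re-sends the full context and the context grows by a fixed amount per step, total billed input tokens are `n * base + growth * n * (n - 1) / 2`, which is quadratic in `n`. The token figures below are assumptions picked for illustration.

```python
def cumulative_input_tokens(iterations: int, base_tokens: int,
                            growth_per_step: int) -> int:
    """Total input tokens billed when every iteration re-sends the
    whole context, which has grown by growth_per_step each step."""
    return sum(base_tokens + step * growth_per_step
               for step in range(iterations))


# Assumed figures: 4,000-token starting context, +1,000 tokens per step.
short_run = cumulative_input_tokens(10, base_tokens=4_000, growth_per_step=1_000)
long_run = cumulative_input_tokens(100, base_tokens=4_000, growth_per_step=1_000)
print(f"10 iterations:  {short_run:,} input tokens")
print(f"100 iterations: {long_run:,} input tokens")
print(f"10x the iterations, {long_run / short_run:.0f}x the tokens")

# With no context growth the cost would be linear in the iteration count.
flat_ratio = (cumulative_input_tokens(100, 4_000, 0)
              / cumulative_input_tokens(10, 4_000, 0))
print(f"without growth: {flat_ratio:.0f}x")
```

In this sketch ten times the iterations costs about 63 times the input tokens, against exactly 10 times if the context did not grow.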

Quiz

You are designing cost controls for an LLM feature. Which ordering, from cheapest first-line defense to last resort, reflects the unit's priority?

Recap

The through-line is one decision tree. Output costs about 5x input per token, so cap it first. The model is stateless and re-sends the system prompt, history, and retrieved (RAG) context every turn, so cache the stable prefix and trim the volatile parts. Route the easy majority to the cheap model and watch the escalation rate. A runaway loop costs superlinearly, while a monthly spend cap works on a timescale of days, so the real brake is an in-process budget plus a kill switch on cost velocity. Every control works by reducing or bounding re-sent context and output; the smaller bill follows from that. Now when you see a cost spike in production, you have a checklist, not a guess.
