awesome-everything RU
↑ Back to the climb

AI / LLM Integration

Tool calls: the round-trip loop, schema validation, and the guard against runaway agents

Crux Tool calling turns a model into a function caller, but the model only emits a request — you execute it. Every call is a full extra round trip, the arguments can be hallucinated, and an unguarded loop will burn tokens forever.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at junior altitude — the surface
◷ 17 min

A support agent ships. A customer types “cancel my last order.” The model confidently emits a tool call: POST /orders/{id}/cancel with id: "ord_9f3c" — an id it never saw, fabricated to look plausible. The handler ran eval-style: it trusted the arguments and fired the request. Wrong customer’s order, cancelled. Worse, the next week a different bug had the model re-calling a failing lookup_order tool 40 times in one turn before anyone killed it — $9 of tokens for a question that had no answer. Both bugs share one root cause: the loop trusted the model.

The round-trip loop is a contract, not a function

Tool calling makes a model behave like a function you call, but the wiring is inverted: the model is the caller and your code is the callee. You declare the tools; the model decides when to invoke one and emits a structured request; your code runs it. The model never executes anything itself.

The canonical shape is a while loop keyed on the response’s stop_reason:

  1. Send the request with your tools array and the user message.
  2. The model responds with stop_reason: "tool_use" and one or more tool_use blocks (each has a tool name and a JSON arguments object).
  3. Execute each tool. Format outputs as tool_result blocks.
  4. Send a new request with the full history plus those tool_result blocks.
  5. Repeat while stop_reason is still "tool_use". Exit on "end_turn" (final answer), "max_tokens", "stop_sequence", or "refusal".

The load-bearing detail seniors internalize: step 4 is a brand-new model call. A three-tool task is four model invocations, each re-sending the whole growing transcript. This is why tool latency dominates — and why the loop guard below is not optional.

Tools are JSON-schema declarations

Each tool is a name, a description, and an input_schema — a JSON Schema object describing the arguments. The model reads the schema the same way a developer reads a function signature. The schema is doing real work: it both tells the model how to call the tool and gives you the contract to validate against before you execute.

Schemas are not free. A typical tool definition costs roughly 500 tokens; ten tools is ~5,000 tokens of overhead on every request in the loop, since the full tools array is re-sent each round. A 10-tool agent running a 6-step task pays that 5,000-token tax six times. This is the first place prompt caching earns its keep — caching the static tools block can cut input cost 40–80% and improve time-to-first-token.

tool_choiceBehaviorWhen a senior picks it
autoModel decides: call a tool or answer in proseDefault for agents; the model judges if a tool is needed
anyMust call some tool, model picks whichWhen prose is never a valid answer (a router that must dispatch)
tool (forced)Must call this exact named toolStructured extraction: force one schema to get guaranteed-shape JSON
noneForbid all tools this turnForce a text summary after results are in

Parallel calls cut latency — but not every chain can use them

Modern models (Claude 4-class) will, when several independent tools are needed, emit multiple tool_use blocks in a single response. You run them concurrently and return all the tool_result blocks together. That collapses three sequential round trips into one — a real latency win, because each round trip is a fresh model call of hundreds of ms to seconds.

The catch is dependency. Parallelism only helps when the calls are independent (get_weather(NYC) and get_weather(SF)). A chain where call two needs the output of call one (find_user then cancel_user_order) is inherently serial and cannot be parallelized — the model has to see the first result before it can fill the second tool’s arguments. You can set disable_parallel_tool_use: true to force one tool per turn when your execution layer can’t safely run things concurrently.

Why this works

Server-executed tools (web search, code execution) run their own loop inside the provider and have a built-in iteration cap. When they hit it mid-task the response comes back with stop_reason: "pause_turn" rather than "end_turn" — you re-send the conversation to continue. Client tools have no such built-in cap; the guard is yours to write.

Validate arguments — never trust them

The model emits plausible JSON, not correct JSON. It can hallucinate an id, invent an enum value the schema never listed, omit a required field, or pass a string where a number belongs. The opening disaster — a fabricated ord_9f3c fed straight into a cancel endpoint — is the canonical failure: the handler treated model output as trusted input.

The senior discipline is a hard gate before execution:

  • Schema-validate the arguments against the tool’s input_schema (a validator like Pydantic or jsonschema). Reject malformed shapes outright; never eval or blindly destructure them.
  • Authorize and existence-check the referenced entities. A well-formed id is still untrusted — confirm the order exists and belongs to this caller before acting on it.
  • On rejection, return a tool_result with an error, not an exception that breaks the loop. The model reads the error and can self-correct on the next turn — that feedback path is the whole point of returning structured tool errors.

Treat tool arguments exactly like any other untrusted user input crossing a trust boundary, because that is precisely what they are.

The max-iteration guard against runaway loops

Without a turn cap, a confused model loops forever: it calls lookup_order, gets an error, calls it again with the same arguments, gets the same error, and repeats. Each iteration is a full model call billing the entire accumulating transcript — costs and tokens climb with every step. This is a real outage and a real bill (one stuck turn quietly burned $9 of tokens before a human intervened).

Two guards, both mandatory in production:

  1. A hard iteration capfor step in range(MAX_STEPS) (often 8–15). Hit the cap, stop the loop, return a graceful failure to the user. Never while True.
  2. Loop / repeat detection — if the model calls the same tool with the same arguments twice in a row, that is a stuck signal. Break, or inject a message telling it the call already failed so it stops repeating.
Pick the best fit

Your agent loop calls real mutating endpoints (cancel order, issue refund). How do you handle the arguments the model emits?

Quiz

A 5-step agent task uses tools at every step. Roughly how many model calls is that, and why does it matter?

Quiz

The model returns a tool_use for cancel_order with id 'ord_9f3c', an id never present in the conversation. What's the senior move?

Order the steps

Order one safe iteration of a client-side tool-use loop:

  1. 1 Send request with tools array; read stop_reason from the response
  2. 2 If stop_reason is tool_use, extract each tool_use block's name and arguments
  3. 3 Schema-validate the arguments, then authorize/existence-check referenced entities
  4. 4 Execute valid calls (parallel if independent); format outputs as tool_result blocks
  5. 5 Send a new request with the full history + tool_result blocks — under the max-iteration cap
Recall before you leave
  1. 01
    Walk through why an unguarded tool-use loop is both a correctness risk and a cost risk, and the two guards you add.
  2. 02
    Why must you validate tool arguments, and what does 'validate' actually mean for a mutating endpoint like cancel_order?
Recap

Tool calling inverts the usual contract: the model is the caller and your code is the callee. The model emits a structured tool_use request with a tool name and JSON arguments; your code executes it and returns a tool_result, and the loop repeats while stop_reason stays "tool_use". Three things define senior-grade tool use. First, latency and cost: every tool round trip is a brand-new model call re-sending the whole growing transcript plus a ~500-tokens-per-tool schema array, so a 5-step task is ~6 model calls — prompt caching the static tools block is the standard mitigation. Second, validation: model arguments are untrusted input that can be hallucinated, so you schema-validate, then authorize and existence-check referenced entities, then execute, returning errors as tool_result so the model can self-correct — never eval a fabricated id into a mutating endpoint. Third, the guard: a hard iteration cap plus repeat-detection, because an unguarded loop will re-call a failing tool forever and burn real money. Use tool_choice (auto/any/tool/none) to control whether and which tool fires, and parallel tool calls to collapse independent round trips — but only when the calls don’t depend on each other.

Continue the climb ↑Tool calls: multiple-choice review
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources4
expand
  1. 01
  2. 02
  3. 03
  4. 04

Trademarks belong to their respective owners. Editorial reference only.