AI / LLM Integration AI · 02 · 09

Tool calls: code and schema reading

Read real tool-calling snippets — a tool JSON schema, a parallel-call dispatcher, a loop with timeout/error handling, and an argument validator — and pick the highest-leverage fix.

AI Senior ◷ 14 min

Level

FoundationsJuniorMiddleSenior

Tool-calling bugs live in the schema, the dispatcher, and the loop — not in the model. Read each snippet the way you would in review, then choose the fix a senior engineer would make first.

Goal

Practise the diagnosis loop you run on every agent: read the tool schema and the loop code, predict how the model and your handler will behave, and reach for the fix that closes the trust boundary or stops the runaway.

Snippet 1 — the tool schema

When you see a tool schema in review, the first question is not “does this look right?” but “what does the model do when it doesn’t know the id?” A loose schema is a hallucination invitation.

{
  "name": "cancel_order",
  "description": "Cancel an order.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": { "type": "string" },
      "reason":   { "type": "string" }
    }
  }
}

Quiz

A mutating cancel_order tool ships with this schema. What is the single biggest weakness, and the highest-leverage fix?

Snippet 2 — the parallel dispatcher

results = []
for block in response.tool_use_blocks:          # may be several this turn
    output = TOOLS[block.name](**block.input)    # run sequentially
    results.append(tool_result(block.id, output))
send(messages + results)

Quiz

When the model emits three independent tool_use blocks in one turn, what does this dispatcher get right and what does it leave on the table?

Snippet 3 — the loop with no guard

while True:
    resp = model.create(messages=messages, tools=TOOLS)
    if resp.stop_reason != "tool_use":
        break
    for b in resp.tool_use_blocks:
        out = run_tool(b.name, b.input)          # may raise / hang
        messages.append(tool_result(b.id, out))
    messages.append(resp.message)

Quiz

This loop runs in production against client tools. Which two defects will bite first under failure, and how do you fix them?

Snippet 4 — the validator

def handle(block):
    args = block.input
    try:
        validated = ToolArgs.model_validate(args)   # pydantic: shape + types
    except ValidationError as e:
        return tool_result(block.id, f"invalid arguments: {e}", is_error=True)
    return tool_result(block.id, run(validated))

Quiz

This handler schema-validates with Pydantic and returns errors as a tool_result. For a mutating tool against a multi-tenant database, what is still missing?

Recap

Every tool-calling bug is read in the schema, the dispatcher, the loop, or the validator. A loose schema (no required, no enum, empty description) lets the model guess and gives you nothing to validate. A sequential dispatcher is correct but forfeits the parallel-call latency win for independent calls. while True with no per-tool timeout is a runaway and a stall waiting to happen — cap iterations and time-box each tool. And Pydantic shape-validation is necessary but not sufficient for a mutating call: add the authorization and existence check, and return every rejection as a tool_result so the model can self-correct. Now when you spot while True in a code review or a schema with no required fields, you know which failure mode each one unlocks — and which fix closes it.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.