Agents: code and loop reading

Read real agent-loop snippets — tool dispatch, step caps, context trimming, error recovery — predict the behaviour, and pick the highest-leverage fix.

Senior · 14 min

The agent loop is where the money is spent and the bugs hide. Read each snippet the way you would read it in a code review on a service that bills per token, then choose the fix a senior makes first.

Goal

Practise the review you run on every agent loop: read the control flow, predict where it runs forever, overflows the context window, or thrashes, and reach for the structural guard before blaming the model.

Snippet 1 — the loop with one exit

def run_agent(task, tools):
    messages = [SYSTEM, {"role": "user", "content": task}]
    while True:
        resp = model(messages, tools)          # THINK
        if not resp.tool_calls:
            return resp.content                 # only exit: model stops calling tools
        messages.append(resp)
        for call in resp.tool_calls:
            result = dispatch(call, tools)       # ACT
            messages.append(result)              # OBSERVE

Quiz

This is a correct ReAct loop, but it is unsafe for production. What is the single most important guard to add, and why that one first?

Snippet 2 — the step cap that hides a bug

for step in range(MAX_STEPS):          # MAX_STEPS = 100
    resp = model(messages, tools)
    if not resp.tool_calls:
        return resp.content
    messages.append(resp)
    for call in resp.tool_calls:
        messages.append(dispatch(call, tools))
# fell out of the loop: cap hit
return "Sorry, I couldn't complete that."

Quiz

Telemetry shows ~30% of runs fall through and return the apology. The cap is doing its job — so what does a senior conclude?

Snippet 3 — context trimming

def trim(messages, budget=8000):
    # keep most recent messages until we're under the token budget
    kept = []
    total = 0
    for m in reversed(messages):
        total += count_tokens(m)
        if total > budget:
            break
        kept.append(m)
    return list(reversed(kept))

Quiz

This keeps the loop under the context window, but it has a failure mode that surfaces on long tasks. What is it, and what is the fix?

Snippet 4 — error recovery

for step in range(MAX_STEPS):
    resp = model(messages, tools)
    if not resp.tool_calls:
        return resp.content
    messages.append(resp)
    for call in resp.tool_calls:
        try:
            result = dispatch(call, tools)
        except ToolError as e:
            result = {"role": "tool", "content": f"Error: {e}"}  # feed error back
        messages.append(result)

Quiz

Feeding the error back lets the model adapt — but a load test shows runs where the same tool fails identically dozens of times. What guard closes the gap?

Recap

Every agent incident can be traced in the loop's control flow. A loop whose only exit is ‘model stops calling tools’ is unsafe, so add a hard step cap plus a wall-clock and token budget. A cap that fires routinely is a seatbelt catching a real bug, not a limit to loosen: find out why those runs never converge. Naive recency trimming eventually evicts the system prompt and the original task, so the agent forgets its job; pin those messages and summarise the evicted middle. Feeding errors back without a per-tool retry or dedup cap turns recovery into a loop of identical, well-formed calls that fail the same way until the step budget runs out. Read the control flow, find the unbounded path, add the structural guard, then re-run under load to confirm.
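The guards in the recap can be combined in one loop. What follows is a minimal Python sketch under stated assumptions, not a reference implementation: `ToolCall`, `Response`, `Outcome`, `count_tokens`, `summarise` and every default limit are hypothetical stand-ins, token counting is a crude character-based estimate rather than a real tokenizer, and the model is any callable taking `(messages, tools)`. It adds a hard step cap, wall-clock and token budgets, trimming that pins the system prompt and task and summarises the evicted middle, and a cap on identical failing tool calls.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


class ToolError(Exception):
    """Raised by a tool when a call fails (same role as in Snippet 4)."""


@dataclass
class ToolCall:
    name: str
    args: str  # assumption: args are a canonical string, so identical calls compare equal


@dataclass
class Response:
    content: str = ""
    tool_calls: list = field(default_factory=list)


@dataclass
class Outcome:
    status: str   # "done", "step_cap", "deadline", "token_budget" or "repeated_failure"
    content: str


SYSTEM = {"role": "system", "content": "You are a careful agent."}


def count_tokens(message: dict) -> int:
    # Hypothetical estimate (~4 characters per token), not a real tokenizer.
    return max(1, len(str(message.get("content", ""))) // 4)


def summarise(dropped: list) -> str:
    # Hypothetical placeholder: a real system would ask a model to summarise.
    return f"[summary of {len(dropped)} earlier messages]"


def trim_pinned(messages, budget=8000, pinned=2, summary_reserve=50):
    """Keep the first `pinned` messages (system prompt + task) unconditionally,
    fill the remaining budget with the newest messages, and replace the
    evicted middle with a single summary message."""
    head, rest = messages[:pinned], messages[pinned:]
    remaining = budget - summary_reserve - sum(count_tokens(m) for m in head)
    kept = []
    for m in reversed(rest):
        cost = count_tokens(m)
        if cost > remaining:
            break
        remaining -= cost
        kept.append(m)
    kept.reverse()
    dropped = rest[: len(rest) - len(kept)]
    if not dropped:
        return head + kept
    return head + [{"role": "user", "content": summarise(dropped)}] + kept


def run_agent(task, model, tools, max_steps=20, deadline_s=60.0,
              token_budget=50_000, max_repeat_failures=2):
    messages = [SYSTEM, {"role": "user", "content": task}]
    start = time.monotonic()
    spent = 0
    failures = Counter()  # (tool name, args) -> failures so far
    for _ in range(max_steps):                     # hard step cap
        if time.monotonic() - start > deadline_s:  # wall-clock budget
            return Outcome("deadline", "")
        if spent >= token_budget:                  # estimated tokens across all calls
            return Outcome("token_budget", "")
        context = trim_pinned(messages)
        resp = model(context, tools)
        spent += sum(count_tokens(m) for m in context)
        spent += count_tokens({"content": resp.content})
        if not resp.tool_calls:
            return Outcome("done", resp.content)
        messages.append({"role": "assistant", "content": resp.content})
        for call in resp.tool_calls:
            key = (call.name, call.args)
            if failures[key] >= max_repeat_failures:
                # Identical call already failed this often: escalate, don't retry.
                return Outcome("repeated_failure", f"{call.name}({call.args}) keeps failing")
            try:
                if call.name not in tools:
                    raise ToolError(f"unknown tool {call.name!r}")
                result = tools[call.name](call.args)
            except ToolError as e:
                failures[key] += 1
                result = f"Error: {e}"  # still fed back so the model can adapt
            messages.append({"role": "tool", "content": str(result)})
    return Outcome("step_cap", "")
```

Returning a status instead of a bare apology string is a deliberate choice: it lets telemetry count `step_cap` separately from `done`, which is what you need before deciding whether a cap that fires on ~30% of runs is hiding a bug. Keying the failure counter on `call.name` alone would turn the dedup cap into a plain per-tool retry cap.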