Crux Read real agent-loop snippets — tool dispatch, step caps, context trimming, error recovery — predict the behaviour, and pick the highest-leverage fix.
The agent loop is where the money is spent and the bugs hide. Read each snippet the way you would read it in a code review on a service that bills per token, then choose the fix a senior makes first.
Goal
Practise the loop you run on every agent: read the control flow, predict where it runs forever, overflows, or thrashes, and reach for the structural guard before blaming the model.
Snippet 1 — the loop with one exit
def run_agent(task, tools):
    messages = [SYSTEM, {"role": "user", "content": task}]
    while True:
        resp = model(messages, tools)          # THINK
        if not resp.tool_calls:
            return resp.content                # only exit: model stops calling tools
        messages.append(resp)
        for call in resp.tool_calls:
            result = dispatch(call, tools)     # ACT
            messages.append(result)            # OBSERVE
Quiz
This loop is correct ReAct but unsafe for production. What is the single most important guard to add, and why that one first?
Heads-up Error handling matters, but it is secondary — and feeding errors back without a cap can itself cause an infinite retry. The first defect is that nothing bounds the number of iterations at all.
Heads-up Caching shaves cost but does not stop a runaway loop; an agent thrashing on new-but-useless states still runs forever. You need a hard bound on iterations and spend.
Heads-up Streaming changes how output arrives, not whether the loop terminates. The missing piece is a termination guard, not a transport change.
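A minimal sketch of that termination guard, assuming the model and the tool dispatcher are passed in as plain callables. The `Reply` shape, its `tokens_used` field, the limit values, and the `status` strings are all hypothetical names chosen for illustration, not any particular SDK's API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Reply:
    # Hypothetical response shape: final text, requested tool calls, tokens billed.
    content: str = ""
    tool_calls: list = field(default_factory=list)
    tokens_used: int = 0


def run_agent(task, model, dispatch, max_steps=20, max_tokens=50_000, max_seconds=120.0):
    """Run the loop with three independent exits besides 'model stops'."""
    messages = [{"role": "user", "content": task}]
    spent = 0
    deadline = time.monotonic() + max_seconds
    for step in range(max_steps):                      # hard step cap
        if time.monotonic() > deadline:                # wall-clock budget
            return {"status": "timeout", "steps": step, "content": None}
        resp = model(messages)
        spent += resp.tokens_used
        if not resp.tool_calls:                        # normal exit
            return {"status": "done", "steps": step + 1, "content": resp.content}
        if spent > max_tokens:                         # token budget
            return {"status": "token_budget", "steps": step + 1, "content": None}
        messages.append({"role": "assistant", "content": resp.content,
                         "tool_calls": resp.tool_calls})
        for call in resp.tool_calls:
            messages.append(dispatch(call))
    return {"status": "step_cap", "steps": max_steps, "content": None}
```

Returning a distinct `status` for each exit, rather than one generic apology, keeps every kind of cap hit countable in telemetry.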
Snippet 2 — the step cap that hides a bug
for step in range(MAX_STEPS):                  # MAX_STEPS = 100
    resp = model(messages, tools)
    if not resp.tool_calls:
        return resp.content
    messages.append(resp)
    for call in resp.tool_calls:
        messages.append(dispatch(call, tools))
# fell out of the loop: cap hit
return "Sorry, I couldn't complete that."
Quiz
Telemetry shows ~30% of runs fall through and return the apology. The cap is doing its job — so what does a senior conclude?
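Heads-up Raising the cap lets the loop burn far more tokens before tripping: every step re-sends the whole growing history, so total cost grows roughly quadratically with step count. A loop that needs hundreds of steps has usually lost the plot. The fix is to address the underlying failure, not to hand it more rope.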
Heads-up Returning partial work can be reasonable, but it doesn't address why a third of runs can't finish. The cap firing that often points to a structural defect upstream.
Heads-up A 30% cap-hit rate is a loud signal of thrashing or a broken tool, not a baseline to accept. The cap masks it; diagnosis from traces is required.
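One way to start that diagnosis is to count repeated tool calls in the traces of runs that hit the cap. The sketch below assumes a hypothetical trace format, a list of dicts with `name` and `args` keys; real tracing systems will differ, and the `min_repeats` threshold is an arbitrary illustration.

```python
import json
from collections import Counter


def repeated_calls(trace, min_repeats=3):
    """Return (tool, args_json, count) for calls seen at least min_repeats times."""
    counts = Counter(
        # Serialise args with sorted keys so equal dicts compare equal.
        (call["name"], json.dumps(call["args"], sort_keys=True))
        for call in trace
    )
    return [
        (name, args, n)
        for (name, args), n in counts.most_common()
        if n >= min_repeats
    ]


trace = [{"name": "search", "args": {"q": "x"}}] * 4 + [
    {"name": "read", "args": {"id": 1}}
]
print(repeated_calls(trace))  # [('search', '{"q": "x"}', 4)]
```

A tool that dominates this list across many capped runs is a likely candidate for the broken tool or the thrashing pattern the cap has been hiding.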
Snippet 3 — context trimming
def trim(messages, budget=8000):
    # keep most recent messages until we're under the token budget
    kept = []
    total = 0
    for m in reversed(messages):
        total += count_tokens(m)
        if total > budget:
            break
        kept.append(m)
    return list(reversed(kept))
Quiz
This keeps the loop under the window, but it has a failure mode that surfaces on long tasks. What is it, and what is the fix?
Heads-up Approximate counting is fine with a safety margin; that is not the structural bug. The real defect is that the unconditional 'keep most recent' rule evicts the instructions the task depends on.
Heads-up Two reversals are O(n) and negligible next to model calls. The substantive bug is what gets evicted — the pinned context — not the list mechanics.
Heads-up Keeping only the oldest would discard recent tool results the model needs to act now. The right answer is to pin the instructions and trim/summarise the middle, not to flip which end you keep.
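A minimal sketch of the pinned variant, assuming the first two messages are the system prompt and the task. The `pinned` parameter and the character-based `count_tokens` stand-in are illustrative assumptions. This version simply drops the middle; summarising it, as the fix suggests, would need an extra model call that is left out here.

```python
def count_tokens(message):
    # Crude stand-in: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(message["content"]) // 4)


def trim_pinned(messages, budget=8000, pinned=2):
    """Always keep the first `pinned` messages, then as many recent ones as fit."""
    head = messages[:pinned]                  # system prompt + task: never evicted
    tail = messages[pinned:]
    remaining = budget - sum(count_tokens(m) for m in head)
    kept = []
    for m in reversed(tail):                  # walk from newest to oldest
        cost = count_tokens(m)
        if cost > remaining:
            break
        remaining -= cost
        kept.append(m)
    return head + list(reversed(kept))        # restore chronological order
```

The pinned head is charged against the budget first, so recent messages only get whatever room the instructions leave.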
Snippet 4 — error recovery
for step in range(MAX_STEPS):
    resp = model(messages, tools)
    if not resp.tool_calls:
        return resp.content
    messages.append(resp)
    for call in resp.tool_calls:
        try:
            result = dispatch(call, tools)
        except ToolError as e:
            result = {"role": "tool", "content": f"Error: {e}"}  # feed error back
        messages.append(result)
Quiz
Feeding the error back lets the model adapt — but a load test shows runs where the same tool fails identically dozens of times. What guard closes the gap?
Heads-up Catching more exceptions makes the loop more resilient to crashing but more prone to thrashing — it keeps feeding failures back forever. The missing piece is a cap on identical/failed retries.
Heads-up Backoff helps for transient/rate-limit errors, but for a deterministic failure (bad args, not-found) it just delays the same identical failure. You still need a dedup/retry cap at the loop level.
Heads-up Crashing throws away the model's ability to recover from recoverable errors. The goal is bounded recovery — feed back, but cap identical retries — not no recovery.
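Bounded recovery can be sketched as a small guard around the dispatcher. The `RetryGuard` class, the call shape (`name` and `args` keys), the refusal wording, and the default limit of two are assumptions made for illustration; `ToolError` is defined locally only so the example is self-contained.

```python
import json
from collections import Counter


class ToolError(Exception):
    """Stand-in for the snippet's ToolError."""


class RetryGuard:
    """Feed errors back, but stop re-running a call that keeps failing identically."""

    def __init__(self, max_identical_failures=2):
        self.max = max_identical_failures
        self.failures = Counter()

    def call(self, dispatch, call):
        # Same tool + same arguments => same key, regardless of dict ordering.
        key = (call["name"], json.dumps(call["args"], sort_keys=True))
        if self.failures[key] >= self.max:
            # Do not execute again; tell the model to change course instead.
            return {"role": "tool", "content": (
                f"Refused: {call['name']} already failed {self.failures[key]} "
                "times with these arguments. Change the arguments or try "
                "a different approach.")}
        try:
            return dispatch(call)
        except ToolError as e:
            self.failures[key] += 1
            return {"role": "tool", "content": f"Error: {e}"}
```

In the loop, `guard.call(dispatch, call)` would replace the bare `try`/`except`. The model still sees each failure and can adapt, but the same failing call is executed at most `max_identical_failures` times per run.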
Recap
Every agent incident is read in the loop. A single ‘model stops’ exit is unsafe, so add a hard step cap plus a wall-clock and token budget. A cap that fires routinely is a seatbelt catching a real bug, not a limit to loosen. Naive context trimming evicts the system and task messages that should have been pinned and makes the agent forget its job, so pin the instructions and summarise the middle. Error feedback without a per-tool retry/dedup cap turns recovery into a run of identical failing calls that lasts until the step cap trips. Read the control flow, find the unbounded path, add the structural guard, then re-run under load to confirm.