AI / LLM Integration
The agent loop: ReAct, runaway steps, and context that grows every turn
The on-call alert is a billing spike, not an outage. One agent request that should cost $0.08 has been running for nine minutes and burned about $12. Pull the trace: the agent called search_orders, got an empty result, called search_orders again with the same arguments, got the same empty result, and did it 140 more times. No exception, no crash — every step was a valid model call with a valid tool call. The loop simply had no reason to stop, and nobody had told it when to.
The loop is five lines
Strip away the frameworks and an LLM agent is almost embarrassingly simple. It is the ReAct pattern — Reason, Act, Observe — wrapped in a while:
messages = [system, user_task]
while True:
response = model(messages, tools) # THINK: model reasons + decides
if not response.tool_calls: break # no tool wanted → it's the answer
result = run_tool(response.tool_calls) # ACT: execute the chosen tool
messages.append(response) # remember what it decided
messages.append(result) # OBSERVE: feed the result backThat is the whole engine. The model looks at the conversation, decides whether to call a tool or to answer, and if it called a tool you run it and hand the output back as the next observation. The model never executes anything itself — your code does. The “agentic” part is only that the model, not your control flow, picks the next action each turn. Everything hard about agents in production is a consequence of those two facts: the loop can run any number of times, and messages gets longer on every pass.
Context accumulates, and the cost is not linear
Here is the part juniors miss. Every iteration appends the model’s reasoning, its tool call, and the tool’s result to messages, and then you send the entire history to the model again next turn. You are not paying for one step — you are paying to re-read the whole transcript on every step.
If step n adds roughly k tokens, then by step N each call is reading about N·k tokens, and the total cost of the run is the sum 1+2+3+…+N — quadratic in the number of steps. A 5-step task is cheap. A 15-step task is not three times the cost; it is closer to eight times, plus latency, because each call also waits on a larger prompt. A Reflexion-style loop running ten cycles has been measured at roughly 50× the tokens of a single linear pass. This quadratic growth is the most dangerous economic trap in agent design, and it is invisible until the trace is long.
| Step | What gets appended | Tokens sent THIS call | Cumulative cost |
|---|---|---|---|
| 1 | task + 1st think/act/observe | ~1k | ~1k |
| 5 | 4 more triplets | ~5k | ~15k |
| 15 | 10 more triplets | ~15k | ~120k (quadratic) |
| 40+ | history exceeds window | overflow / truncation | errors mid-task |
There is a second failure waiting at the end of that table: context-window overflow mid-task. The history grows monotonically, so a long-horizon task eventually pushes past the model’s context limit. Now the framework either errors out or silently truncates the oldest messages — which often includes the original instructions. The agent forgets what it was asked to do and confidently finishes the wrong task. Models also attend disproportionately to the start and end of a long context, so a critical fact discovered at step 5 of a 15-step chain can effectively vanish even before any truncation.
Termination: the loop needs more than one exit
A naive loop has exactly one way out — the model declines to call a tool. That is not enough in production, because a model that is stuck, confused, or thrashing will happily keep calling tools forever. A senior gives the loop several independent exits:
- Natural completion — the model returns a final answer with no tool call. The good case.
- Max-step guard — a hard cap on iterations. LangGraph ships this as
recursion_limit(default 25 super-steps in 1.0.x), and you almost always set it far lower — 10 to 25 for most tasks — because a task that needs 1000 steps is a task that has lost the plot. - Wall-clock / token budget — stop after, say, 60 seconds or N total tokens, regardless of step count. This is what actually caps the dollar figure.
- Progress / dedup check — if the last few steps repeat the same tool with the same arguments, the agent is thrashing; break the loop or inject a “you are repeating yourself, stop” message.
The billing-spike story is the classic missing exit: the loop had only the “model didn’t call a tool” door, and the model kept calling a tool, so it never reached the door.
Why this works
Treat the step cap as a seatbelt, not a steering wheel. A hard max_steps is a safety net that bounds the worst case, but if your agent routinely hits it, the cap is hiding a real bug — a bad tool, a vague prompt, a missing termination signal. The primary control should be the model reaching a genuine stop because the task is done; the cap is what saves you the night it doesn’t.
Error recovery, and the retry that never gives up
When a tool fails, you usually feed the error text back as the observation so the model can adapt — fix the arguments, try a different tool. This is one of the loop’s superpowers. It is also a trap. Suppose search_orders returns an empty list (not even an error — just nothing useful). A poorly-prompted agent reads that, decides it must have made a mistake, and calls search_orders again. Same input, same empty result, same conclusion. Without a dedup or progress check, that is an infinite loop made of perfectly valid calls — exactly the $12 trace from the hook. The fix is structural: cap retries per tool, detect identical consecutive calls, and make “I cannot find this, here is what I tried” an acceptable terminal answer rather than a failure the model must keep fighting.
Scripted workflow vs open-ended agent
The senior question is rarely “how do I build an agent” — it is “does this even need to be an agent?” A scripted workflow hard-codes the steps: call tool A, then B, then C, with if/else for branches. It is cheaper (one or two model calls, not fifteen), faster, fully predictable, and trivial to test. An open-ended agent lets the model decide the steps at runtime; it handles tasks you could not enumerate in advance, at the cost of nondeterminism, higher token spend, and a much larger surface for the loop to misbehave.
| Dimension | Scripted workflow | Open-ended agent |
|---|---|---|
| Who picks the next step | You, at code-time | The model, at run-time |
| Cost / latency | Low, bounded | High, quadratic in steps |
| Predictability | Deterministic, testable | Nondeterministic |
| Handles novel tasks | No — only what you coded | Yes — adapts at runtime |
The tradeoff is autonomy against control, cost, and predictability. Give the model the freedom to choose its own path only where that freedom earns its keep; everywhere the path is known, script it. Dynamic turn limits that adjust to a task’s success probability have been shown to cut costs ~24% while holding solve rates — evidence that the right amount of autonomy is usually less than the default.
A support flow always does the same three steps: look up the order, check refund eligibility, issue or deny the refund. Pick the design.
Why does a 15-step agent run cost much more than 3× a 5-step run?
An agent calls the same tool with identical arguments five times in a row, getting the same empty result. What is the correct guard?
Order one iteration of the ReAct agent loop:
- 1 Send the full message history + tool definitions to the model (THINK)
- 2 Model returns either a final answer or a tool call
- 3 If it's a final answer with no tool call → break the loop
- 4 Otherwise execute the chosen tool (ACT)
- 5 Append the tool's result to the history (OBSERVE) and loop again
- 01Walk a teammate through why an agent's cost grows quadratically with the number of steps, and what overflows at the far end.
- 02Why is a hard max-step cap necessary but not sufficient for safe termination, and what else does a senior add?
An LLM agent is a while-loop around the model: it thinks (the model reasons and picks an action), acts (your code runs the chosen tool), observes (the result is appended), and repeats — the ReAct pattern, and only the model’s freedom to pick the next action makes it “agentic.” Two facts drive everything hard about it. The loop can run any number of times, and the history grows on every pass, so cost is quadratic in steps because the whole transcript is re-sent each turn — a 15-step run is ~9× a 5-step run, and a long-horizon task eventually overflows the context window and silently drops its own instructions. A naive loop has one exit (the model stops calling tools), which a stuck or thrashing model never reaches; the billing-spike trace is an agent retrying the same empty tool call 140 times. So give the loop several independent exits: natural completion, a hard max-step guard (LangGraph’s recursion_limit), a wall-clock or token budget that caps the actual dollar cost, and a dedup/progress check that kills thrashing. Above all, weigh autonomy against control, cost, and predictability — a scripted workflow is cheaper, faster, and testable wherever the path is known, so reserve the open-ended agent for tasks whose steps you genuinely cannot enumerate in advance.