AI / LLM Integration AI · 06 · 01

The agent loop: ReAct, runaway steps, and context that grows every turn

An LLM agent is a while-loop that calls the model, runs a tool, appends the result, and calls the model again. The danger is the loop that won''''t stop and the history that won''''t stop growing — both bill you per token.

AI Junior ◷ 17 min

Level

FoundationsJuniorMiddleSenior

Already know this unit? Take a 1-minute quick check →

The on-call alert is a billing spike, not an outage. One agent request that should cost $0.08 has been running for nine minutes and burned about $12. Pull the trace: the agent called search_orders, got an empty result, called search_orders again with the same arguments, got the same empty result, and did it 140 more times. No exception, no crash — every step was a valid model call with a valid tool call. The loop simply had no reason to stop, and nobody had told it when to.

In ten minutes you’ll know exactly how the loop works, why it runs up a quadratic bill, and what guards keep it from becoming the $12 trace.

The loop is five lines

Strip away the frameworks and an LLM agent is almost embarrassingly simple. It is the ReAct pattern — Reason, Act, Observe — wrapped in a while:

messages = [system, user_task]
while True:
    response = model(messages, tools)        # THINK: model reasons + decides
    if not response.tool_calls: break         # no tool wanted → it's the answer
    result = run_tool(response.tool_calls)    # ACT: execute the chosen tool
    messages.append(response)                 # remember what it decided
    messages.append(result)                   # OBSERVE: feed the result back

That is the whole engine. The model looks at the conversation, decides whether to call a tool or to answer, and if it called a tool you run it and hand the output back as the next observation. The model never executes anything itself — your code does. The “agentic” part is only that the model, not your control flow, picks the next action each turn. Everything hard about agents in production is a consequence of those two facts: the loop can run any number of times, and messages gets longer on every pass.

Context accumulates, and the cost is not linear

Here is the part juniors miss. Every iteration appends the model’s reasoning, its tool call, and the tool’s result to messages, and then you send the entire history to the model again next turn. You are not paying for one step — you are paying to re-read the whole transcript on every step.

If step n adds roughly k tokens, then by step N each call is reading about N·k tokens, and the total cost of the run is the sum 1+2+3+…+N — quadratic in the number of steps. A 5-step task is cheap. A 15-step task is not three times the cost; it is closer to eight times, plus latency, because each call also waits on a larger prompt. A Reflexion-style loop running ten cycles has been measured at roughly 50× the tokens of a single linear pass. This quadratic growth is the most dangerous economic trap in agent design, and it is invisible until the trace is long.

Step	What gets appended	Tokens sent THIS call	Cumulative cost
1	task + 1st think/act/observe	~1k	~1k
5	4 more triplets	~5k	~15k
15	10 more triplets	~15k	~120k (quadratic)
40+	history exceeds window	overflow / truncation	errors mid-task

There is a second failure waiting at the end of that table: context-window overflow mid-task. The history grows monotonically, so a long-horizon task eventually pushes past the model’s context limit. Now the framework either errors out or silently truncates the oldest messages — which often includes the original instructions. The agent forgets what it was asked to do and confidently finishes the wrong task. Models also attend disproportionately to the start and end of a long context, so a critical fact discovered at step 5 of a 15-step chain can effectively vanish even before any truncation.

Termination: the loop needs more than one exit

A naive loop has exactly one way out — the model declines to call a tool. That is not enough in production, because a model that is stuck, confused, or thrashing will happily keep calling tools forever. A senior gives the loop several independent exits:

Natural completion — the model returns a final answer with no tool call. The good case.
Max-step guard — a hard cap on iterations. LangGraph ships this as recursion_limit (default 25 super-steps in 1.0.x), and you almost always set it far lower — 10 to 25 for most tasks — because a task that needs 1000 steps is a task that has lost the plot.
Wall-clock / token budget — stop after, say, 60 seconds or N total tokens, regardless of step count. This is what actually caps the dollar figure.
Progress / dedup check — if the last few steps repeat the same tool with the same arguments, the agent is thrashing; break the loop or inject a “you are repeating yourself, stop” message.

Together these four exits guard four different failure modes: a stuck model, an uncapped dollar figure, and a thrashing loop that never makes progress. Without the dedup check in particular, every empty tool result looks like a reason to retry — and you end up in the $12 trace.

The billing-spike story is the classic missing exit: the loop had only the “model didn’t call a tool” door, and the model kept calling a tool, so it never reached the door.

▸Why this works

Treat the step cap as a seatbelt, not a steering wheel. A hard max_steps is a safety net that bounds the worst case, but if your agent routinely hits it, the cap is hiding a real bug — a bad tool, a vague prompt, a missing termination signal. The primary control should be the model reaching a genuine stop because the task is done; the cap is what saves you the night it doesn’t.

Error recovery, and the retry that never gives up

When a tool fails, you usually feed the error text back as the observation so the model can adapt — fix the arguments, try a different tool. This is one of the loop’s superpowers. It is also a trap. Suppose search_orders returns an empty list (not even an error — just nothing useful). A poorly-prompted agent reads that, decides it must have made a mistake, and calls search_orders again. Same input, same empty result, same conclusion. Without a dedup or progress check, that is an infinite loop made of perfectly valid calls — exactly the $12 trace from the hook. The fix is structural: cap retries per tool, detect identical consecutive calls, and make “I cannot find this, here is what I tried” an acceptable terminal answer rather than a failure the model must keep fighting.

Scripted workflow vs open-ended agent

The senior question is rarely “how do I build an agent” — it is “does this even need to be an agent?” A scripted workflow hard-codes the steps: call tool A, then B, then C, with if/else for branches. It is cheaper (one or two model calls, not fifteen), faster, fully predictable, and trivial to test. An open-ended agent lets the model decide the steps at runtime; it handles tasks you could not enumerate in advance, at the cost of nondeterminism, higher token spend, and a much larger surface for the loop to misbehave.

Dimension	Scripted workflow	Open-ended agent
Who picks the next step	You, at code-time	The model, at run-time
Cost / latency	Low, bounded	High, quadratic in steps
Predictability	Deterministic, testable	Nondeterministic
Handles novel tasks	No — only what you coded	Yes — adapts at runtime

The tradeoff is autonomy against control, cost, and predictability. Give the model the freedom to choose its own path only where that freedom earns its keep; everywhere the path is known, script it. Dynamic turn limits that adjust to a task’s success probability have been shown to cut costs ~24% while holding solve rates — evidence that the right amount of autonomy is usually less than the default.

Pick the best fit

A support flow always does the same three steps: look up the order, check refund eligibility, issue or deny the refund. Pick the design.

Quiz

Why does a 15-step agent run cost much more than 3× a 5-step run?

Quiz

An agent calls the same tool with identical arguments five times in a row, getting the same empty result. What is the correct guard?

Order the steps

Order one iteration of the ReAct agent loop:

1 Send the full message history + tool definitions to the model (THINK)
2 Model returns either a final answer or a tool call
3 If it's a final answer with no tool call → break the loop
4 Otherwise execute the chosen tool (ACT)
5 Append the tool's result to the history (OBSERVE) and loop again

One iteration: send the full history to the model (THINK), it returns a final answer or a tool call. A final answer breaks the loop; a tool call is executed (ACT) and its result appended (OBSERVE), then the loop repeats — re-sending the now-larger history, which is why cost grows quadratically.

Recall before you leave

01
Walk a teammate through why an agent's cost grows quadratically with the number of steps, and what overflows at the far end.
02
Why is a hard max-step cap necessary but not sufficient for safe termination, and what else does a senior add?

Recap

An LLM agent is a while-loop around the model: it thinks (the model reasons and picks an action), acts (your code runs the chosen tool), observes (the result is appended), and repeats — the ReAct pattern, and only the model’s freedom to pick the next action makes it “agentic.” Two facts drive everything hard about it. The loop can run any number of times, and the history grows on every pass, so cost is quadratic in steps because the whole transcript is re-sent each turn — a 15-step run is ~9× a 5-step run, and a long-horizon task eventually overflows the context window and silently drops its own instructions. A naive loop has one exit (the model stops calling tools), which a stuck or thrashing model never reaches; the billing-spike trace is an agent retrying the same empty tool call 140 times. So give the loop several independent exits: natural completion, a hard max-step guard (LangGraph’s recursion_limit), a wall-clock or token budget that caps the actual dollar cost, and a dedup/progress check that kills thrashing. Above all, weigh autonomy against control, cost, and predictability — a scripted workflow is cheaper, faster, and testable wherever the path is known, so reserve the open-ended agent for tasks whose steps you genuinely cannot enumerate in advance. Now when you see a billing alert on an agent request, your first move is to pull the trace, count the repeated tool calls, and check which exit was missing — not to blame the model.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Grounded RAG ServiceA RAG demo that answers from a corpus is easy; a RAG service you'd trust in front of users is not. The hard part isn't retrieval, it's grounding: making the model say only what the retrieved text supports, attaching citations the reader can check, and proving with an eval set that the answers don't drift into confident fiction. You'll build the whole loop — chunk, embed, store, retrieve top-k, ground, cite, score — and feel exactly where it leaks.