Build Journal

The Other Way Your AI Agent Dies: Iteration Budget Exhaustion

Your AI agent stops mid-task. You assume it is a bug. It is usually not — it has burned through its tool-calling budget. Every read, shell command, patch, browser check, and retry costs a turn. Here is how to spot the failure mode, recover cleanly, and design tasks that do not waste iterations.

2026-04-18 · 6 min read

The silent killer

You give your AI agent a task. It is going well — reading files, writing code, running tests. Then it just stops. Mid-sentence. Mid-function. Mid-deploy.

You check the logs. No crash. No obvious error. No network timeout. The agent simply stops taking tool actions and gives you a summary that sounds suspiciously like, “I did most of it, good luck with the rest.”

That is usually not the model losing interest. It is the iteration budget.

What is an iteration budget?

Every serious agent framework needs a cap on tool-calling loops. In Hermes, that cap is agent.max_turns: the maximum number of tool-action cycles an agent can take in one conversation. The current Hermes default is 90. On my own box I have it set to 80 at the moment, with delegated child agents on their own separate budget.

Each tool-action cycle costs budget. read_file, terminal, browser_snapshot, patch, web_search, even a “quick” verification command — all of it counts. Not one per file in a batch, but one per agent/tool round trip. Five separate reads are expensive. One bundled check that reads five files is cheap.

The limit exists for a good reason: without it, a stuck agent can loop forever, burning tokens, API calls, and your patience. The trap is that the failure mode looks like incompetence unless you know what you are seeing.

Why it matters more than you think

Context windows get all the attention. “My agent forgot what it was doing” is easy to blame on memory or compression. Iteration exhaustion is quieter and nastier:

It feels like a bug. The agent was working, then it stopped. You restart the same task and it fails again because the workflow is still too tool-heavy.
It punishes messy task design. Reading one file at a time, running five tiny shell commands, checking every route separately, then re-checking because the first check was vague — that is how you burn the budget.
It compounds with delegation. Subagents get their own iteration caps. A parent agent can waste turns spawning, briefing, checking, and re-checking children if the handoff is sloppy.
It hides near the finish line. The agent often runs out of budget when the work is 80% done, exactly when you need the boring verification passes most.

I found this the hard way while pushing long localdemo imports through backups, content fixes, image conversion, browser checks, and checkpoint updates. The work was not hard. The workflow was just too granular. Too many little actions, not enough batching.

The anatomy of an iteration

In plain English, an agent loop looks like this:

Read the user request.
Decide to call a tool.
Wait for the tool result.
Decide what to do next.
Repeat until the task is done, the context needs compression, the user interrupts, or the iteration budget is gone.

The critical insight: each think-then-act cycle has a cost. If your agent reads 20 files one by one, runs 10 terminal commands, patches 5 files separately, and then does 10 separate verification probes, you can spend half the session budget on mechanics before the real judgement work is finished.

How to tell if you hit the limit

In Hermes you may see a warning such as:

⚠️ Iteration budget exhausted

Or the final response may suddenly switch into handover mode: “I reached the maximum number of tool-calling iterations…” That is not a normal completion. It is the agent being forced to stop.

Other agent tools have similar failure modes. Claude Code may report max turns reached. Cursor-style agent tabs can simply stop advancing. The pattern is the same: the assistant was taking actions, then suddenly gives you a summary instead of finishing the job.

Designing around the budget

The fix is not only “increase the number”. Sometimes you should. But the better fix is to stop wasting turns.

1. Batch discovery. If you need to inspect ten files, use one scripted pass where possible. Pull out the facts you need instead of making the agent read everything separately.

2. Combine terminal checks. Five tiny commands can usually become one controlled shell/Python check with labelled output.

3. Use execute_code for repetitive logic. Lists, filters, counting, JSON parsing, HTTP probes, and report generation are exactly what scripts are for. Do not make the agent manually loop through obvious mechanics.

4. Save before long runs. If a task is obviously going to take many actions, save the checkpoint first. Then if the agent runs out of budget, the next session has a clean restart point instead of a pile of vibes.

5. Keep the scope honest. “Also fix everything nearby” is how a tidy one-post import turns into a 50-action swamp. Define the slug, the files, the guardrails, and the stop state.

The iteration budget vs context window trap

The iteration budget and the context window are separate limits.

You can hit the iteration limit with plenty of context left. That happens in tool-heavy sessions: file reads, browser checks, image conversions, curl probes, patches, and retries.

You can also hit the context limit with iterations left. That happens in long conversations full of pasted text, logs, or large tool outputs.

The recovery is different:

Context problem: compress, summarise, start a clean session with the right handover.
Iteration problem: save, start a clean session, continue from the last verified state, and batch better next time.

If the agent suddenly stops mid-task, do not just ask “is it smart enough?” Ask which limit it hit.

What I changed

After hitting this repeatedly, I changed the way I run long agent work:

I stopped treating every check as a separate conversation step. Multi-file inspections now go through scripted passes where that makes sense.
I made checkpoints explicit. Handover notes, pending files, and daily logs now capture what was done, what was verified, and exactly where to resume.
I keep imports one slug at a time. That sounds slower, but it prevents a single article fix from turning into a site-wide cleanup spiral.
I watch for “almost done” risk. The last 20% of a task is where verification lives. If the budget is nearly gone, I would rather stop cleanly than fake completion.

The point is not to make the agent do less. The point is to make it spend its turns on decisions, not admin.

The bottom line

Iteration budgets are not a bug. They are a safety rail. The problem is that most people only discover the rail when they hit it face-first.

If your agent stops mid-task:

Check for a budget/max-turns warning.
If it hit the budget, save the state and continue in a fresh session.
If it seems confused or forgetful, treat it as a context problem instead.
Either way, ask whether the task could have been batched, scoped, or checkpointed better.

The best iteration is the one you did not need.

Found this useful? 👉 Follow @Raf_VRS for more Build Journal updates 👉 Support the work: ko-fi.com/rafvrs

Stop Scrolling. Start Building. #LocalAI #AIAgents #HardInterference