agentic loops: observe, plan, act, verify

the basic loop

every autonomous agent runs a variation of this cycle:

  1. observe — gather information about the current state
  2. plan — decide what to do next
  3. act — execute the plan
  4. verify — check if the action worked

then loop. repeat until the goal is reached, time runs out, or something breaks.

this is the agentic loop. it’s not specific to AI. robots use it. video game NPCs use it. thermostats use it (observe temp → plan: turn on heat → act: flip switch → verify: check temp).

the difference with LLM-based agents: each step involves reasoning, not just rules. and reasoning is expensive, slow, and sometimes wrong.
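the four steps can be sketched as a plain loop. this is a toy, self-contained version — the "environment" is just a counter the agent must raise to a target, where a real agent would observe files, APIs, or tool output:

```python
# a minimal sketch of the observe → plan → act → verify cycle.
# the environment is a dict with a counter; the goal is counter >= target.

def run_agent(env, target, max_iterations=20):
    for _ in range(max_iterations):
        state = env["counter"]            # observe: read current state
        if state >= target:               # goal check: stop when done
            return state
        # plan: one step ahead (here the plan is trivially "increment")
        env["counter"] += 1               # act: execute the step
        if env["counter"] != state + 1:   # verify: did the side effect happen?
            raise RuntimeError("action did not take effect")
    raise TimeoutError("gave up after max_iterations")

env = {"counter": 0}
print(run_agent(env, target=3))  # → 3
```

everything interesting in a real agent lives inside those four lines: what "observe" reads, how "plan" chooses, what "act" touches, and what "verify" trusts.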

why loops matter

a non-agentic AI is a single call: prompt in, response out. done.

an agentic AI keeps going. it: → tries something
→ sees if it worked
→ adjusts and tries again

this enables: → error recovery — if the first attempt fails, try a different approach
→ multi-step tasks — break a big goal into small actions
→ adaptation — respond to changes in the environment
→ persistence — keep trying until you succeed (or give up)

without loops, agents can’t do anything complex. with loops, they can. but loops also introduce failure modes.

the observe step

the agent needs to know what’s happening. this means: → reading output from the last action
→ checking state (files, APIs, databases)
→ processing sensory input (screen, logs, user feedback)

problems:

→ noisy observations
the agent runs a shell command. it gets 500 lines of output. which parts matter? if it reads everything, the context window fills up fast. if it skips too much, it misses critical info.

→ hallucinated state
the agent thinks it knows the state, but it’s wrong. “I already sent that email” (it didn’t). “the file exists” (it doesn’t). now every subsequent decision is based on false premises.

→ incomplete visibility
the agent can see some things but not others. it knows the API call succeeded (200 OK) but doesn’t know the downstream effect (database didn’t update).

good observation requires structured feedback — clear, parseable signals about success/failure, not just raw logs.
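one way to produce that structured feedback: wrap raw tool output in a small record with an explicit success flag, a short summary, and capped detail. the field names here are illustrative, not a standard:

```python
# turn raw command output into a structured observation the planner can
# consume, instead of dumping 500 lines of logs into the context window.

import subprocess
from dataclasses import dataclass

@dataclass
class Observation:
    ok: bool          # did the action succeed?
    summary: str      # short, planner-friendly description
    detail: str       # truncated raw output, for when it matters

def observe_command(cmd, max_detail=500):
    proc = subprocess.run(cmd, capture_output=True, text=True)
    out = (proc.stdout + proc.stderr).strip()
    return Observation(
        ok=proc.returncode == 0,
        summary=f"exit={proc.returncode}, {len(out)} chars of output",
        detail=out[:max_detail],   # cap what enters the context window
    )

obs = observe_command(["echo", "hello"])
print(obs.ok, obs.summary)
```

the planner sees `ok` and `summary` every iteration; `detail` only gets pulled in when the summary says something went wrong.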

the plan step

given the current state, what should the agent do next?

this is where the LLM shines. it can reason, weigh options, adapt to context.

but planning is also where things go wrong:

→ plan too detailed
the agent generates a 10-step plan upfront. step 3 fails. now steps 4-10 are irrelevant, but the agent tries to execute them anyway.

→ plan too vague
“fix the bug” — okay, how? the agent needs concrete actions, not abstract goals.

→ plan doesn’t account for failure
“run command X, then command Y.” what if X fails? the plan doesn’t say. the agent either halts or blindly continues.

→ planning overhead
every loop iteration, the agent re-plans. if planning takes 5 seconds, and the loop runs 20 times, that’s 100 seconds of waiting.

the solution: just-in-time planning. plan one step ahead, not ten. re-plan only when the environment changes.
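just-in-time planning can be as simple as caching the last plan and only re-planning when the observed state changes. in this sketch, `plan_next_step` stands in for the expensive LLM call:

```python
# plan a single next step, and reuse the cached plan while the observed
# state is unchanged. only a state change triggers a fresh (expensive) plan.

def make_planner(plan_next_step):
    cache = {"state": None, "step": None}

    def next_step(state):
        if state != cache["state"]:           # environment changed: re-plan
            cache["state"] = state
            cache["step"] = plan_next_step(state)
        return cache["step"]                  # otherwise reuse the cached step

    return next_step

calls = []
planner = make_planner(lambda s: calls.append(s) or f"handle {s}")
planner("a"); planner("a"); planner("b")
print(len(calls))  # → 2 — the expensive call ran twice, not three times
```

the trade-off: a stale cache means the agent can miss changes it didn't observe, so the "did the state change?" check is only as good as the observe step feeding it.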

the act step

the agent executes the plan. this could be: → calling an API
→ running a shell command
→ writing to a file
→ sending a message
→ invoking a tool

the act step is usually the easiest part. the hard part is handling side effects.

if the action fails partway through, can you undo it? if the action succeeds but causes a downstream problem, can you detect it?

most agentic systems don’t handle this well. actions are fire-and-forget. if they break something, you won’t know until later.

good agentic design: transactions. if the action can’t complete fully, roll back. or at least, log what changed so you can manually undo it.
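a minimal version of that idea is an undo log: before each side effect, record its inverse, and if a later step fails, roll everything back in reverse order. this is a sketch, not a real transaction system:

```python
# record an inverse operation before each side effect; on failure,
# undo everything in reverse order.

class UndoLog:
    def __init__(self):
        self.undos = []

    def record(self, undo_fn):
        self.undos.append(undo_fn)

    def rollback(self):
        while self.undos:
            self.undos.pop()()    # undo in reverse (LIFO) order

db = {}
log = UndoLog()
try:
    db["user"] = "alice"
    log.record(lambda: db.pop("user"))   # inverse of the insert
    raise RuntimeError("downstream step failed")
except RuntimeError:
    log.rollback()

print(db)  # → {}
```

even when a clean inverse doesn't exist (you can't unsend an email), the same log doubles as an audit trail of what changed, which is what you need for manual cleanup.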

the verify step

did the action work? this is not always obvious.

→ false positives
command returns exit code 0 (success), but the side effect didn’t happen. the agent thinks it worked, moves on. later, everything breaks.

→ false negatives
the action worked, but the agent can’t confirm it. so it retries. now the action happens twice. (sent two emails, created two records, charged the card twice.)

→ delayed feedback
the action succeeds, but the result won’t be visible for 10 seconds. does the agent wait? poll? assume success? if it moves on too fast, it might think the action failed and retry.

verification is hard because real-world systems are messy. success is not binary. outcomes are partial, delayed, ambiguous.
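one standard defense against the retry-duplicates problem above is an idempotency key: give each logical action a stable key, and make repeated attempts with the same key no-ops. the in-memory set here stands in for durable storage:

```python
# retries with the same idempotency key become no-ops, so a false
# negative followed by a retry can't send the email twice.

sent = set()

def send_email_once(key, send_fn):
    if key in sent:                 # already performed: skip the retry
        return "skipped"
    send_fn()
    sent.add(key)
    return "sent"

outbox = []
send = lambda: outbox.append("welcome email")
print(send_email_once("user-42-welcome", send))  # → sent
print(send_email_once("user-42-welcome", send))  # → skipped
print(len(outbox))  # → 1
```

the subtlety in real systems: the key must be recorded atomically with the side effect, or a crash between the two reintroduces the duplicate.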

loop termination

when does the loop stop?

→ goal achieved — the agent checks: “did I accomplish the task?” if yes, stop.
→ max iterations — after N loops, give up. prevents infinite loops.
→ stuck detection — if the agent tries the same action 3 times and it keeps failing, stop.
→ user interrupt — the user hits “stop” or “cancel.”

without termination conditions, agents loop forever. this burns money (API costs) and time.

but premature termination is also bad. the agent gives up before actually solving the problem.
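the termination conditions above can be written as guard clauses checked at the top of every iteration. `same_action_limit` implements naive stuck detection — N identical consecutive actions means stop:

```python
# guard clauses for loop termination: goal achieved, iteration budget
# exhausted, or stuck repeating the same action.

def should_stop(history, goal_done, max_iterations=50, same_action_limit=3):
    if goal_done:
        return "goal achieved"
    if len(history) >= max_iterations:
        return "max iterations"
    tail = history[-same_action_limit:]
    if len(tail) == same_action_limit and len(set(tail)) == 1:
        return "stuck: repeating the same action"
    return None   # keep looping

print(should_stop(["ls", "ls", "ls"], goal_done=False))
# → stuck: repeating the same action
```

the stuck check is deliberately crude — an agent can also loop between two alternating actions, which this won't catch. real stuck detection usually compares state, not just actions.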

the retry logic

when an action fails, should the agent retry?

yes, if: → the failure was transient (network timeout, rate limit)
→ retrying with backoff might succeed
→ the cost of retrying is low

no, if: → the failure is permanent (wrong API key, file doesn’t exist)
→ retrying will just fail again
→ you’re burning money on doomed attempts

smart agents detect failure types and retry selectively. dumb agents either never retry (give up too easily) or always retry (waste resources).
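selective retry usually means classifying failures and backing off exponentially on the transient ones. the exception classes here are illustrative stand-ins for whatever your tool layer raises:

```python
# retry only failures flagged transient, with exponential backoff and a
# capped attempt budget; permanent failures propagate immediately.

import time

class TransientError(Exception): pass    # e.g. timeout, rate limit
class PermanentError(Exception): pass    # e.g. bad API key, missing file

def retry(action, max_attempts=3, base_delay=0.01):
    for attempt in range(max_attempts):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                     # out of budget: give up
            time.sleep(base_delay * 2 ** attempt)   # exponential backoff
        except PermanentError:
            raise                         # doomed: don't waste attempts

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("timeout")
    return "ok"

print(retry(flaky))  # → ok
```

the hard part in practice is the classification itself: many APIs don't tell you whether a failure is transient, so the agent (or a heuristic layer) has to guess.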

the context window problem

every loop iteration, the agent’s prompt grows:

system prompt (500 tokens)
+ previous observation (200 tokens)
+ previous plan (100 tokens)
+ previous action (50 tokens)
+ verification result (100 tokens)
+ new observation (200 tokens)
= 1150 tokens

after 10 iterations, you’re at 10k tokens. after 50, you hit the context limit.

solutions:

→ summarize history
after N iterations, condense the loop history into a summary. lose detail, keep high-level state.

→ external memory
store past observations in a database. the agent queries it when needed, instead of keeping everything in context.

→ stateless loops
each iteration is independent. the agent only sees the current state, not the full history. works for some tasks, not others.
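a crude version of the summarization approach: once the transcript exceeds a budget, collapse the older entries into one summary entry and keep the recent ones verbatim. `summarize` stands in for an LLM summarization call:

```python
# compact loop history: keep the last few entries verbatim, collapse
# everything older into a single summary entry.

def compact(history, keep_recent=4,
            summarize=lambda h: f"[{len(h)} earlier steps elided]"):
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"step {i}" for i in range(10)]
print(compact(history))
# → ['[6 earlier steps elided]', 'step 6', 'step 7', 'step 8', 'step 9']
```

the design choice is what survives compaction: a good summary preserves decisions and unresolved errors, not a blow-by-blow of every command.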

see also: token economics.

the hallucination trap

LLMs hallucinate. in a loop, hallucinations compound.

iteration 1: agent hallucinates that a file exists.
iteration 2: agent tries to read the file, fails.
iteration 3: agent concludes the file is corrupted, tries to fix it.
iteration 4: agent realizes the file never existed, but now it’s confused about the state.

each loop builds on the last. if the foundation is a hallucination, every subsequent step is wrong.

the fix: ground every observation. don’t let the agent assume. check file existence. verify API responses. trust the environment, not the model’s memory.
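grounding in code terms: before acting on a believed state, re-check it against the environment. here the belief is "the file exists," and the filesystem — not the model's memory — gets the final word:

```python
# ground the observation: check the filesystem before trusting the
# model's belief that a file exists.

import os

def grounded_read(path):
    if not os.path.exists(path):          # trust the environment, not memory
        return None, f"file {path!r} does not exist"
    with open(path) as f:
        return f.read(), None

content, error = grounded_read("/no/such/file")
print(error)
```

the `(value, error)` return shape also forces the caller to handle the failure case explicitly, instead of letting the loop barrel on under a false premise.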

the reflection pattern

some loops add a fifth step: reflect.

after verify, the agent asks itself: → “did my plan work?”
→ “if not, why?”
→ “what should I do differently next time?”

this is meta-reasoning. the agent evaluates its own performance and adjusts strategy.

this improves long-term behavior but costs tokens. every reflection is another LLM call.

frameworks like ReAct (Reasoning + Acting) and Reflexion bake this in. the agent explicitly reasons about its reasoning.

whether this helps depends on the task. for simple loops (run command, check output, retry), reflection is overkill. for complex, multi-step tasks, it’s essential.

the human-in-the-loop variant

fully autonomous loops are fragile. add a human.

→ approval gates
after planning, show the plan to the user. “I’m about to do X, okay?” user approves or rejects.

→ verification checkpoints
after acting, ask the user: “did this work?” user confirms or corrects.

→ error escalation
if the loop gets stuck, ping the user for help.

this slows things down but increases reliability. good for high-stakes tasks (financial, legal, medical).
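an approval gate is just a pause before any side-effecting action. in this sketch the `ask` function is injected so a UI, a CLI `input()`, or a test can supply the answer:

```python
# pause before a side effect and ask for confirmation; anything other
# than an explicit "y" rejects the action.

def gated_act(action, execute, ask):
    answer = ask(f"about to run: {action}. proceed? [y/n] ")
    if answer.strip().lower() != "y":
        return "rejected"
    return execute()

result = gated_act("delete old logs", execute=lambda: "done",
                   ask=lambda _: "y")
print(result)  # → done
```

note the default is rejection: an ambiguous or empty answer should never be read as approval on a high-stakes action.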

the multi-agent loop

instead of one agent looping, multiple agents loop in parallel or sequence.

→ specialist agents
agent A observes, agent B plans, agent C acts, agent D verifies. each is optimized for its step.

→ supervisor + workers
supervisor agent plans, worker agents act. supervisor verifies results and re-plans.

this is the CrewAI model. it adds coordination overhead, but can be more robust.

the failure modes

agentic loops can fail in many ways:

→ infinite loops — agent tries the same thing forever
→ premature termination — agent gives up too early
→ action storms — agent takes too many actions too fast, breaks things
→ context overflow — loop history fills the context window, agent forgets the goal
→ drift — each iteration, the agent’s understanding of the goal shifts slightly. by iteration 20, it’s solving the wrong problem
→ side effect cascades — one action causes a side effect, which triggers another action, which causes another side effect…

debugging these requires: → logging — track every loop iteration, every decision
→ visualization — graph the loop flow, see where it breaks
→ replay — re-run the loop from a specific iteration

most people skip this until something breaks in production. then they regret it.

the philosophical bit

agentic loops are goal-directed behavior. the agent has an objective and pursues it.

this is different from reactive systems (if X then Y) and passive tools (do what I tell you).

goal-directedness is powerful. it’s also dangerous. an agent with a misaligned goal will loop toward a bad outcome. and the more autonomous the loop, the less opportunity you have to intervene.

this is why ambient AI and fully autonomous agents are controversial. once the loop is running, you’re trusting it to not do something you’d regret.
