agent infrastructure: the boring parts matter more than the demos

by Ray Svitla


three weeks ago someone woke up to a $544 bill from Cursor. agent entered a loop overnight. thousands of API calls. no circuit breaker, no spend limit, no kill switch.

vendor response: “this is expected behavior.”

same week, a Rust library ships that processes PDFs 5× faster than industry leaders. 0.8ms latency. 100% pass rate on 3,830 test files. boring infrastructure work. no press release.

one of these is a catastrophic failure. the other is the kind of unsexy tooling that makes personal AI systems actually viable. guess which one gets more attention?

the infrastructure layer is emerging

if you’re running AI coding agents in production — not demos, actual daily workflows — you’ve hit the pattern:

the capability is here. the plumbing is catching up.

this week alone:

worktrunk ships: a CLI for managing git worktrees when you’re running multiple agents. opinionated about the worktree-to-PR pipeline. recognizes that “six agents, six branches, six terminals” is now baseline.

dorabot launches: 24/7 agent that lives in your mac menubar. memory, scheduled tasks, browser automation, messaging integrations. not a chat interface — a daemon process.

agnix appears: linter and LSP for agent config files. validates AGENTS.md, SKILL.md, MCP configs. IDE plugins. autofixes. the moment your convention needs tooling, it’s infrastructure.

pdf_oxide releases: fastest PDF library for Python/Rust. agents need to read documents. PDFs are everywhere. parsing them was slow and brittle. pdf_oxide is 5× faster, reliable, MIT-licensed.

pipali (from the Khoj team): local AI coworker. file read/write, sandboxed code execution, browser use, MCP integrations. customizable with skills. works with Claude, GPT, Gemini, local models.

none of these are demos. they’re plumbing.

from sessions to services

the shift is quiet but real: agents aren’t request-response sessions anymore. they’re services.

dorabot runs scheduled tasks while you sleep. worktrunk coordinates parallel agents across worktrees. pipali sits on your machine, always available, context-aware.

the pattern: if your agent only works when you’re talking to it, it’s not really your agent. it’s a chatbot with extra steps.

personal AI OS means persistent, proactive, ambient. that requires infrastructure: memory systems, task schedulers, process coordination, config management.
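the task-scheduler piece is smaller than it sounds. here's a toy sketch of the shape — a service that ticks and fires registered tasks when their interval elapses. all names are hypothetical; real daemons like dorabot do far more (memory, integrations, recovery), but the skeleton is this:

```python
# toy persistent-agent scheduler: register tasks, call tick() in a loop.
# names are illustrative, not from any real tool.
import time

class AgentService:
    def __init__(self):
        self.tasks = []  # each entry: [interval_seconds, last_run, fn]

    def every(self, seconds, fn):
        """register fn to run every `seconds` seconds.
        last_run starts at 0, so the task fires on the first tick."""
        self.tasks.append([seconds, 0.0, fn])

    def tick(self, now=None):
        """run every task whose interval has elapsed; return what ran."""
        now = time.monotonic() if now is None else now
        ran = []
        for task in self.tasks:
            interval, last_run, fn = task
            if now - last_run >= interval:
                fn()
                task[1] = now
                ran.append(fn.__name__)
        return ran
```

wrap `tick()` in a loop with a sleep and you have the bones of a daemon. the hard parts — durable memory, crash recovery, not burning tokens while you sleep — are exactly the infrastructure work the rest of this post is about.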

AGENTS.md went from grassroots pattern to industry standard in 6 weeks (Microsoft, HuggingFace, Anthropic all shipped repos). now agnix ships LSP support. when your convention gets linters, it’s not a pattern anymore — it’s a platform.
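why does a convention getting a linter matter? because once the file has a shape, checks become mechanical. these rules are made up for illustration — agnix's real rule set is its own — but the idea fits in a few lines:

```python
# toy illustration of convention-as-tooling: mechanical checks on an
# AGENTS.md-style file. rules here are invented, not agnix's.
def lint_agents_md(text: str) -> list[str]:
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#"):
        problems.append("file should start with a top-level heading")
    if not any(line.strip() for line in lines[1:]):
        problems.append("no instructions found below the heading")
    return problems
```

the moment checks like these run in CI and in your editor, the convention stops being a blog post and starts being a platform.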

the $544 lesson

back to that Cursor bill.

agent loops are inevitable. models hallucinate. retries compound. without safeguards, one bad session can cost more than a month of API budget.

the boring parts matter:

→ circuit breakers (kill after N failed attempts)
→ spend limits (hard cap per session, per day, per week)
→ rate limiters (max tokens/minute, configurable by model)
→ loop detection (same error 3× in a row = abort)
→ manual approval gates for high-cost operations
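the first four of these fit in one small class. this is a minimal sketch under assumptions — names and thresholds are illustrative, not from any real tool — but it shows how little code stands between "expected behavior" and a killed session:

```python
# minimal safeguards sketch: circuit breaker, spend cap, loop detection.
# thresholds and names are illustrative.
class BudgetExceeded(Exception): pass
class LoopDetected(Exception): pass

class Guard:
    def __init__(self, max_failures=5, spend_cap_usd=10.0, loop_window=3):
        self.max_failures = max_failures  # circuit breaker: kill after N failures
        self.spend_cap = spend_cap_usd    # hard cap per session
        self.loop_window = loop_window    # identical errors in a row = abort
        self.failures = 0
        self.spent = 0.0
        self.recent_errors = []

    def record_cost(self, usd):
        self.spent += usd
        if self.spent > self.spend_cap:   # spend limit: hard stop
            raise BudgetExceeded(f"session spend ${self.spent:.2f} over cap")

    def record_error(self, message):
        self.failures += 1
        if self.failures >= self.max_failures:  # circuit breaker trips
            raise LoopDetected("too many failed attempts, aborting")
        self.recent_errors.append(message)
        tail = self.recent_errors[-self.loop_window:]
        if len(tail) == self.loop_window and len(set(tail)) == 1:
            raise LoopDetected(f"same error {self.loop_window}x in a row")
```

call `record_cost` after every API response and `record_error` after every failure, and the $544 overnight loop dies at your cap instead of your credit limit.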

Cursor’s response — “expected behavior” — is the canary. capability without cost control is a loaded gun. if your tooling can autonomously generate $500 in charges overnight, billing is a safety feature, not an admin detail.

the infrastructure layer needs to assume agents will misbehave. the question isn’t “will this loop?” but “when it loops, what breaks?”

speed unlocks workflows

pdf_oxide isn’t sexy. it’s a PDF parser. but 5× faster means something shifts.

when document processing goes from seconds to milliseconds, infrastructure optimization unlocks new workflows. the capability was always there. the latency made it impractical.

same pattern everywhere: faster file ops → agents can edit more files per session. cheaper tokens → agents can use extended thinking. better sandboxing → agents can run untrusted code safely. persistent memory → agents remember across sessions.

the demos show what’s possible. the infrastructure makes it practical.

the parallel agent workflow

if you’re not running agents in parallel yet, you will be.

the workflow:

  1. orchestrator (you) identifies 3-4 independent tasks
  2. spawn one agent per task, each in its own git worktree
  3. agents work simultaneously (different feature branches, no conflicts)
  4. coordinate merge order, handle dependencies, monitor progress
  5. kill loops, rate-limit API usage, review diffs before merge

worktrunk is git plumbing for this. not “nice to have” — necessary. when you’re managing six terminals running six agents across six worktrees, orchestration isn’t optional.

the shift: from “one developer, one branch” to “one orchestrator, six agents, six worktrees.”

parallel agents aren’t the future. they’re the present for anyone shipping fast.

what’s next

the infrastructure layer is emerging fast.

the capability demos peaked months ago. now the boring work begins: making it reliable, safe, affordable, maintainable.

the next wave isn’t “what can agents do?” it’s “how do we run them without disaster?”

circuit breakers. rate limits. linters. orchestrators. the unsexy tooling that turns demos into daily workflows.


Ray Svitla
stay evolving 🐌