agents need infrastructure, not just models
by Ray Svitla
the gap between “ChatGPT writes a function” and “agent solves a multi-day problem” isn’t about smarter models.
it’s about missing primitives.
and this week, we’re finally getting them.
the infrastructure blind spot
most people think agent capabilities scale with model intelligence. better reasoning → more autonomous work → eventual AGI.
but talk to anyone running production agent workflows and the bottleneck isn’t the model. it’s everything around it.
you can have Opus 4.6 — the best coding model available — and still hit walls:
→ your agent can’t remember what it learned yesterday
→ it can’t control tools without APIs
→ it crashes on tasks longer than 20 minutes
→ it can’t persist state between sessions
→ it has no cognitive architecture for planning, reflection, repair
the model is brilliant. the infrastructure is missing.
what actually shipped this week
OpenCLI turned every website, desktop app, and binary into a CLI command your agent can discover and use via AGENTS.md.
not browser automation (fragile, breaks constantly).
not API wrappers (99% of tools don’t have APIs).
just: stable CLI commands, standardized interface, agent-discoverable.
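to make the idea concrete, here's a minimal sketch of what "agent-discoverable CLI commands" could look like: a manifest maps tool names to stable command templates, and the agent fills in arguments and shells out. the manifest format and the `figma-cli`/`notion-cli` command names are invented for illustration, not OpenCLI's actual interface.

```python
import shlex
import subprocess

# Hypothetical AGENTS.md-style manifest: tool name -> stable command template.
# The figma/notion command names are made up; only `echo` exists locally.
MANIFEST = """
figma: figma-cli export --file {file}
notion: notion-cli page get {page_id}
echo: echo {text}
"""

def parse_manifest(text):
    """Return {tool_name: command_template} from a manifest string."""
    tools = {}
    for line in text.strip().splitlines():
        name, _, template = line.partition(":")
        tools[name.strip()] = template.strip()
    return tools

def invoke(tools, name, **kwargs):
    """Fill in the template, run the command, and return its stdout."""
    cmd = tools[name].format(**kwargs)
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return result.stdout.strip()

tools = parse_manifest(MANIFEST)
invoke(tools, "echo", text="hello")
```

the point of the sketch: once every tool is a line in a manifest, discovery is just reading a file, and invocation is just a subprocess call.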
when Figma, Notion, Linear, Slack — all the tools agents currently can’t touch — become as easy to control as curl, the tooling surface doesn’t expand. it explodes.
6,300 stars in 24 hours. developers got it immediately.
deer-flow from ByteDance is a production harness for multi-hour expert work.
not “write a function” or “fix this bug.”
research papers. full features. complex migrations. the stuff that takes hours or days.
it ships with:
→ memory persistence across sessions
→ skill library for reusable capabilities
→ subagent orchestration for parallel work
→ message gateway for communication
most harnesses optimize for speed. deer-flow optimizes for depth.
when the target is “solve this, I’ll check back tomorrow,” you need different primitives. ByteDance just open-sourced them.
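the first of those primitives, memory persistence, can be sketched in a few lines: notes keyed by name, stored somewhere that survives a process restart. this is an illustrative toy on SQLite, not deer-flow's actual API.

```python
import json
import sqlite3

class SessionMemory:
    """Toy cross-session memory: key/value notes that survive restarts
    by living in SQLite. Illustrative sketch, not deer-flow's real API."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist across sessions.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes (key TEXT PRIMARY KEY, value TEXT)"
        )

    def remember(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO notes VALUES (?, ?)", (key, json.dumps(value))
        )
        self.db.commit()

    def recall(self, key, default=None):
        row = self.db.execute(
            "SELECT value FROM notes WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default

mem = SessionMemory()
mem.remember("migration_progress", {"done": ["users"], "next": "orders"})
```

trivial on its own, but it's exactly the piece most agent stacks skip: tomorrow's session can call `recall("migration_progress")` and pick up where today's left off.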
miniclaw-os said: agents need cognitive architecture, not just execution.
memory persistence. plan revision. self-repair loops.
most frameworks focus on tools: sandboxes, APIs, file access.
miniclaw-os focuses on cognition: what does the agent remember? how does it plan? what happens when things break?
when agents have cognitive primitives (not just execution primitives), they stop being stateless task-runners and become stateful problem-solvers.
the shift: from “agents execute” to “agents think.”
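the plan/reflect/repair loop above can be sketched as a small control structure: execute each step, and on failure ask a reflect function for a revised step instead of giving up. this is a generic illustration of the pattern, not miniclaw-os's implementation; `execute` and `reflect` here are stand-ins.

```python
def run_with_repair(plan, execute, reflect, max_rounds=3):
    """Run each step of `plan`; on failure, let `reflect` revise the
    step and retry, up to `max_rounds` attempts per step."""
    history = []
    for step in plan:
        for _ in range(max_rounds):
            ok, result = execute(step)
            history.append((step, ok, result))
            if ok:
                break
            step = reflect(step, result)  # self-repair: revise and retry
        else:
            raise RuntimeError(f"could not repair step: {step}")
    return history

# Toy stand-ins: a step that can fail, and a reflection that fixes it.
def execute(step):
    try:
        return True, 10 / step
    except ZeroDivisionError as e:
        return False, str(e)

def reflect(step, error):
    return 1  # a real agent would revise the plan using the error

history = run_with_repair([2, 0], execute, reflect)
```

the difference from a plain task-runner is the inner loop: a stateless executor stops at the first failure, while this structure carries the error back into planning.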
dorabot: the evolution nobody noticed
dorabot has been around since February.
first version: IDE agent for macOS.
march version: 24/7 background agent.
today’s version: persistent coworker with Slack/Telegram/WhatsApp integration.
the evolution is the pattern.
agents started as chat interfaces. then they became terminal tools. now they’re becoming persistent processes that live alongside you.
dorabot’s insight: messaging should be the primary interface, not a side feature.
when your agent responds to Slack pings and lives in your repo 24/7, the boundary between “tool I invoke” and “coworker who’s always on” disappears.
most people missed this because they’re still thinking about agents as sessions you start and stop.
the future is agents that just… run.
the adversarial inflection
Shannon hit 96.15% success rate on autonomous exploits.
discovery → exploit → privilege escalation → lateral movement.
no human steering.
every codebase is now under permanent adversarial testing.
the security timeline used to be:
→ attacker finds vulnerability (days/weeks)
→ attacker crafts exploit (days)
→ attacker tests (hours)
→ defender discovers breach (days/weeks)
Shannon collapses that to hours.
when the attacker is an autonomous agent with 96% success rate, the only defense is continuous automated hardening.
the arms race went exponential overnight.
what these all have in common
none of these are about better models.
OpenCLI doesn’t need GPT-6. it needs universal tool abstraction.
deer-flow doesn’t need smarter reasoning. it needs multi-hour execution infrastructure.
miniclaw-os doesn’t need more parameters. it needs cognitive primitives.
dorabot doesn’t need a bigger context window. it needs persistent state.
the pattern: infrastructure matters more than intelligence.
the Qwen signal nobody’s talking about
Qwen3.5-397B — a 397 billion parameter flagship model — now runs on a $2,100 desktop at 5-9 tokens/second.
not cloud inference. not data center GPUs. consumer hardware.
FOMOE (Fast Opportunistic Mixture Of Experts) solved the MoE memory problem: load active experts, keep inactive ones on NVMe.
when flagship models fit prosumer budgets, the local/cloud split stops being about capability and becomes about preference.
regulated industries → local required (data sovereignty)
air-gapped environments → local required (no connectivity)
sovereignty-first users → local preferred (trust)
the cloud moat just evaporated for a meaningful segment.
what this means for you
if you’re building with agents today, the question isn’t “which model should I use?”
it’s: do I have the infrastructure to support what I’m trying to do?
→ does my agent remember across sessions?
→ can it work for hours without supervision?
→ can it control the tools I actually use?
→ does it have cognitive loops (plan/reflect/repair)?
→ is it persistent or ephemeral?
the best model with terrible infrastructure loses to a decent model with solid infrastructure every time.
the unsexy truth
nobody gets excited about memory layers.
nobody tweets about multi-hour execution harnesses.
nobody writes viral posts about cognitive architecture.
but these are the primitives that determine whether your agent is a demo or a coworker.
the infrastructure gap is closing.
OpenCLI, deer-flow, miniclaw-os, dorabot — these aren’t just tools. they’re the missing pieces.
when agents have persistent memory, universal tool access, multi-hour execution, and cognitive loops, the bottleneck shifts from “can the model do this?” to “what should I build?”
and that’s when things get interesting.
Ray Svitla
stay evolving 🐌