who reviews the agent's code?

by Ray Svitla

■ ■ ■

there’s a financial services company somewhere that used to produce 25,000 lines of code a month. then they plugged in Cursor. now they produce 250,000.

that’s a 10x increase in output. impressive. the kind of number that makes investors salivate and CTOs write LinkedIn posts about transformation.

here’s what nobody put in the press release: they now have a backlog of one million lines of unreviewed code sitting in their pipeline like unexploded ordnance.

the New York Times reported this last week under the headline “The Big Bang: A.I. Has Created a Code Overload.” Joni Klippert of StackHawk put it plainly → “the sheer amount of code being delivered, and the increase in vulnerabilities, is something they can’t keep up with.”

no kidding.

■ ■ ■

the bottleneck was never generation

this is the part that kills me. for years, the entire industry assumed the hard part of software was writing it. billions went into making code generation faster → copilots, agents, auto-complete on steroids. and it worked. spectacularly.

but the hard part was never writing code. the hard part was always knowing whether the code was correct, secure, and wouldn’t silently destroy something at 3am on a Friday.

generation scaled 10x. review capacity stayed flat. the humans who can actually evaluate what shipped → same number, same bandwidth, same 24 hours in a day.

congratulations. we automated the easy part and left the hard part untouched.

■ ■ ■

the final bottleneck

Armin Ronacher — the person who built Flask, so not exactly someone you dismiss — wrote about this in February. his piece “The Final Bottleneck” is the clearest articulation of the problem i’ve seen.

his core argument: historically, writing code was slower than reviewing code. that’s no longer true. and when input grows faster than throughput in any queue-based system, you get accumulating failure.

he compared it to a Starbucks overwhelmed by mobile orders. the in-store experience breaks down. you don’t know how many orders are ahead of you. there’s no reliable wait estimate. the queue itself becomes the problem.
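the arithmetic behind that failure mode is brutal. a toy sketch in Python, using the numbers from the opening story (25,000 lines reviewed per month before, 250,000 generated after, and my assumption that review capacity never moved):

```python
def backlog_after(months: int, generated: int, reviewed: int, start: int = 0) -> int:
    """Queue length when arrivals outpace service: no steady state
    exists, the backlog just grows linearly with time."""
    return start + max(0, generated - reviewed) * months

# generation scaled 10x, review capacity stayed flat:
backlog_after(1, 250_000, 25_000)   # one month in: 225,000 unreviewed lines
backlog_after(12, 250_000, 25_000)  # one year in: 2,700,000
```

at that rate, the million-line backlog from the opening accumulates in under five months. there's no equilibrium to wait for → the queue itself is the steady state.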

Armin points to OpenClaw having 2,500+ open pull requests. that’s not a backlog. that’s a graveyard where PRs go to become stale and unmergeable.

but here’s the line that sticks: “I too am the bottleneck now. But you know what? Two years ago, I too was the bottleneck. I was the bottleneck all along.”

the machine didn’t create the bottleneck. it revealed it. the constraint was always human attention, human judgment, human accountability. the agents just made it impossible to pretend otherwise.

■ ■ ■

norton enters the chat

and then, like clockwork, the enterprise seatbelt moment arrived.

on april 9th — literally yesterday — Norton launched “AI Agent Protection” in Norton 360. a beta feature that monitors autonomous AI agents (Claude Code, Cursor, OpenClaw) in real time, blocking confirmed threats and pausing suspicious actions for human review.

let that sink in. Norton. the company that spent thirty years selling you antivirus software. now shipping agent antivirus.

this is the moment when an industry problem becomes a consumer product category. when the verification gap gets its own SKU. Norton looked at the landscape and said: there are millions of people running AI agents on their machines, and nobody is watching what those agents actually do.

they’re not wrong.

it’s also, let’s be honest, kind of absurd. we built agents smart enough to write entire applications, and now we need a completely separate piece of software to watch them while they work. like hiring a security guard for your robot that you hired to replace your security guard.

but absurdity and necessity often travel together.

■ ■ ■

the divide nobody talks about

here’s the structural problem nobody wants to name: we’re creating a class divide in software.

on one side → junior prompters. abundant. cheap. they can spin up features in hours using Claude Code or Cursor. they generate enormous volumes of code. some of it is good. some of it is a haunted house with a React frontend.

on the other side → senior reviewers. scarce. expensive. the people who actually understand systems deeply enough to evaluate what shipped. they were already a bottleneck when code was written by hand. now they’re drowning.

the ratio is getting worse, not better. every AI coding tool creates more prompters. nothing creates more reviewers. review skill comes from years of building, breaking, and fixing systems. you can’t prompt-engineer that into existence.

Steve Yegge — who has been bullish on AI coding for years — now casts doubt on the sustainability of this pace. when the optimists start hedging, pay attention.

■ ■ ■

so who reviews YOUR agent’s code?

this is where it gets personal. literally.

if you’re building with personal AI → running agents on your own machine, connecting them to your own tools, shipping code to your own projects → you are both the prompter and the reviewer. there’s nobody else in the loop.

when i run Claude Code on my stack, the output goes straight into production. there’s no PR queue. no second pair of eyes. no Joni Klippert to flag the vulnerability spike. it’s me, and the agent, and the git log.
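one small mitigation when you're both prompter and reviewer: a hard gate between the agent and your main branch. a minimal sketch → the path names and the size threshold are purely my own assumptions, tune them to your stack:

```python
# paths where an unreviewed agent diff should never land automatically
SENSITIVE = ("auth/", "payments/", "migrations/")
MAX_AUTO_LINES = 50  # anything bigger waits for human eyes

def may_auto_apply(files: list[str], lines_changed: int) -> bool:
    """True only for small diffs that avoid sensitive paths;
    everything else gets parked for a human pass."""
    if lines_changed > MAX_AUTO_LINES:
        return False
    return not any(f.startswith(SENSITIVE) for f in files)
```

wired into a pre-commit hook or your agent's apply step, this doesn't make you a better reviewer. it just guarantees the diffs that matter actually reach you.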

this is the dirty secret of the personal AI movement: we talk endlessly about generation capabilities. how fast can the agent build? how many features per hour? can it scaffold an entire app from a sentence?

nobody talks about the verification layer. and without one, you’re not building a personal AI OS. you’re building a personal liability generator.

■ ■ ■

own the stack, own the review

the self.md thesis has always been: if you own your stack, you own your destiny. your data, your tools, your workflows → under your control, not rented from a platform that can change terms on a Tuesday.

but ownership without verification is just exposure with extra steps.

the personal AI OS needs a review layer baked in. not bolted on after the fact like Norton trying to wrap agents in a safety blanket. baked in from the architecture level.

what does that look like? honestly, i don’t fully know yet. some pieces are emerging:

● agent-generated diffs that are structured for human scanning, not just correctness
● automated test generation that runs before code hits any branch
● verification agents that review other agents’ output (yes, turtles all the way down — but better turtles than nothing)
● personal audit logs that let you trace what changed, when, and why
● kill switches that actually work when the agent goes sideways at 2am
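to make the audit-log piece concrete: a minimal sketch, not a spec. the file name, the schema, and the JSONL format are all my assumptions:

```python
import hashlib
import json
import time
from pathlib import Path

AUDIT_LOG = Path("agent_audit.jsonl")  # hypothetical log location

def record_change(files: list[str], diff: str, agent: str) -> str:
    """Append one agent-generated change to a personal audit log."""
    entry = {
        "id": hashlib.sha256(diff.encode()).hexdigest()[:12],
        "ts": time.time(),
        "agent": agent,
        "files": files,
        "reviewed": False,
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry["id"]

def mark_reviewed(change_id: str) -> None:
    """Flip the flag only after a human has actually read the diff."""
    entries = [json.loads(line) for line in AUDIT_LOG.read_text().splitlines()]
    for e in entries:
        if e["id"] == change_id:
            e["reviewed"] = True
    AUDIT_LOG.write_text("".join(json.dumps(e) + "\n" for e in entries))

def unreviewed_backlog() -> list[dict]:
    """Everything nobody has looked at yet."""
    if not AUDIT_LOG.exists():
        return []
    entries = [json.loads(line) for line in AUDIT_LOG.read_text().splitlines()]
    return [e for e in entries if not e["reviewed"]]
```

twenty lines of plumbing, and `unreviewed_backlog()` becomes the number you stare at every morning. it won't review anything for you → but it makes the pile visible, which is more than most personal setups have today.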

Armin is right that non-sentient machines can’t carry accountability. and he’s right that society will demand accountability anyway. the question is where that accountability lives.

in enterprise, it’ll live in compliance departments and expensive tooling. fine. that’s their problem.

in personal AI, it lives with you. which means if you’re serious about this, you need to build the review muscle alongside the generation muscle. same priority. same investment. same attention.

■ ■ ■

the uncomfortable question

the textile industry went through this exact pattern during the industrial revolution. bottleneck removed → constraint shifted downstream → new infrastructure built to handle the throughput. weaving sped up, so yarn became the problem. spinning sped up, so raw cotton became the problem. each time, the answer wasn’t to slow down. it was to build the next layer.

we’re at the yarn stage. code generation is the power loom. review is the spinning wheel that hasn’t been upgraded yet.

the companies that figure out verification at scale will define the next era of software. and the individuals who build personal verification into their AI stack will be the ones still standing when the first wave of agent-generated disasters hits production.

because it’s coming. a million unreviewed lines of code don’t sit quietly forever.

the question isn’t whether agents can write code. that’s settled. the question is: who watches the watchers? and if you’re building your own AI OS — if you’re running agents on your own machine, against your own data, for your own purposes — the answer is uncomfortably simple.

you do.

are you ready for that?

■ ■ ■

Ray Svitla / stay evolving