the permanent adversary: when security research becomes autonomous
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░ ░
░ ┌────────────────────────────────────────────┐ ░
░ │ │ ░
░ │ 23 years ────┐ │ ░
░ │ │ │ ░
░ │ entire ├──→ [ Claude found it ] │ ░
░ │ security │ [ in hours ] │ ░
░ │ community │ │ ░
░ │ missed it ──┘ │ ░
░ │ │ ░
░ │ buffer overflow, Linux kernel, 2003 │ ░
░ │ │ ░
░ └────────────────────────────────────────────┘ ░
░ ░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
the announcement
Nicholas Carlini has 67,200 citations on Google Scholar. he’s an AI safety researcher at Anthropic. he’s one of the world’s top security experts.
and yesterday he said: “Claude is a better security researcher than me.”
not “Claude assists me.” not “Claude is promising.” Claude beats him at tasks he’s specialized in for decades.
the receipts:
- $3.7 million from autonomous smart contract exploits
- Linux kernel buffer overflow from 2003, never discovered before
- Ghost CMS vulnerabilities
- zero human steering during discovery
Carlini says he’s never successfully written a buffer overflow exploit himself. Claude did it autonomously.
what buffer overflows mean
buffer overflows are among the hardest classes of vulnerability to exploit. you write data past the boundary of an allocated buffer, overwrite adjacent memory, and potentially hijack the program’s execution flow.
they’re hard because:
- modern systems have stack canaries, ASLR, DEP
- you need to understand memory layout at assembly level
- the exploit window is tiny: one byte off and you get a crash, not an exploit
- you’re working against decades of defensive hardening
the fact that a 2003 Linux buffer overflow sat undiscovered for 23 years tells you how hard these are to find. the entire security community had access to that code. nobody saw it.
Claude saw it.
the expertise question
most AI security discussions ask: “can AI find bugs?”
wrong question.
the right question: “can AI find bugs humans structurally can’t?”
humans have expertise constraints:
- specialists see their domain, miss cross-domain patterns
- fatigue limits investigation depth
- cognitive biases create blind spots
- time pressure forces incomplete analysis
Claude doesn’t have those constraints.
Carlini isn’t a bad researcher. he’s one of the best. but he operates within human cognitive limits. Claude doesn’t.
when the AI doesn’t get tired, doesn’t have expertise silos, doesn’t need coffee breaks — and it’s better than the 67K-citation expert — the capability curve just bent.
the threat model shift
every codebase is now under permanent adversarial testing.
attackers used to have time constraints. finding a vuln took days or weeks. writing an exploit took longer. testing took longer still.
Claude collapses that to hours.
defenders had discovery advantage. attackers needed time. that gap is gone.
the new reality:
- autonomous agents test every public codebase continuously
- they don’t have fatigue, distraction, or time limits
- they find classes of vulns humans miss structurally
- they scale with compute, not headcount
most security teams are still defending like it’s 2020. manually reviewing PRs. running static analysis on deploy. writing tests for known vulnerability patterns.
that worked when attackers were humans operating on human timescales.
it doesn’t work when the attacker is an autonomous agent with 96% exploit success rate (see: Shannon, KeygraphHQ) that found a 23-year-old Linux vuln nobody else saw.
the defense gap
here’s the uncomfortable part:
offense just got autonomous. defense didn’t.
most security tools are still reactive:
- static analyzers find known patterns
- fuzzing finds edge cases
- manual code review finds what reviewers know to look for
autonomous offense finds what nobody’s looking for.
the only defense that matches is continuous autonomous hardening. agents testing your code 24/7. not for known patterns — for novel attack vectors.
when the adversary never sleeps, your defense can’t either.
what this means for builders
if you’re shipping code in 2026:
assume permanent adversarial testing. every public repo is being scanned by autonomous agents right now. they’re better than Carlini. they don’t stop.
offense → defense time collapsed. the gap between “vuln exists” and “exploit in the wild” just shrank from weeks to hours.
expertise blind spots are fatal. human specialists miss cross-domain patterns. autonomous agents don’t. if your security model assumes human attackers with human constraints, it’s already broken.
static defenses failed. if your security is “run these tests on deploy,” you’re defending against 2020 threats with 2020 tools. autonomous offense is 2026. your defense needs to be too.
the broader pattern
Carlini’s announcement isn’t just about security. it’s a data point in a larger pattern:
medical diagnosis: Claude connected dots across 25 years of fragmented data that specialists missed (see: Reddit r/ClaudeAI, 4,339 upvotes, kidney failure + positional headaches case)
scientific research: AI Scientist v2 runs workshop-level research programs autonomously, multi-paper explorations over weeks (SakanaAI, 506 stars)
security research: Carlini says Claude beats him
the pattern: LLMs surpassing human experts not by being smarter but by operating outside human cognitive constraints.
they don’t get tired. they don’t have expertise silos. they synthesize across domains humans can’t because human specialists are too specialized.
when the bottleneck was human cognitive limits, deep expertise was the advantage. when the bottleneck is time and cross-domain synthesis, narrow specialization becomes a liability.
what hasn’t changed
humans still set the objectives. Claude didn’t decide to hunt for buffer overflows. Carlini pointed it at the problem space.
humans still verify the results. Claude finds vulns. humans confirm they’re real, assess impact, decide response.
humans still own the consequences. when an exploit ships, the human org deals with the fallout.
what changed: the capability ceiling for the execution layer just jumped. autonomous agents are better at finding vulns than the world’s top experts.
the question isn’t “will AI replace security researchers?” the question is “what does security research become when the best executor is autonomous?”
the timeline
most security teams will figure this out the hard way:
- Q2 2026: a vuln gets exploited that “nobody could have found”
- Q3 2026: turns out autonomous agents found it months ago
- Q4 2026: every CISO is asking “how do we defend against this?”
by Q1 2027, continuous autonomous hardening will be table stakes. not “nice to have.” baseline.
the permanent adversary is already here. most defenses just haven’t noticed yet.
links:
- Carlini interview: YouTube
- Reddit discussion: r/ClaudeAI
- Shannon 96% exploit rate: GitHub
- AI Scientist v2: GitHub