the permanent adversary: when security research becomes autonomous
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░ ░
░ ┌────────────────────────────────────────────┐ ░
░ │ │ ░
░ │ 23 years ────┐ │ ░
░ │ │ │ ░
░ │ entire ├──→ [ Claude found it ] │ ░
░ │ security │ [ in hours ] │ ░
░ │ community │ │ ░
░ │ missed it ──┘ │ ░
░ │ │ ░
░ │ buffer overflow, Linux kernel, 2003 │ ░
░ │ │ ░
░ └────────────────────────────────────────────┘ ░
░ ░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
the announcement
Nicholas Carlini has 67,200 citations on Google Scholar. he’s an AI safety researcher at Anthropic. he’s one of the world’s top security experts.
and yesterday he said: “Claude is a better security researcher than me.”
not “Claude assists me.” not “Claude is promising.” Claude beats him at tasks he’s specialized in for decades.
the receipts:
- $3.7 million from autonomous smart contract exploits
- Linux kernel buffer overflow from 2003, never discovered before
- Ghost CMS vulnerabilities
- zero human steering during discovery
Carlini says he’s never successfully written a buffer overflow exploit himself. Claude did it autonomously.
what buffer overflows mean
buffer overflows are among the hardest classes of vulnerability to exploit. you write data past the boundary of an allocated buffer, overwrite adjacent memory, and potentially hijack the program’s execution flow.
they’re hard because:
- modern systems have stack canaries, ASLR, DEP
- you need to understand memory layout at assembly level
- the exploit window is tiny: one byte off and you get a crash, not an exploit
- you’re working against decades of defensive hardening
the fact that a 2003 Linux buffer overflow sat undiscovered for 23 years tells you how hard these are to find. the entire security community had access to that code. nobody saw it.
Claude saw it.
the expertise question
most AI security discussions ask: “can AI find bugs?”
wrong question.
the right question: “can AI find bugs humans structurally can’t?”
humans have expertise constraints:
- specialists see their domain, miss cross-domain patterns
- fatigue limits investigation depth
- cognitive biases create blind spots
- time pressure forces incomplete analysis
Claude doesn’t have those constraints.
Carlini isn’t a bad researcher. he’s one of the best. but he operates within human cognitive limits. Claude doesn’t.
when the AI doesn’t get tired, doesn’t have expertise silos, doesn’t need coffee breaks — and it’s better than the 67K-citation expert — the capability curve just bent.
the threat model shift
every codebase is now under permanent adversarial testing.
attackers used to have time constraints. finding a vuln took days or weeks. writing an exploit took longer. testing took longer still.
Claude collapses that to hours.
defenders had discovery advantage. attackers needed time. that gap is gone.
the new reality:
- autonomous agents test every public codebase continuously
- they don’t have fatigue, distraction, or time limits
- they find classes of vulns humans miss structurally
- they scale with compute, not headcount
most security teams are still defending like it’s 2020. manually reviewing PRs. running static analysis on deploy. writing tests for known vulnerability patterns.
that worked when attackers were humans operating on human timescales.
it doesn’t work when the attacker is an autonomous agent with 96% exploit success rate (see: Shannon, KeygraphHQ) that found a 23-year-old Linux vuln nobody else saw.
the defense gap
here’s the uncomfortable part:
offense just got autonomous. defense didn’t.
most security tools are still reactive:
- static analyzers find known patterns
- fuzzing finds edge cases
- manual code review finds what reviewers know to look for
autonomous offense finds what nobody’s looking for.
the only defense that matches is continuous autonomous hardening. agents testing your code 24/7. not for known patterns — for novel attack vectors.
when the adversary never sleeps, your defense can’t either.
what this means for builders
if you’re shipping code in 2026:
assume permanent adversarial testing. every public repo is being scanned by autonomous agents right now. they’re better than Carlini. they don’t stop.
offense → defense time collapsed. the gap between “vuln exists” and “exploit in the wild” just shrank from weeks to hours.
expertise blind spots are fatal. human specialists miss cross-domain patterns. autonomous agents don’t. if your security model assumes human attackers with human constraints, it’s already broken.
static defenses failed. if your security is “run these tests on deploy,” you’re defending against 2020 threats with 2020 tools. autonomous offense is 2026. your defense needs to be too.
the broader pattern
Carlini’s announcement isn’t just about security. it’s a data point in a larger pattern:
medical diagnosis: Claude connected dots across 25 years of fragmented data that specialists missed (see: Reddit r/ClaudeAI, 4,339 upvotes, kidney failure + positional headaches case)
scientific research: AI Scientist v2 runs workshop-level research programs autonomously, multi-paper explorations over weeks (SakanaAI, 506 stars)
security research: Carlini says Claude beats him
the pattern: LLMs surpassing human experts not by being smarter but by operating outside human cognitive constraints.
they don’t get tired. they don’t have expertise silos. they synthesize across domains humans can’t because human specialists are too specialized.
when the bottleneck was human cognitive limits, deep expertise was the advantage. when the bottleneck is time and cross-domain synthesis, narrow specialization becomes a liability.
what hasn’t changed
humans still set the objectives. Claude didn’t decide to hunt for buffer overflows. Carlini pointed it at the problem space.
humans still verify the results. Claude finds vulns. humans confirm they’re real, assess impact, decide response.
humans still own the consequences. when an exploit ships, the human org deals with the fallout.
what changed: the capability ceiling for the execution layer just jumped. autonomous agents are better at finding vulns than the world’s top experts.
the question isn’t “will AI replace security researchers?” the question is “what does security research become when the best executor is autonomous?”
the timeline
most security teams will figure this out the hard way:
- Q2 2026: a vuln gets exploited that “nobody could have found”
- Q3 2026: turns out autonomous agents found it months ago
- Q4 2026: every CISO is asking “how do we defend against this?”
by Q1 2027, continuous autonomous hardening will be table stakes. not “nice to have.” baseline.
the permanent adversary is already here. most defenses just haven’t noticed yet.
links:
- Carlini interview: YouTube
- Reddit discussion: r/ClaudeAI
- Shannon 96% exploit rate: GitHub
- AI Scientist v2: GitHub