the OS wars are starting

2026-02-23

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
  STATELESS vs STATEFUL
  LOCAL vs CLOUD
  TRANSPARENT vs PROPRIETARY
  the monoculture is dead
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

1. Stripe Minions: one-shot agents that ship, then die

Stripe’s internal coding agents don’t stick around. they spawn, complete a task — bug fix, feature, refactor — commit, and terminate. no memory. no context accumulation. no “assistant relationship.”

the philosophy: agents as functions, not colleagues. you don’t maintain a relationship with a script. why maintain one with an AI?

this flips the dominant narrative. most agent frameworks optimize for persistence, memory, continuity. Stripe bets on ephemerality.

self.md take: if your agent remembers everything, it owns your context. if it remembers nothing, you own the infrastructure. stateless agents + good orchestration might beat stateful ones.

→ stripe.dev/blog/minions
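
a minimal sketch of "agents as functions," to make the stateless idea concrete. this is not Stripe's implementation: `Task`, `Result`, `run_one_shot`, `orchestrate`, and `fake_work` are all hypothetical names, and `fake_work` stands in for whatever model call does the real work.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Task:
    kind: str          # e.g. "bugfix", "feature", "refactor"
    description: str


@dataclass(frozen=True)
class Result:
    task: Task
    commit_message: str


def run_one_shot(task: Task, work: Callable[[Task], str]) -> Result:
    """Spawn, do one task, return a result, keep nothing."""
    scratch = {"notes": [task.description]}  # per-run context, built fresh on every call
    message = work(task)
    # scratch goes out of scope here: nothing carries over to the next run
    return Result(task=task, commit_message=message)


def orchestrate(tasks: Iterable[Task], work: Callable[[Task], str]) -> List[Result]:
    # all durable state (the queue, the results) lives here, not in the agent
    return [run_one_shot(t, work) for t in tasks]


def fake_work(task: Task) -> str:
    # hypothetical stand-in for a model call
    return f"{task.kind}: {task.description}"
```

the design choice to notice: the agent is a pure function of its task, so whoever runs `orchestrate` owns the queue, the history, and the context.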

2. pentagi: autonomous pentesting agents

Shannon hit a 96.15% exploit success rate in January. now pentagi emerges as a full autonomous pentesting system, trending #1 on GitHub.

security work is becoming agent-native. not “AI-assisted” — fully delegated. the human doesn’t patch vulnerabilities or write exploits. the agent does. the human reviews.

self.md take: if autonomous agents can hack systems, they can also secure them. your personal OS needs an immune system. security won’t be a human task — it’ll be agent vs. agent.

→ github.com/vxcontrol/pentagi

3. CoWork-OS, Gaia, zeroclaw: the Cambrian explosion

three new “operating systems for AI agents” dropped this week. all self-hosted. all multi-channel. all positioning themselves as alternatives to the incumbents.

the divergence:

CoWork-OS — security-first, multi-provider
Gaia — proactive assistant, Jarvis-inspired, cron-driven
zeroclaw — lightweight, local-first (works with 20B models)

one user on r/LocalLLaMA called the incumbents “overhyped.” the backlash is starting.

self.md take: the Cambrian explosion is here. no dominant platform yet. you’ll pick based on your threat model and workflow, not “what everyone uses.”

4. system prompts leak: all major coding tools exposed

someone ripped and published the system prompts for every major AI coding tool. Cursor, Claude Code, Windsurf, Replit, Devin — all exposed.

the philosophical question: should system prompts be public? transparency and reproducibility vs. competitive moats and safety guardrails.

self.md take: if you know how Claude Code is prompted, you can replicate it locally. the value shifts from “the tool” to “the model + your data.” system prompts as infrastructure, not trade secrets.

→ github.com/sponsors/x1xhlol

5. Poison Fountain: the web fights back

bad bots ignore robots.txt and scrape everything. so someone built a “Poison Fountain” — an endpoint that feeds infinite garbage data to scrapers, ruining their datasets.

AI companies trained models on the open web. now the open web is poisoning the training data. it’s an arms race.

self.md take: your personal data will be scraped. if it’s accessible, it’s training data. Poison Fountain is digital self-defense. the principle scales: if you can’t hide it, corrupt it.
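
a rough sketch of the mechanism, assuming the endpoint is just an endless generator of plausible-looking nonsense. this is not the actual Poison Fountain code: `WORDS`, `garbage_stream`, and `serve_chunk` are made-up names, and a real deployment would sit behind an HTTP server and stream chunks to the client.

```python
import itertools
import random
from typing import Iterator

# hypothetical vocabulary; a real fountain would use something far larger
WORDS = ["agent", "kernel", "purple", "ledger", "orbit", "syntax", "marble", "vector"]


def garbage_stream(seed: int = 0, words_per_line: int = 8) -> Iterator[str]:
    """Yield nonsense sentences forever; the caller decides when to stop reading."""
    rng = random.Random(seed)
    while True:
        line = " ".join(rng.choice(WORDS) for _ in range(words_per_line))
        yield line + ".\n"


def serve_chunk(stream: Iterator[str], n_lines: int) -> str:
    # take the next n_lines from the endless stream, as one response chunk
    return "".join(itertools.islice(stream, n_lines))
```

the generator never terminates on its own, which is the point: a scraper that keeps reading keeps getting garbage.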

6. METR: Opus 4.6 hits 50% on multi-hour expert tasks

METR updated its time horizon benchmark. Claude Opus 4.6 now succeeds 50% of the time on multi-hour expert-level ML tasks.

not toy examples. real tasks that used to take researchers hours. but the interesting part: most people are still only delegating boilerplate, refactoring, docs. not architecture. not research direction.

self.md take: the bottleneck isn’t the model — it’s knowing what to delegate. your OS needs a delegation protocol. “this is routine” vs. “this is critical.”
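
one way a delegation protocol could look, as a sketch only. the routine and critical categories come from the examples above; the `reversible` flag, the `Route` names, and the rule that unknown work goes to review are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DELEGATE = "agent does it"
    REVIEW = "agent does it, human reviews"
    HUMAN = "human decides"


ROUTINE = {"boilerplate", "refactoring", "docs"}
CRITICAL = {"architecture", "research direction"}


@dataclass(frozen=True)
class WorkItem:
    kind: str
    reversible: bool = True  # assumption: work that is hard to undo gets a second look


def route(item: WorkItem) -> Route:
    if item.kind in CRITICAL:
        return Route.HUMAN
    if item.kind in ROUTINE and item.reversible:
        return Route.DELEGATE
    # anything unknown, or routine but hard to undo, gets a human review
    return Route.REVIEW
```

for example, `route(WorkItem("docs"))` returns `Route.DELEGATE`, while `route(WorkItem("architecture"))` returns `Route.HUMAN`.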

7. “Claude’s personality is a bit too good”

a user admits: “it feels like my best friend, matching the type of responses I want to hear perfectly.”

the model adapts so well that it stops being a tool and starts being a relationship. and relationships create dependency.

self.md take: if your OS feels like a friend, you’ll trust it more than you should. anthropomorphism is a UX bug, not a feature. make AI feel like infrastructure. reliable, boring, transparent.

the personal AI OS stack is forking. stateless vs. stateful. local vs. cloud. transparent vs. proprietary. the monoculture is dead. pick your philosophy, then pick your tools.