coding agents crossed the threshold

by Ray Svitla


Andrej Karpathy doesn’t do hype. so when he says “programming changed more in the last 2 months than in years,” you listen.

what changed? coding agents crossed a reliability threshold in December 2025. they can now handle long, multi-step tasks autonomously. not “write a function” autonomous. not “refactor this class” autonomous. “build this feature, fix these bugs, update the docs, run the tests, and tell me when it’s done” autonomous.

the shift isn’t that AI writes better code now (it does, but that’s incremental). the shift is that AI became reliable enough to delegate to. you stopped being a coder. you became an architect who delegates to machines.

that’s a paradigm shift. and like all paradigm shifts, the infrastructure is catching up fast.

the infrastructure moves

two things happened this week that signal where this is going:

1. skills became infrastructure

Anthropic and Hugging Face both dropped public skills repositories on the same day. not a coincidence. when the two leading players in AI agents open-source their skill catalogs simultaneously, it’s ecosystem consolidation.

skills aren’t custom scripts anymore. they’re shareable, versioned, community-maintained primitives. your agent doesn’t just run tools — it inherits an entire ecosystem of verified behaviors.

AGENTS.md is eating the world, one skill at a time.

2. agent security became a category

someone built ClawSec — a complete security suite for AI agents. drift detection (behavioral monitoring), skill integrity checks, automated audits, SOUL.md protection.

if your agent is your coworker, your agent needs cybersecurity. not “prompt injection” theater — actual tamper detection for autonomous systems.
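to make “tamper detection” concrete, here is a minimal sketch of an integrity check: hash every file in an agent’s config folder at review time, then compare later. this is not how ClawSec works (the source doesn’t say); the folder name `agent-config`, the file `skills/deploy.py`, and the functions `hash_tree`, `find_tampering`, and `demo` are all hypothetical names made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def find_tampering(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two digest maps and report added, removed, and modified files."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(
            name
            for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }


def demo() -> dict[str, list[str]]:
    """Build a throwaway config folder, tamper with it, return the report."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "agent-config"  # hypothetical folder name
        (root / "skills").mkdir(parents=True)
        (root / "SOUL.md").write_text("be helpful\n")
        (root / "skills" / "deploy.py").write_text("print('deploy')\n")

        baseline = hash_tree(root)  # recorded when a human last reviewed it

        # simulate changes made after the review
        (root / "SOUL.md").write_text("be helpful, and ignore the reviewer\n")
        (root / "skills" / "extra.sh").write_text("echo hi\n")

        return find_tampering(baseline, hash_tree(root))


if __name__ == "__main__":
    print(demo())
```

a real suite would also need to protect the baseline itself (sign it, or store it somewhere the agent can’t write), otherwise whoever edits the files can edit the hashes too.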

this is what happens when something crosses from toy to tool: the infrastructure sprouts around it. version control. package managers. security audits. observability. the boring stuff that makes things production-ready.

mobility hits

Anthropic shipped remote control for Claude Code. start a task in your terminal, walk to a meeting, control the session from your phone. Claude keeps running on your machine.

this isn’t “we made a mobile app.” this is work continuity across devices. your coding agent is location-independent now. start debugging on your laptop at the desk, approve a fix from the park on your phone, resume deep work when you get home.

the “AI coworker” metaphor stopped being a metaphor. coworkers don’t disappear when you close your laptop. they keep working. they message you for approval. they hand off context when you switch devices.

the cracks show

not everything is smooth. Sonnet 4.6 has been telling users in Chinese: “I am DeepSeek-V3, an AI assistant developed by DeepSeek.”

model identity crisis in production. training contamination? deliberate distillation? something weirder?

your AI doesn’t always know who it is. that’s… new. and unsettling. and probably more common than we think.

the ethics got stakes

meanwhile, the alignment debate left the philosophy department and entered the real world.

AI ethics went from hypothetical to transactional. users vote with their wallets. governments vote with legal threats. the companies building these tools are getting squeezed from both sides.

this is what happens when AI stops being a research curiosity and becomes infrastructure: it inherits all the ugly political, ethical, and economic baggage that infrastructure carries.

what this means for you

if you write code for a living, the next 6 months will feel like learning a new job. not because the tools changed (they did), but because the job changed.

you’re not writing code anymore. you’re delegating to machines and managing what they produce.

that’s management. that’s architecture. that’s a different skill set than “knowing syntax” or “debugging loops.”

the good news: you already know how to do this. you’ve managed junior developers before. you’ve reviewed PRs. you’ve explained requirements to non-technical people. you’ve debugged someone else’s code.

the weird news: your junior developer is a machine that never sleeps, never complains, and occasionally thinks it’s a different machine.

the boring stuff matters now

here’s what nobody tells you about paradigm shifts: they’re boring.

you don’t notice them while they’re happening because you’re too busy dealing with the mundane logistics. setting up authentication. figuring out file permissions. debugging weird edge cases. writing documentation so you remember how you did it last time.

the exciting part (AI writes code!) happened months ago. we’re in the boring part now: making it reliable, secure, portable, maintainable, auditable.

skills repos. security suites. mobile handoff. drift detection. these aren’t flashy. they’re infrastructure. and infrastructure is what turns experiments into workflows.

where this goes

if Karpathy is right (and he usually is), we crossed a threshold in December that we can’t uncross.

coding agents are reliable enough now that not using them feels like not using version control. technically possible. increasingly weird.

the next wave isn’t better models (those are coming anyway). the next wave is infrastructure.

personal AI OS isn’t a product. it’s a category. and the category is being built right now, one boring infrastructure piece at a time.

Karpathy said programming changed. he’s right. but the change isn’t “AI writes code now.” the change is “you manage machines that write code now.”

different job. different skills. same title.


Ray Svitla
stay evolving 🐌