programming languages for agents (and why AI makes you work harder, not less)
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░                                            ░
░  before:  human → language → machine       ░
░                                            ░
░  now:     human → language → agent         ░
░                                 │          ░
░                                 ↓          ░
░                              machine       ░
░                                            ░
░  the audience changed.                     ░
░  the language must too.                    ░
░                                            ░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
■ signal 1 — the creator of Flask thinks we need new programming languages (for agents)
strength: ■■■■■ → source
armin ronacher wrote the piece a lot of us have been thinking about but haven’t articulated yet.
his thesis: programming languages were optimized for humans typing at keyboards. we prioritized brevity, DRY, implicit behavior. type inference saves keystrokes. clever abstractions reduce boilerplate. syntactic sugar makes code prettier.
but agents don’t care about keystrokes. they care about explicitness. tooling quality. low churn. readability for reviewers (the humans who have to read agent-generated code).
type inference? makes code harder to understand at a glance. implicit behavior? requires context the agent (and reviewer) might not have.
ronacher’s argument: the next generation of programming languages will optimize for a different audience. not developers at keyboards — agents consuming and generating code, with humans reviewing.
what does that look like? more explicit. more verbose. better static analysis. clearer boundaries between what’s magic and what’s mechanical.
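to make the trade-off concrete, a hedged python sketch (my example, not ronacher’s): the same function in “easy to write” style and in the explicit style an agent-first language might push toward.

```python
from dataclasses import dataclass

@dataclass
class Order:
    id: str
    active: bool
    amount: float

    def total(self) -> float:
        return self.amount

# "easy to write": terse, inferred, implicit. fine for a human holding the context.
def totals(orders):
    return {o.id: o.total() for o in orders if o.active}

# "easy to verify": explicit types, named intermediates. more keystrokes,
# but every step is visible to an agent and to the human reviewing its output.
def totals_explicit(orders: list[Order]) -> dict[str, float]:
    active_orders: list[Order] = [order for order in orders if order.active]
    totals_by_id: dict[str, float] = {}
    for order in active_orders:
        totals_by_id[order.id] = order.total()
    return totals_by_id
```

same behavior, double the lines. the bet is that the second version is cheaper overall once an agent writes it and a human only has to verify it.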
the language wars are about to restart. but this time the primary audience isn’t developers. it’s their AI.
→ self.md take: every programming language was designed for humans writing code. we’re entering an era where agents write most of it and humans review. the tension between “easy to write” and “easy to verify” is about to get uncomfortable. whoever designs the first agent-native language that humans don’t hate — that’s the next big thing.
→ deep dive: why the next programming language will be designed for AI agents, not developers
■ signal 2 — context engineering gets its first academic framework
strength: ■■■■■ → paper | willison
someone finally wrote the theory behind what the AGENTS.md ecosystem has been doing organically.
the paper: “structured context engineering for file-native agentic systems.” the core claim: file-native context (markdown files in your repo) outperforms dynamic context injection for coding agents.
the insight: agents work better when context is durable, version-controlled, and co-located with the code. not injected at runtime. not hidden in a system prompt somewhere. right there in the repo, readable by humans and agents.
this validates the entire .md protocol layer. AGENTS.md, CLAUDE.md, .cursorrules, Backlog.md — all variations on the same pattern. the paper gives it a name and measures it.
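the pattern, made concrete. the file name is the real convention; the contents below are invented for illustration, not from the paper:

```markdown
# AGENTS.md

## build & test
- run `make test` before proposing any change

## conventions
- prefer explicit types over inference
- no new dependencies without a note in the PR description

## boundaries
- never touch `migrations/` without human sign-off
```

durable, version-controlled, sitting next to the code it governs. that is the whole trick.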
simon willison called it out: “this is the first rigorous framework for something developers have been discovering through practice.”
the gap between “people are doing this” and “here’s why it works” just closed.
→ self.md take: context engineering is becoming a formal discipline. the same way we have design patterns, code style guides, and architecture principles — we’re about to have context patterns. how you structure AGENTS.md, what goes in it, what stays out. this paper is year zero.
■ signal 3 — the skills ecosystem is crystallizing (three repos, one pattern)
strength: ■■■■■ → openai/skills | awesome-claude-skills | compound-engineering
three repos trending simultaneously this week:
openai/skills (7.6K ★) — official Codex skills catalog. every skill is a markdown file. not JSON. not YAML. markdown.
awesome-claude-skills (33K ★) — curated Claude skills directory, maintained by Composio. this is the largest aggregation of “what can I make Claude do” on the internet.
compound-engineering-plugin (8K ★) — Every’s official Claude Code plugin for multi-step agent workflows. the first major plugin that treats Claude Code as an orchestration platform, not just a coding assistant.
→ tool page: AI Agent Skills Catalogs
three different companies. three different approaches. one shared bet: the future of agent customization is structured skill files, not fine-tuning.
the format is .md. the distribution is GitHub. the curation is community-driven.
this is what an app store looks like before anyone builds the store.
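for reference, the rough shape of a skill file. claude’s skills use a SKILL.md with yaml frontmatter; the fields and contents below are a hedged sketch of the common pattern, not any catalog’s exact schema:

```markdown
---
name: changelog-writer
description: drafts a changelog entry from the diff of the current branch
---

# changelog-writer

when asked to write a changelog entry:
1. summarize the user-facing changes in the branch diff
2. group them under Added / Changed / Fixed
3. keep each bullet to one line
```

drop a file like this in your project and the agent picks up the capability. that is the whole distribution model.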
→ self.md take: skills catalogs solve the cold start problem. instead of writing custom prompts for hours, you browse a catalog, drop a .md file in your project, and your agent gains a new capability. the risk: skill quality. a bad skill file makes your agent worse, not better. whoever builds the trusted, verified skills marketplace — with testing, ratings, compatibility guarantees — builds the App Store of the agentic era. right now it’s the wild west.
■ signal 4 — AI doesn’t reduce work. it intensifies it.
strength: ■■■■■ → source
Harvard Business Review published the piece nobody wants to hear.
authors: Aruna Ranganathan and Xingqi Maggie Ye. the thesis: AI was supposed to handle routine tasks so humans could focus on high-value work. but in practice, AI raises the baseline. “routine” disappears. everything becomes high-value.
the treadmill speeds up.
same hours. more output expected. higher cognitive load. because if your AI assistant makes you 2x faster, your boss expects 2x the output. the efficiency gain gets captured by higher expectations, not reduced effort.
the productivity paradox: tools that should reduce work end up intensifying it.
one insight landed hard: “AI doesn’t automate the boring parts. it makes the boring parts go faster, so you can do more of the hard parts.”
→ self.md take: this is the burnout narrative nobody’s talking about. your AI makes you faster, so your manager raises the bar. the real question isn’t “can AI make me more productive” — it’s “who captures the productivity gain?” right now, it’s employers, not employees. the next frontier isn’t better AI. it’s boundaries. knowing when to say “I’m already working at capacity” even when the tools could theoretically do more.
■ signal 5 — stripe ships “minions” (1000+ agent-written PRs per week)
strength: ■■■■□ → source
stripe’s internal coding agents — called “minions” — now merge 1000+ pull requests per week.
not a research paper. not a demo. a production system at one of the most engineering-rigorous companies in the world.
the workflow: ticket → agent writes code → PR → human reviews → merge. one-shot. end-to-end.
the insight: agents don’t replace engineers. they change what engineers spend time on. less writing boilerplate, more reviewing decisions. less “make this button blue,” more “should this be a button at all?”
the uncomfortable part: 1000+ PRs per week means humans are reviewing that much code. stripe hasn’t published burnout metrics. but the HBR piece (signal 4) feels relevant here.
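a back-of-envelope on what that review load might mean. the 1000 PRs/week figure is from the report; every other number here is a made-up assumption:

```python
prs_per_week = 1000        # reported figure
reviewers = 200            # hypothetical: engineers sharing review duty
minutes_per_review = 20    # hypothetical: average time per agent-written PR

review_minutes_each = prs_per_week * minutes_per_review / reviewers
print(review_minutes_each)  # 100.0 minutes of review per engineer per week
```

under those assumptions it is manageable. halve the reviewers or double the PR rate and review becomes the job.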
→ self.md take: the question isn’t “will AI write code” anymore. it’s “who reviews it, how much, and what happens when review becomes the bottleneck?” stripe solved the writing problem. the review problem is still open. and if AI raises output expectations without solving review load — you get the productivity paradox from signal 4. faster tools, same exhaustion.
■ signal 6 — someone gave Claude access to their psychology (and documented the experiment)
strength: ■■■□□ → source
a developer created claude_life_assistant — feeding Claude personal psychology data (values, emotional patterns, triggers) to build a truly personal AI assistant.
→ tool page: Claude Life Assistant
not a chatbot. not a productivity tool. an AI that knows your psychology. your patterns. what makes you spiral. what keeps you grounded.
documented publicly on GitHub. 655 stars.
the setup: markdown files describing psychological frameworks, personal values, emotional regulation strategies. Claude reads them. responds accordingly.
the result: an AI that doesn’t just answer questions. it understands context. “I’m feeling stuck” gets a different response based on whether you’re stuck creatively, emotionally, or tactically.
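the files themselves are plain markdown. a hypothetical sketch of the shape (invented contents, not from the repo):

```markdown
# patterns.md

- "stuck" usually means over-scoped, not under-motivated
- evening spirals follow skipped breaks, not hard problems
- grounding: name the smallest next step, do only that
```

the same file-native pattern as AGENTS.md, pointed at a person instead of a codebase.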
the question: is this powerful or dangerous?
→ self.md take: the “personal” in personal AI is getting uncomfortably literal. this isn’t about calendar management or email drafts. this is about modeling your internal state. the upside: an AI that actually helps instead of just executing tasks. the downside: what happens when the AI knows you better than you know yourself? and who else has access to that model? this is either the future of mental health tooling or a privacy nightmare. probably both.
■ signal 7 — someone built Starcraft for AI agents (and it actually makes sense)
strength: ■■■■□ → ralv.ai | @dom_scholz
dominik scholz shipped ralv.ai — a 3D spatial interface for orchestrating swarms of AI agents. the pitch: terminals break down at 20+ agents. drag-select agents like Starcraft units. deploy to tasks. zoom in/out between strategic view and individual agent execution.
→ tool page: Ralv.ai
the insight: gaming UX already solved the “control many units” problem. billions in R&D went into RTS interfaces. why are we managing 50 AI agents with tabs and terminal windows?
the interface: spatial canvas. agents as units. tasks as objectives. you can see the swarm. assign roles. watch execution in real-time. the metaphor is RTS commander, not sysadmin.
also has a crypto token (STARCRAFT on CryptoRank) — the economics layer for agent marketplaces. not the main story, but worth noting.
the timing is perfect. stripe ships 1000+ agent PRs/week (signal 5). skills catalogs are exploding (signal 3). we’re entering the “how do you manage this many agents?” phase.
terminals were designed for one process. maybe a handful with tmux. powerful for 1-3 agents, useless for 50.
→ self.md take: the agent orchestration UX problem is real. right now we manage agents like servers — ssh into them, check logs, hope nothing breaks. but agents aren’t servers. they’re swarms. and swarms need spatial interfaces, not text streams. gaming figured this out 30 years ago. someone finally asked “what if we just used game UI?” the answer: it works. the question is whether developers will accept it or keep pretending terminals scale.
░░░ meta-pattern
the theme this week: the infrastructure layer is hardening, and the consequences are arriving.
programming languages are being rethought for agents. context engineering is becoming a formal discipline. skills catalogs are emerging as the distribution layer. and now the orchestration layer is getting its first serious proposals — spatial interfaces for managing swarms.
the technical stack is forming:
→ language layer (agent-first programming languages)
→ context layer (file-native .md protocol)
→ skills layer (curated catalogs, not fine-tuning)
→ orchestration layer (RTS-style interfaces for agent swarms)
but three countervailing forces appeared this week:
AI intensifies work (HBR, stripe minions) — the productivity gains get captured by higher expectations, not reduced effort.
AI gets uncomfortably personal (claude_life_assistant) — the closer AI gets to modeling your internal state, the more powerful and creepy it becomes.
scale changes everything (ralv.ai) — managing 1 agent is a terminal problem. managing 50 is a spatial problem. the tools that got us here won’t get us there.
the technical problems are solvable. the human problems — burnout, boundaries, privacy, cognitive overload — are just starting.
the gap: we’re building the infrastructure for agentic systems faster than we’re figuring out how to live with them.
the opportunity isn’t better tools. it’s wisdom. knowing when to say no. knowing what not to automate. knowing where the human should stay in the loop.
nobody’s building that yet.
stay evolving