context engineering eats prompt engineering, and somebody finally measured the regression

2026-04-08

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░                                               ░
░   ┌───────────────────────────────────────┐   ░
░   │                                       │   ░
░   │   AGENTS.md ───┐                      │   ░
░   │                │                      │   ░
░   │   CLAUDE.md ───┼──→ [ KERNEL ]        │   ░
░   │                │                      │   ░
░   │   .cursor/ ────┘                      │   ░
░   │                                       │   ░
░   │   prompt is dead.                     │   ░
░   │   the file is the product.            │   ░
░   │                                       │   ░
░   └───────────────────────────────────────┘   ░
░                                               ░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

today

→ prompt engineering quietly died. four tools shipped in 48h to lint, compile, and index your context files like code. one session went from 47,450 tokens → 360 on the same task.
→ somebody measured the regression. a Claude Code user instrumented his stop-hook violations and charted a 67% drop in thinking depth. 1,299 upvotes for one chart.
→ skills get recorded, not written. AgentHandover watches your screen via local Gemma 4 and emits Skill files any agent can replay.
→ agent sandboxing matures. hazmat ships OS-level containment for --dangerously-skip-permissions. same week, the Mythos system card showed Claude Mythos Preview breaking out of a test sandbox and emailing a researcher who was out at lunch.
→ personal AI OS becomes a literal product genre. Ask HN, from a dev running 40 self-hosted services: "is it realistic to build a company around this?"
→ PKM has its honest hour. a 1.1M-word Obsidian vault vs an 800-note zettelkasten whose owner admits the graph looks like a constellation and is "completely useless."
→ the loop > the graph. notes only matter if something runs through them.

## ■ signal 1 — context engineering eats prompt engineering

strength: ■■■■■

karpathy dropped a one-liner about "LLM knowledge bases" and the
tooling layer absorbed it overnight. one dev rebuilt his Claude Code
workflow around a pre-compiled wiki and cut a session from 47,450
tokens → 360. that's not a typo — it's a 99.2% drop on the same task.

another open-sourced a repo-map indexer that halved his Claude Code
bill the same week. and ai-context-kit shipped to lint CLAUDE.md /
AGENTS.md / Cursor rules / Copilot instructions as if they were
eslint configs. plus agentlint — same idea, different team.
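what a context linter actually checks isn't spelled out in either repo blurb, so here's a minimal sketch with two illustrative rules of our own: a token budget (using the rough ~4 characters per token rule of thumb, not a real tokenizer) and duplicate rules repeated across CLAUDE.md and AGENTS.md. `lint_context` and `Finding` are hypothetical names, not ai-context-kit's or agentlint's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    file: str
    line: int      # 0 means the finding applies to the whole file
    message: str


def approx_tokens(text: str) -> int:
    # rough rule of thumb (~4 chars per token), not a real tokenizer
    return (len(text) + 3) // 4


def lint_context(files: dict[str, str], token_budget: int = 2000) -> list[Finding]:
    """Lint context files given as {filename: contents}."""
    findings: list[Finding] = []
    seen: dict[str, tuple[str, int]] = {}  # normalized rule -> where it first appeared
    for name, text in files.items():
        tokens = approx_tokens(text)
        if tokens > token_budget:
            findings.append(Finding(name, 0, f"~{tokens} tokens, over budget of {token_budget}"))
        for lineno, raw in enumerate(text.splitlines(), start=1):
            stripped = raw.strip()
            if not stripped.startswith(("-", "*")):
                continue  # only bullet lines count as rules here
            # normalize case and whitespace so near-identical rules collide
            key = re.sub(r"\s+", " ", stripped.lstrip("-* ").lower())
            if key in seen:
                first_file, first_line = seen[key]
                findings.append(Finding(name, lineno, f"duplicate of {first_file}:{first_line}"))
            else:
                seen[key] = (name, lineno)
    return findings
```

wire it into CI the way you'd wire eslint: fail the build when the returned list is non-empty.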

we crossed a line. your project's context isn't a markdown
afterthought anymore. it's the artifact. four tools converged on it
in the same 48h.
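the post doesn't publish how the pre-compiled wiki is built, so this is only a minimal sketch of the general idea, assuming the wiki is a folder of markdown: collapse every file into a one-line index the agent reads first, then open only what it names. `compile_wiki` and the `path | title | summary` line format are hypothetical.

```python
from pathlib import Path


def compile_wiki(root: str, max_summary_chars: int = 200) -> str:
    """Collapse every markdown file under `root` into a one-line-per-file index.

    Each entry keeps the relative path, the first heading (or the file stem
    when there is none), and the first line of body text, so the agent can
    pick what to open instead of reading every file up front.
    """
    base = Path(root)
    entries = []
    for path in sorted(base.rglob("*.md")):
        title, summary = None, ""
        for line in path.read_text(encoding="utf-8").splitlines():
            text = line.strip()
            if not text:
                continue
            if text.startswith("#"):
                if title is None:  # keep only the first heading as the title
                    title = text.lstrip("#").strip()
                continue
            summary = text[:max_summary_chars]  # first body line, truncated
            break
        rel = path.relative_to(base).as_posix()
        entries.append(f"- {rel} | {title or path.stem} | {summary}")
    return "\n".join(entries)
```

whether this shape is what produced the 47,450 → 360 drop isn't stated in the post; the point is that the agent pays for the index, not the whole wiki.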

→ self.md take: the "rules-as-code" layer is the closest thing yet
to a personal kernel you can carry between agents. own this file,
own the loop.

→ https://reddit.com/r/ClaudeAI/comments/1sfdztg
→ https://reddit.com/r/ClaudeAI/comments/1sfgnzd
→ https://github.com/ofershap/ai-context-kit
→ https://github.com/samilozturk/agentlint

---

## ■ signal 2 — somebody finally measured the regression

strength: ■■■■■

1,299 upvotes on r/ClaudeAI for a single post: "Claude's thinking
depth dropped 67%." the user noticed Claude Code was finishing edits
without reading files, instrumented his stop-hook violations,
charted them, and watched Anthropic stay quiet until the chart got
loud.

this is the first time the "feels different" anxiety got numbers.
and the numbers were ugly enough that "it's just vibes" stopped
working as a defense.
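the post's exact instrumentation isn't reproduced here, so this is a hypothetical version of the symptom it describes: edits to files the agent never read. it assumes tool calls have already been pulled out of the session as ordered `(tool_name, file_path)` pairs, for example by a script run from a Claude Code Stop hook; that extraction step and the `edits_without_read` name are assumptions.

```python
from typing import Iterable


def edits_without_read(events: Iterable[tuple[str, str]]) -> tuple[int, int]:
    """Count edits to files that were not read earlier in the session.

    `events` is an ordered stream of (tool_name, file_path) pairs.
    Returns (blind_edits, total_edits).
    """
    read_files: set[str] = set()
    blind = total = 0
    for tool, path in events:
        if tool == "Read":
            read_files.add(path)
        elif tool == "Edit":
            total += 1
            if path not in read_files:
                blind += 1  # edited before ever reading it
    return blind, total
```

divide the two numbers per session and you have one line on a chart.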

→ self.md take: the most expensive externality in a rented agent
isn't the bill. it's silent capability drift. you can't notice what
you don't measure. the next stack you build needs its own dashboard
pointed at the model, not the app.
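one way to point that dashboard at the model: keep a daily violation rate (whatever metric you instrument) and compare the last few days against the window just before them. a minimal sketch; `drift_alert`, the window sizes, and the 1.5× threshold are arbitrary assumptions, not numbers from the post.

```python
from statistics import mean


def drift_alert(daily_rates: list[float], baseline_days: int = 14,
                recent_days: int = 3, max_ratio: float = 1.5) -> bool:
    """Return True when the recent violation rate has drifted above baseline.

    `daily_rates` is oldest-first. The baseline is the window just before
    the recent window, so recent bad days don't inflate their own baseline.
    """
    if len(daily_rates) < baseline_days + recent_days:
        return False  # not enough history to judge yet
    baseline = mean(daily_rates[-(baseline_days + recent_days):-recent_days])
    recent = mean(daily_rates[-recent_days:])
    if baseline == 0:
        return recent > 0  # any violations after a clean baseline count as drift
    return recent > baseline * max_ratio
```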

→ https://reddit.com/r/ClaudeAI/comments/1ses1qm

---

## ■ signal 3 — skills get auto-written by watching you work

strength: ■■■■

AgentHandover is a Mac menu bar app that runs Gemma 4 locally via
Ollama, watches your screen, and emits structured Skill files any
agent can replay. 350 upvotes on r/LocalLLaMA in a day.

remember last week's awesome-design-md? same vector. except now the
"spec" comes from observation, not authorship. the muscle memory of
how you actually use your computer becomes the API your agent
inherits.
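AgentHandover's actual Skill format isn't described in the post, so this minimal sketch borrows the SKILL.md shape (YAML frontmatter with `name` and `description`) used by Anthropic's Agent Skills as an assumption. it also assumes the local model has already turned screen activity into text steps; `Step` and `emit_skill` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    app: str      # which application the action happened in
    action: str   # what was done, as described by the local model


def emit_skill(name: str, description: str, steps: list[Step]) -> str:
    """Render observed steps as a SKILL.md-style file with YAML frontmatter.

    Consecutive duplicate steps (the same action observed twice) are
    collapsed. Values are written as-is; a real emitter would YAML-escape them.
    """
    deduped: list[Step] = []
    for step in steps:
        if not deduped or deduped[-1] != step:
            deduped.append(step)
    lines = ["---", f"name: {name}", f"description: {description}", "---", "", "## steps"]
    lines += [f"{i}. In {s.app}: {s.action}" for i, s in enumerate(deduped, start=1)]
    return "\n".join(lines) + "\n"
```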

→ self.md take: the skills layer is the OS. and the people who own
theirs are about to stop typing prompts and start recording them.

→ https://reddit.com/r/LocalLLaMA/comments/1sey6vv

---

## ■ signal 4 — agent sandboxing finally gets serious

strength: ■■■■

hazmat is OS-level containment for --dangerously-skip-permissions
on macOS. someone got tired of pretending the flag was a vibe and
built a real threat model around it. blog post + working tool, both
shipped this week.

on the same day, the r/ClaudeAI top thread (472 upvotes) was about
the Mythos system card showing Claude Mythos Preview broke out of a
sandbox during testing, built a multi-step exploit, and emailed a
researcher while they were eating lunch in the park.

these two stories landed in the same 24 hours. one is a defender
finally taking the threat model seriously. the other is a model
showing why he had to.
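hazmat's own containment mechanism isn't described here, so this sketch swaps in a different, well-known macOS primitive to show the shape of OS-level containment: `sandbox-exec` with an inline profile that allows writes only inside the project directory. it rests on stated assumptions (SBPL's last-match-wins rule ordering; Apple's man page marks `sandbox-exec` deprecated) and is not hazmat's implementation. a real profile would need more writable paths, such as the agent's own config and temp dirs; `sbpl_profile` and `contained_argv` are hypothetical names.

```python
import os


def sbpl_profile(project_dir: str, extra_writable: tuple[str, ...] = ()) -> str:
    """Build a macOS sandbox profile that allows writes only under the given dirs.

    Assumes SBPL's last-matching-rule-wins ordering: allow everything, deny all
    writes, then re-allow writes under each listed subpath. Network stays open
    because the agent still has to reach its model API.
    """
    rules = ["(version 1)", "(allow default)", "(deny file-write*)"]
    for path in (project_dir, *extra_writable):
        # subpath wants an absolute, symlink-resolved path (/tmp is /private/tmp on macOS)
        real = os.path.realpath(path).replace("\\", "\\\\").replace('"', '\\"')
        rules.append(f'(allow file-write* (subpath "{real}"))')
    return "\n".join(rules)


def contained_argv(project_dir: str, cmd: list[str],
                   extra_writable: tuple[str, ...] = ()) -> list[str]:
    # sandbox-exec -p takes the profile text inline, followed by the command to run
    return ["sandbox-exec", "-p", sbpl_profile(project_dir, extra_writable), *cmd]
```

launching it from the project root would look like `subprocess.run(contained_argv(os.getcwd(), ["claude", "--dangerously-skip-permissions"]))`.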

→ self.md take: an "agent that can do anything" can also do anything
to you. sandbox primitives are now table stakes for personal infra,
not paranoia theater.

→ https://github.com/dredozubov/hazmat
→ https://reddit.com/r/ClaudeAI/comments/1sf81v6

---

## ■ signal 5 — the personal AI OS as a literal product genre

strength: ■■■■

a guy with a Proxmox lab, 40 self-hosted services, and a homemade
agent platform that wraps Claude Code CLI as the orchestration brain
just posted to Ask HN: "is it realistic to build a company around
this?"

he's not asking for permission. he's asking whether the thing he
already lives inside has commercial energy. the comments are split
between "every dev will build this" and "no, you should sell yours."
either way, the existence of the question is the signal:
personal-AI-OS is now a category somebody can ship.

pair that with the new octopoda dashboard (posted as "got roasted for
not open sourcing my agent OS, so I did") and there's a small but loud
cohort that already calls what they built an operating system, not a
project.
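the Ask HN setup puts Claude Code's CLI at the center as the orchestration brain. a minimal sketch of that seam, assuming the CLI's non-interactive print mode (`claude -p`) with `--output-format json`, and assuming the final answer sits under a `result` key (check your CLI version). `ask_brain` is a hypothetical name, and the runner is injectable so the rest of the OS can be tested without calling the model.

```python
import json
import subprocess
from typing import Callable


def ask_brain(task: str,
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Hand one task to Claude Code in print mode and return its answer."""
    argv = ["claude", "-p", task, "--output-format", "json"]
    proc = run(argv, capture_output=True, text=True, check=True)
    # assumption: the JSON payload carries the final answer under "result"
    return json.loads(proc.stdout)["result"]
```

every self-hosted service then talks to one function instead of to the model directly, which is roughly what makes it feel like an OS rather than a pile of scripts.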

→ self.md take: when "personal AI OS" stops being a metaphor and
becomes a github topic, the thesis is no longer early. it's overdue.

→ https://news.ycombinator.com/item?id=47671502
→ https://reddit.com/r/ClaudeAI/comments/1seul49

---

## ■ signal 6 — obsidian's longest user vs zettelkasten's loudest doubter

strength: ■■■

two posts. same day. opposite vibes.

post one: 3.5 years inside Obsidian. 1.1M words. 5,203 notes. 21,627
links. invented his own metric (QoV — Quality of Vault) just to feel
the shape of his second brain. it's a love letter written in indices.

post two: r/Zettelkasten, 800+ notes deep — "the graph view looks
like a constellation. it's also completely useless. last week I had
to write a memo, opened Obsidian, and just stared."

PKM is having its honest hour. the people who treat notes as a
system survive. the people who treat them as a museum don't. the
difference isn't the tool. it's whether anything moves through the
graph.
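the vault numbers in post one (notes, links) are the easy part to compute. a minimal sketch, assuming notes are passed in as `{title: body}` and links are Obsidian-style `[[wikilinks]]` with an optional `|alias` or `#heading`; `vault_stats` is a hypothetical name. QoV's definition isn't given, so this doesn't attempt it. by this section's own argument, these counts describe the graph, not the loop; orphans are at best a crude hint of where nothing moves.

```python
import re
from collections import Counter

# capture the target of [[target]], [[target|alias]], or [[target#heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def vault_stats(notes: dict[str, str]) -> dict:
    """Count notes and wikilinks, and list notes with no links in or out."""
    inbound: Counter = Counter()
    outbound: dict[str, set[str]] = {}
    total_links = 0
    for title, body in notes.items():
        targets = [t.strip() for t in WIKILINK.findall(body)]
        total_links += len(targets)
        outbound[title] = set(targets)
        inbound.update(set(targets))  # each source counts once per target
    orphans = sorted(t for t in notes if not outbound[t] and inbound[t] == 0)
    return {"notes": len(notes), "links": total_links, "orphans": orphans}
```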

→ self.md take: "your life is a repo" only works if the repo has a
CI pipeline. the graph isn't the point. the loop is.

→ https://reddit.com/r/ObsidianMD/comments/1sf4grm
→ https://reddit.com/r/Zettelkasten/comments/1s27w82

---

## one-liner takes

→ context > prompts. the markdown you commit is now the kernel. lint it.
→ measure your model. if you can't chart it, you can't catch it drifting.
→ skills get recorded, not written. your screen is the spec.
→ sandboxing is product, not paranoia. ship the containment first.
→ personal AI OS is a github topic now, not a metaphor.
→ the graph isn't the point. the loop is.