control surfaces

self.md radar — 2026-04-14

today looked less like better models and more like AI getting wrapped in interfaces you can actually steer. three signals: operator controls surfacing inside agent tooling, a localhost package that turns offline into a real knowledge surface, and UX evaluation collapsing from quarterly ritual into an agentic loop.

1. the agent stack is growing an ops console

sources:

what happened:

Claude users can now reportedly switch models inside a live chat without restarting the thread. a small TUI surfaced that breaks down where Claude Code tokens go, making token burn inspectable per call. separately, Microsoft shipped playwright-cli, which records, generates, inspects, and screenshots browser actions, explicitly framed as token-efficient for coding agents.

why this matters:

routing, visibility, and actuation are becoming operator controls inside chat and agent surfaces — not hidden provider decisions. the pattern is consistent: whoever runs the agent wants a console, not just a prompt box.
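the token-visibility piece is the easiest to make concrete. a minimal sketch of per-call classification, assuming a log of call records with hypothetical `category` and `tokens` fields (the actual TUI's data model isn't published here):

```python
from collections import defaultdict

def classify_token_burn(calls):
    """Aggregate token counts by call category and compute each
    category's share of total burn. Field names are hypothetical."""
    totals = defaultdict(int)
    for call in calls:
        totals[call["category"]] += call["tokens"]
    grand = sum(totals.values()) or 1  # avoid division by zero on empty logs
    # map each category to (raw tokens, percent of total)
    return {cat: (n, round(100 * n / grand, 1)) for cat, n in totals.items()}

calls = [
    {"category": "system_prompt", "tokens": 1200},
    {"category": "tool_use", "tokens": 2600},
    {"category": "completion", "tokens": 200},
]
print(classify_token_burn(calls))
# tool_use dominates: 2600 of 4000 tokens, i.e. 65% of the burn
```

the point of the sketch: once calls are tagged, "where do my tokens go" is a one-pass aggregation, which is why this kind of console view is cheap to ship.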

2. project N.O.M.A.D. packages localhost as a survival-grade knowledge surface

sources:

what happened:

project N.O.M.A.D. bundles local AI chat with RAG, offline Wikipedia via Kiwix, Kolibri for structured learning, offline maps, CyberChef, notes, and benchmark tooling into a single server at localhost:8080. it is positioned as a self-contained offline-first knowledge and education server that runs on minimal hardware.

why this matters:

this is productized offline infrastructure as one browser tab, not just another local model repo. when the network disappears, the question becomes what knowledge surface survives — and someone finally shipped an answer.
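N.O.M.A.D.'s internals aren't detailed here, but the offline RAG piece reduces to retrieval over local documents. a stand-in sketch using naive keyword overlap instead of embeddings (all names hypothetical; real systems would use a vector index):

```python
def retrieve(query, docs, k=2):
    """Rank local documents by keyword overlap with the query.

    A deliberately crude stand-in for embedding-based retrieval:
    score each doc by how many query words it shares, return top k.
    """
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "kiwix serves offline wikipedia snapshots",
    "kolibri packages structured learning content",
    "cyberchef handles data transforms in the browser",
]
print(retrieve("offline wikipedia", docs, k=1))
# the kiwix doc wins: it shares both query words
```

the design point survives the simplification: with no network, retrieval quality is bounded entirely by what's on disk, which is why the bundle ships whole corpora rather than model weights alone.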

3. UX evaluation becomes a nightly agentic loop

sources:

what happened:

OpenFlo simulates user behavior on real websites with GUI grounding, simulated profiles, SUS/SEQ metrics, think-aloud traces, and structured reports. the paper frames this as continuous, scalable usability testing designed for small teams and agile workflows. it runs against live sites, not mockups.

why this matters:

usability review starts looking like CI instead of a quarterly research ritual. if UX evaluation can run nightly against production, the feedback loop between ship and measure collapses to hours.
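the SUS metric OpenFlo reports is a standard, fixed formula, so it's worth pinning down what a nightly run actually emits (the surrounding agentic pipeline is OpenFlo's; the scoring itself is the published SUS procedure):

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered items (positively worded)
    contribute (score - 1), even-numbered items (negatively worded)
    contribute (5 - score), and the sum is scaled by 2.5 onto 0-100.
    """
    assert len(responses) == 10, "SUS requires exactly ten responses"
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# a perfectly positive respondent: 5 on odd items, 1 on even items
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
# all-neutral responses land exactly in the middle
print(sus_score([3] * 10))  # 50.0
```

a deterministic 0-100 scalar per run is exactly what makes this CI-shaped: you can diff it against last night's score and gate a deploy on the delta.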

left on the table