control surfaces

self.md radar — 2026-04-14

today looked less like better models and more like AI getting wrapped in interfaces you can actually steer. three signals: operator controls surfacing inside agent tooling, a localhost package that turns offline into a real knowledge surface, and UX evaluation collapsing from quarterly ritual into an agentic loop.

1. the agent stack is growing an ops console

sources:

what happened:

Claude users can now reportedly switch models inside a live chat without restarting the thread. a small TUI surfaced that breaks down where Claude Code tokens go, making token burn inspectable per call. separately, Microsoft shipped playwright-cli, which records, generates, inspects, and screenshots browser actions, explicitly framed as token-efficient for coding agents.

why this matters:

routing, visibility, and actuation are becoming operator controls inside chat and agent surfaces — not hidden provider decisions. the pattern is consistent: whoever runs the agent wants a console, not just a prompt box.
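the token-visibility piece is the easiest to make concrete. a minimal sketch of per-call classification, assuming a log of call records with hypothetical `category` and `tokens` fields (the actual TUI's data model isn't published here):

```python
from collections import defaultdict

def classify_token_burn(calls):
    """Aggregate token counts by call category and compute each
    category's share of total burn. Field names are hypothetical."""
    totals = defaultdict(int)
    for call in calls:
        totals[call["category"]] += call["tokens"]
    grand = sum(totals.values()) or 1  # avoid division by zero on empty logs
    # map each category to (raw tokens, percent of total)
    return {cat: (n, round(100 * n / grand, 1)) for cat, n in totals.items()}

calls = [
    {"category": "system_prompt", "tokens": 1200},
    {"category": "tool_use", "tokens": 2600},
    {"category": "completion", "tokens": 200},
]
print(classify_token_burn(calls))
# tool_use dominates: 2600 of 4000 tokens, i.e. 65% of the burn
```

the point of the sketch: once calls are tagged, "where do my tokens go" is a one-pass aggregation, which is why this kind of console view is cheap to ship.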

2. project N.O.M.A.D. packages localhost as a survival-grade knowledge surface

sources:

what happened:

project N.O.M.A.D. bundles local AI chat with RAG, offline Wikipedia via Kiwix, Kolibri for structured learning, offline maps, CyberChef, notes, and benchmark tooling into a single server at localhost:8080. it is positioned as a self-contained offline-first knowledge and education server that runs on minimal hardware.

why this matters:

this is productized offline infrastructure as one browser tab, not just another local model repo. when the network disappears, the question becomes what knowledge surface survives — and someone finally shipped an answer.
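N.O.M.A.D.'s internals aren't detailed here, but the offline RAG piece reduces to retrieval over local documents. a stand-in sketch using naive keyword overlap instead of embeddings (all names hypothetical; real systems would use a vector index):

```python
def retrieve(query, docs, k=2):
    """Rank local documents by keyword overlap with the query.

    A deliberately crude stand-in for embedding-based retrieval:
    score each doc by how many query words it shares, return top k.
    """
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "kiwix serves offline wikipedia snapshots",
    "kolibri packages structured learning content",
    "cyberchef handles data transforms in the browser",
]
print(retrieve("offline wikipedia", docs, k=1))
# the kiwix doc wins: it shares both query words
```

the design point survives the simplification: with no network, retrieval quality is bounded entirely by what's on disk, which is why the bundle ships whole corpora rather than model weights alone.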

3. UX evaluation becomes a nightly agentic loop

sources:

what happened:

OpenFlo simulates user behavior on real websites with GUI grounding, simulated profiles, SUS/SEQ metrics, think-aloud traces, and structured reports. the paper frames this as continuous, scalable usability testing designed for small teams and agile workflows. it runs against live sites, not mockups.

why this matters:

usability review starts looking like CI instead of a quarterly research ritual. if UX evaluation can run nightly against production, the feedback loop between ship and measure collapses to hours.
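the SUS metric OpenFlo reports is a standard, fixed formula, so it's worth pinning down what a nightly run actually emits (the surrounding agentic pipeline is OpenFlo's; the scoring itself is the published SUS procedure):

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered items (positively worded)
    contribute (score - 1), even-numbered items (negatively worded)
    contribute (5 - score), and the sum is scaled by 2.5 onto 0-100.
    """
    assert len(responses) == 10, "SUS requires exactly ten responses"
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# a perfectly positive respondent: 5 on odd items, 1 on even items
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
# all-neutral responses land exactly in the middle
print(sus_score([3] * 10))  # 50.0
```

a deterministic 0-100 scalar per run is exactly what makes this CI-shaped: you can diff it against last night's score and gate a deploy on the delta.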

left on the table