context is the new infrastructure
by Ray Svitla
your Claude Code session just burned 150,000 tokens doing basically nothing.
you ran ls. you checked git status. you asked it to refactor a function. standard Wednesday morning work. but every command dumps its entire output into context. every file read. every error message. every webpack warning you’ve been ignoring for three months.
the agent doesn’t filter. it consumes.
and you’re paying for it. not just in API costs (though those add up). in context window exhaustion. in the agent forgetting what you told it an hour ago because it’s too busy remembering the 847 lines of npm install output from your last dependency upgrade.
this is the new infrastructure problem: context is expensive, and nobody’s treating it like infrastructure yet.
grep for the LLM era
someone finally built the obvious tool.
rtk (Rust Token Killer) is a CLI proxy that sits between your shell and your coding agent. it filters command output before it hits context. ls goes from 4,000 tokens to 800. git status drops from 3,000 to 600. typical 30-minute session: 150K tokens → 45K.
60-90% reduction. one Rust binary. zero config.
it’s grep, but for tokens instead of lines. the unix philosophy applied to LLM context management.
here’s what makes it interesting: rtk doesn’t try to be smart. it doesn’t use AI to “summarize” output. it uses dumb, reliable compression: trim whitespace, drop duplicate lines, filter out noise patterns. the same techniques you’d use in a shell script, except automated and optimized for the specific garbage coding agents don’t need to see.
this is infrastructure thinking. not “how do we make the agent smarter” but “how do we make the environment less noisy.”
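the trick is simple enough to sketch. here's a minimal Python version of the same idea — trim, dedupe, drop noise. the noise patterns below are made up for illustration; rtk's real filters live in its Rust source and are more careful than this:

```python
import re

# illustrative noise patterns — not rtk's actual filter list
NOISE = [
    re.compile(r"^npm WARN"),   # warnings the agent never needs
    re.compile(r"^\s*$"),       # blank lines
]

def compress(output: str) -> str:
    """dumb, deterministic compression: no AI, no summarization.
    trim trailing whitespace, drop noise lines, drop exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in output.splitlines():
        line = line.rstrip()
        if any(p.match(line) for p in NOISE):
            continue
        if line in seen:  # crude — a real tool would dedupe more carefully
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)
```

the whole point is that this is boring. deterministic filters you can audit, not a model guessing what matters.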
hoard things you know how to do
Simon Willison just published a new agentic engineering pattern: “hoard things you know how to do.”
the idea: the more examples of working code you have (blog posts, TILs, GitHub repos, proof-of-concepts), the better you get at spotting what’s possible. LLMs amplify this — your hoard becomes your agent’s training data.
Simon hoards aggressively. tools.simonwillison.net is a graveyard of single-file HTML tools. each one solves a specific problem. each one demonstrates a technique. he’s built hundreds of them, mostly with LLM assistance.
his simonw/research repo is even weirder: he challenges coding agents to research a problem and come back with working code + a written report. the repo is a collection of agent-generated research artifacts.
why? because the best agentic engineers aren’t the ones with perfect prompts. they’re the ones with the deepest collection of “I’ve seen this done before.”
your GitHub isn’t just a portfolio. it’s your second brain. and if you’re working with agents, it’s their context too.
the multi-AI config nightmare
here’s a problem nobody talks about: if you use multiple coding agents (Claude, Codex, Cursor, Gemini), you have multiple config files.
CLAUDE.md. AGENTS.md. .cursorrules. GEMINI.md.
four files saying roughly the same thing. four chances to get out of sync. you update one, forget the other three. Cursor starts hallucinating your project structure. Claude has it right. you waste an hour debugging before realizing: oh, I forgot to sync the configs.
someone on Hacker News finally snapped and built claude-faf-mcp: an MCP server that reads a single YAML file (project.faf) and generates all four formats. bi-directional sync. 61 tools, 351 tests. works natively in Claude Desktop.
you edit one file. everything stays current.
the .faf format is even IANA-registered (application/vnd.faf+yaml). someone went through the bureaucracy to make this a real standard.
this is dotfiles for the multi-AI era. the problem isn’t “how do I configure one agent” anymore. it’s “how do I keep five agents in sync without losing my mind.”
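the shape of that tool is easy to sketch. a hedged Python version — the dict below stands in for the parsed project.faf (I'm not reproducing the real .faf schema here), and the single render() is a simplification; a real sync tool emits per-format output and syncs both ways:

```python
from pathlib import Path

# hypothetical project facts — stand-in for a parsed project.faf
project = {
    "name": "my-app",
    "stack": "python 3.12, fastapi, postgres",
    "rules": ["run tests before committing", "ask before adding dependencies"],
}

# the four per-agent files from the post
TARGETS = ["CLAUDE.md", "AGENTS.md", "GEMINI.md", ".cursorrules"]

def render(facts: dict) -> str:
    """one canonical rendering; real tools vary the output per format."""
    rules = "\n".join(f"- {r}" for r in facts["rules"])
    return f"# {facts['name']}\n\nstack: {facts['stack']}\n\n## rules\n{rules}\n"

def sync(root: Path) -> None:
    """regenerate every agent config from the single source of truth."""
    for target in TARGETS:
        (root / target).write_text(render(project))
```

one edit, four files regenerated. the configs can't drift apart because none of them is hand-edited.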
context drift is the silent killer
here’s the pattern:
you start a project. your agent knows everything. AGENTS.md is fresh. your context is clean.
two weeks later: you’ve added three dependencies, refactored the auth layer, changed the database schema. you updated AGENTS.md once. maybe.
the agent is now operating on stale context. it suggests patterns you deprecated. it imports files that don’t exist. you spend half your time correcting it instead of building.
this is context drift.
and it’s worse than code drift because the agent doesn’t throw errors. it just gets dumber, gradually, until you realize you’re spending more time explaining what you already built than building new things.
rtk addresses one side of this (reduce noise in real-time context). hoarding addresses another (expand your library of working patterns). config sync addresses a third (keep multiple agents aligned).
but the real solution is treating context as infrastructure. version it. audit it. prune it. the same way you’d manage dependencies or database migrations.
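what does auditing context actually look like? one cheap heuristic: flag any context file that's older than your newest source file. a sketch — the file names and glob are assumptions about your layout, and mtime is a blunt proxy (a real audit would diff against git history):

```python
from pathlib import Path

# hypothetical context files for this project layout
CONTEXT_FILES = ["AGENTS.md", "CLAUDE.md"]

def stale_context_files(repo: Path, source_glob: str = "src/**/*.py") -> list[str]:
    """return context files last touched before the newest source file —
    a crude drift detector: if code moved and the context didn't, flag it."""
    sources = list(repo.glob(source_glob))
    if not sources:
        return []
    newest_src = max(p.stat().st_mtime for p in sources)
    return [
        name for name in CONTEXT_FILES
        if (repo / name).exists() and (repo / name).stat().st_mtime < newest_src
    ]
```

run it in CI and fail the build when context goes stale, the same way you'd fail on an outdated lockfile.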
the invisible attack surface
one more thing: invisible characters.
researchers tested 5 models across 8,000+ cases. result: you can embed invisible Unicode characters in text that trick AI agents into following hidden instructions.
someone puts an invisible payload in a GitHub issue. your agent reads it. you don’t see it. the agent executes hidden commands.
this is a supply chain attack you can’t audit with cat. if your agent has file access, it’s a backdoor you can’t close by reading the text.
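you can't see the payload, but a scanner can. a minimal first pass in Python: flag everything in Unicode category Cf (“format”), which covers zero-width characters, bidi overrides, and the tag characters used to smuggle hidden ASCII. this is a coarse filter, not a defense — it won't catch homoglyphs or visible-but-misleading text:

```python
import unicodedata

def find_invisibles(text: str) -> list[tuple[int, str]]:
    """flag characters in Unicode category Cf ("format"): zero-width
    spaces and joiners, bidi overrides, tag characters — all of which
    render as nothing (or worse, reorder text) in most UIs."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]
```

run it over anything an agent is about to read — issues, emails, scraped pages — and refuse or strip on a hit.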
the more agents integrate into workflows (reading emails, processing docs, scraping web pages), the bigger the attack surface. and security tooling for agents is still playing catchup.
context isn’t just expensive. it’s exploitable.
what changes
if context is infrastructure, here’s what we need:
→ compression tooling (rtk is the first, won’t be the last)
→ versioned context management (git for agent memory)
→ config sync for multi-agent setups (dotfiles, but for LLMs)
→ context auditing tools (detect drift, flag stale info)
→ security filters (scan for invisible payloads, malicious instructions)
this isn’t exotic. it’s the same infrastructure thinking we apply to databases, caches, CI/CD pipelines. except now the cache is an LLM’s context window, and the pipeline is your agent’s workflow.
the agent bills aren’t shrinking. context windows are growing, but so is the noise. per-token prices are falling, but usage is climbing faster.
your next bottleneck isn’t compute. it’s memory. and memory, in the LLM era, is context.
Ray Svitla
stay evolving 🐌