context engineering
by Ray Svitla
prompt engineering was the 2023 skill. you wrote a clever prompt, got a clever answer, felt like a wizard. then prompts got longer. then they got instructions. then instructions got memory systems, retrieval pipelines, tool configurations, and session management. somewhere in that progression, “prompt engineering” stopped describing what we were actually doing.
what we’re actually doing is context engineering.
what context engineering is
context engineering is the discipline of designing, building, and maintaining the complete information environment that an AI agent operates within. not just the prompt. everything the model sees before it generates a response.
this includes:
- instructions — CLAUDE.md, system prompts, skill files, behavioral rules
- memory — what the agent remembers from previous sessions, user preferences, project history
- retrieved information — search results, file contents, database queries, API responses
- tool descriptions — what MCP servers are available, what they do, how to call them
- conversation history — the current session’s back-and-forth
- meta-context — who’s asking, what project we’re in, what time it is, what matters right now
a prompt engineer writes a good question. a context engineer builds the entire information architecture that makes every question produce a good answer.
why it matters now
three things changed:
context windows got enormous. 200K tokens. a million tokens for some models. you can fit entire codebases, documentation sets, and conversation histories in a single context. the question shifted from “how do I fit my information in” to “how do I organize the ocean of information I could fit in.”
agents became persistent. AI isn’t a one-shot query anymore. agents run across sessions, accumulate memory, interact with tools, and build up state over time. managing that state is context engineering.
tools multiply context. every MCP server you add contributes tool descriptions to the context. ten servers with five tools each is 50 tool descriptions competing for the model’s attention. which tools to expose, when, and how they’re described — that’s context engineering.
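the arithmetic is worth making explicit. a back-of-the-envelope sketch, where the ~150 tokens per tool description is an assumed figure for illustration, not a measured one:

```python
def tool_context_cost(num_servers: int, tools_per_server: int,
                      tokens_per_description: int = 150) -> int:
    """Estimate tokens spent on tool descriptions before any work happens.

    tokens_per_description is a rough illustrative average, not a
    measured constant.
    """
    return num_servers * tools_per_server * tokens_per_description

# ten servers, five tools each: fifty descriptions in every context
cost = tool_context_cost(10, 5)
```

at an assumed 150 tokens each, those fifty descriptions consume thousands of tokens of context before the model reads a single word of your actual task.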
the core problems
context rot
the longer a conversation runs, the more the context degrades. early instructions get pushed further from the model’s attention. irrelevant information accumulates. the signal-to-noise ratio drops until the agent starts making mistakes it wouldn’t have made at the start.
see context rot for the full analysis.
solutions: /clear between tasks. /compact to compress. structured instructions that resist degradation. session boundaries.
context pollution
bad instructions are worse than no instructions. a CLAUDE.md full of contradictions, vague aspirations, and irrelevant details actively degrades performance. every token of noise displaces a token of signal.
this is why your CLAUDE.md probably sucks and why fixing it matters.
retrieval relevance
memory systems and RAG pipelines retrieve information based on similarity. similarity isn’t the same as relevance. your memory system might surface a conversation from three months ago that’s semantically similar but contextually useless — or miss a crucial detail because it was phrased differently.
the gap between “found something related” and “found what actually matters” is where context engineering lives.
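one way to work in that gap is to re-rank similarity hits with contextual signals. a minimal sketch, assuming a hypothetical memory store where each candidate carries its cosine score plus metadata — the weights and cutoffs here are illustrative, not tuned:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    text: str
    similarity: float   # cosine score from the embedding search
    last_used: datetime
    project: str

def rerank(candidates: list, current_project: str, max_age_days: int = 30) -> list:
    """Similarity found the candidates; relevance filters and re-orders them.

    Drops hits that are both stale and from another project, then blends
    similarity with recency. The 0.7/0.3 split is an illustrative choice.
    """
    now = datetime.now()
    scored = []
    for m in candidates:
        age = (now - m.last_used).days
        if age > max_age_days and m.project != current_project:
            continue  # semantically similar but contextually useless
        recency = max(0.0, 1 - age / max_age_days)
        scored.append((0.7 * m.similarity + 0.3 * recency, m))
    return [m for _, m in sorted(scored, key=lambda t: t[0], reverse=True)]
```

a three-month-old conversation from another project can score 0.95 on similarity and still lose to a fresher, less similar memory from the project you're actually in.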
tool overload
each MCP tool description costs tokens and attention. with 50 tools available, the model spends significant context just understanding what it could do before deciding what it should do. tool routing — dynamically loading only relevant tools — is an active area of context engineering.
see tool routing and MCP server composition.
the layers
context engineering operates at multiple timescales:
persistent layer (changes rarely)
- CLAUDE.md project instructions
- skill files and behavioral rules
- MCP server configurations
- user preferences and identity
session layer (changes per conversation)
- conversation history
- retrieved documents and search results
- active tool descriptions
- task-specific context
ephemeral layer (changes per turn)
- current tool call results
- intermediate reasoning
- working memory and scratchpad
good context engineering is clear about which layer each piece of information belongs to. project conventions go in CLAUDE.md (persistent). today’s research results go in the conversation (session). a specific API response is ephemeral.
putting ephemeral information in the persistent layer creates noise. putting persistent knowledge in the session layer means repeating yourself constantly.
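one way to keep the layers honest is to make them explicit in code. a hypothetical sketch — the names and assembly order are illustrative, not any particular framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class ContextStack:
    """Three timescales of context, assembled fresh each turn."""
    persistent: list = field(default_factory=list)  # CLAUDE.md, rules, preferences
    session: list = field(default_factory=list)     # history, retrieved docs
    ephemeral: list = field(default_factory=list)   # this turn's tool results

    def assemble(self) -> str:
        # persistent first so stable rules anchor the prompt;
        # ephemeral last so fresh results sit nearest the question
        return "\n\n".join(self.persistent + self.session + self.ephemeral)

    def end_turn(self) -> None:
        self.ephemeral.clear()  # per-turn results never outlive the turn

    def end_session(self) -> None:
        self.session.clear()    # only the persistent layer survives
        self.ephemeral.clear()
```

the point of the structure is the clearing rules: ephemeral context can't leak into the session, and session context can't leak into the persistent layer, because each layer has an explicit expiry.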
practical techniques
instruction design
write CLAUDE.md files that are structured, scannable, and hierarchical. put the most important rules first. use headers and lists, not prose. be specific — “use TypeScript strict mode” not “write good code.”
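a minimal sketch of that shape — the project details here are invented for illustration:

```markdown
# project rules

## non-negotiable
- use TypeScript strict mode
- never commit directly to main

## conventions
- tests live next to source files (`foo.ts` → `foo.test.ts`)
- prefer named exports over default exports

## context
- monorepo managed with pnpm workspaces
```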
see the CLAUDE.md guide for patterns.
memory architecture
choose what to remember and what to forget. not all information has equal shelf life. project conventions last months. today’s debugging context lasts hours. a specific error message lasts minutes.
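shelf life can be made operational instead of implicit. a minimal sketch, with illustrative durations per kind of memory:

```python
from datetime import datetime, timedelta
from typing import Optional

# illustrative shelf lives — the categories and durations are assumptions,
# not a standard taxonomy
SHELF_LIFE = {
    "convention": timedelta(days=90),      # project conventions last months
    "debug_context": timedelta(hours=8),   # today's debugging lasts hours
    "error_message": timedelta(minutes=15) # a specific error lasts minutes
}

def still_fresh(kind: str, created: datetime,
                now: Optional[datetime] = None) -> bool:
    """Decide whether a memory of this kind has expired."""
    now = now or datetime.now()
    return now - created <= SHELF_LIFE[kind]
```

an expiry pass like this is what keeps last week's stack trace from surfacing in next month's retrieval.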
see agent memory systems and memory consolidation.
retrieval tuning
when you use RAG or memory retrieval, the quality of what comes back determines the quality of what the agent produces. chunk size, embedding model, retrieval strategy (dense, sparse, hybrid) — these are context engineering decisions.
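hybrid retrieval, for instance, is just a weighted blend of the two score sources. a sketch with invented scores, where alpha is the tuning knob:

```python
def hybrid_score(dense: dict, sparse: dict, alpha: float = 0.5) -> list:
    """Blend dense (embedding) and sparse (keyword, e.g. BM25) scores.

    dense and sparse map doc id -> score; a doc missing from one source
    scores zero there. alpha weights dense vs sparse — choosing it is a
    context engineering decision, and 0.5 is only a starting point.
    """
    docs = set(dense) | set(sparse)
    scored = [(d, alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0))
              for d in docs]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

notice that a document strong in only one channel can still win: keyword-only matches rescue exact phrasings that embeddings miss, and vice versa.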
compression
long conversations need compression. memory compression techniques — summarization, key-fact extraction, importance ranking — keep the essential information while discarding the noise.
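importance ranking is the simplest of those to sketch: keep the highest-importance pieces that fit the budget. token counts are approximated by word counts here, and a real system would summarize what it drops rather than discard it outright:

```python
def compress(messages: list, budget: int) -> list:
    """Keep the highest-importance messages that fit a token budget.

    messages is a list of (text, importance) pairs. Word count stands in
    for token count — an approximation for illustration only.
    """
    kept, used = [], 0
    for text, _ in sorted(messages, key=lambda m: m[1], reverse=True):
        cost = len(text.split())
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept
```

the hard part isn't the loop — it's scoring importance well, which is where summarization and key-fact extraction earn their keep.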
tool routing
don’t load every MCP server for every task. route tools based on the task at hand. writing code? load the GitHub and filesystem tools. doing research? load search and web fetch. managing infrastructure? load AWS and Docker.
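the paragraph above is essentially a routing table. a minimal sketch — the task names and server names are illustrative, not a real MCP configuration:

```python
# illustrative routing table: which MCP servers to load per task type
ROUTES = {
    "code": ["github", "filesystem"],
    "research": ["search", "web_fetch"],
    "infra": ["aws", "docker"],
}

def route(task: str) -> list:
    """Load only the servers relevant to the task.

    Defaults to an empty list: an unrecognized task loads nothing,
    rather than everything.
    """
    return ROUTES.get(task, [])
```

the default matters as much as the table. falling back to "load everything" quietly reintroduces the tool-overload problem the routing was meant to solve.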
context engineering vs prompt engineering
| prompt engineering | context engineering |
|---|---|
| craft one prompt | design entire information architecture |
| per-query optimization | system-level optimization |
| text in → text out | memory + retrieval + tools + instructions |
| session-scoped | cross-session, persistent |
| individual skill | team discipline |
prompt engineering is a subset of context engineering. a necessary one — good prompts still matter. but optimizing prompts without optimizing the surrounding context is like tuning the engine while ignoring the fuel quality.
the meta-skill
context engineering is becoming the core competency for anyone building with AI agents. it’s not about knowing the model’s quirks or finding magic phrases. it’s about information architecture: what does the model need to know, when does it need to know it, and how do you keep the signal clean as complexity grows?
the best CLAUDE.md file, the best memory system, the best MCP configuration — they’re all context engineering artifacts. the discipline is the same across all of them: give the model exactly the right information at exactly the right time.
nothing more. nothing less. and that’s much harder than it sounds.