context engineering

by Ray Svitla


prompt engineering was the 2023 skill. you wrote a clever prompt, got a clever answer, felt like a wizard. then prompts got longer. then they got instructions. then instructions got memory systems, retrieval pipelines, tool configurations, and session management. somewhere in that progression, “prompt engineering” stopped describing what we were actually doing.

what we’re actually doing is context engineering.


what context engineering is

context engineering is the discipline of designing, building, and maintaining the complete information environment that an AI agent operates within. not just the prompt. everything the model sees before it generates a response.

this includes system instructions (your CLAUDE.md), memory systems, retrieval pipelines, tool configurations, conversation history, and session state.

a prompt engineer writes a good question. a context engineer builds the entire information architecture that makes every question produce a good answer.


why it matters now

three things changed:

context windows got enormous. 200K tokens. a million tokens for some models. you can fit entire codebases, documentation sets, and conversation histories in a single context. the question shifted from “how do I fit my information in” to “how do I organize the ocean of information I could fit in.”

agents became persistent. AI isn’t a one-shot query anymore. agents run across sessions, accumulate memory, interact with tools, and build up state over time. managing that state is context engineering.

tools multiply context. every MCP server you add contributes tool descriptions to the context. ten servers with five tools each is 50 tool descriptions competing for the model’s attention. which tools to expose, when, and how they’re described — that’s context engineering.


the core problems

context rot

the longer a conversation runs, the more the context degrades. early instructions get pushed further from the model’s attention. irrelevant information accumulates. the signal-to-noise ratio drops until the agent starts making mistakes it wouldn’t have made at the start.

see context rot for the full analysis.

solutions: /clear between tasks. /compact to compress. structured instructions that resist degradation. session boundaries.
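the compression step can be sketched in a few lines. this is a minimal sketch, assuming messages are role/content dicts and stubbing out the actual summarization (a real /compact-style pass asks the model itself to write the summary):

```python
def compact(messages, keep_recent=4):
    """Compress a conversation: keep the system prompt and the most
    recent turns verbatim, collapse everything in between into a
    summary placeholder."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return system + rest
    older, recent = rest[:-keep_recent], rest[-keep_recent:]
    # real implementations ask the model to summarize `older`;
    # a placeholder stands in for that call here
    summary = {"role": "user",
               "content": f"[summary of {len(older)} earlier messages]"}
    return system + [summary] + recent
```

the point of the structure: instructions stay verbatim at the top, recent turns stay verbatim at the bottom, and only the middle gets lossy.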

context pollution

bad instructions are worse than no instructions. a CLAUDE.md full of contradictions, vague aspirations, and irrelevant details actively degrades performance. every token of noise displaces a token of signal.

this is why your CLAUDE.md probably sucks and why fixing it matters.

retrieval relevance

memory systems and RAG pipelines retrieve information based on similarity. similarity isn’t the same as relevance. your memory system might surface a conversation from three months ago that’s semantically similar but contextually useless — or miss a crucial detail because it was phrased differently.

the gap between “found something related” and “found what actually matters” is where context engineering lives.
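one common mitigation is to re-rank by more than similarity. a toy sketch, assuming each hit carries a similarity score and a unix timestamp, that blends similarity with an exponential recency decay:

```python
import math

def rerank(hits, now, half_life_days=30.0):
    """Re-score retrieval hits: similarity alone can surface stale
    matches, so blend in an exponential recency decay."""
    def score(hit):
        age_days = (now - hit["timestamp"]) / 86400
        recency = math.exp(-math.log(2) * age_days / half_life_days)
        return hit["similarity"] * recency
    return sorted(hits, key=score, reverse=True)
```

with a 30-day half-life, a 0.9-similarity memory from three months ago scores below a 0.7-similarity note from today. the decay constant is an assumption; real systems tune it per memory type.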

tool overload

each MCP tool description costs tokens and attention. with 50 tools available, the model spends significant context just understanding what it could do before deciding what it should do. tool routing — dynamically loading only relevant tools — is an active area of context engineering.

see tool routing and MCP server composition.


the layers

context engineering operates at multiple timescales:

persistent layer (changes rarely)

session layer (changes per conversation)

ephemeral layer (changes per turn)

good context engineering is clear about which layer each piece of information belongs to. project conventions go in CLAUDE.md (persistent). today’s research results go in the conversation (session). a specific API response is ephemeral.

putting ephemeral information in the persistent layer creates noise. putting persistent knowledge in the session layer means repeating yourself constantly.
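one way to keep the layers honest is to store them separately and only join them when assembling the context, so each layer can be cleared on its own schedule. a sketch, with invented field names:

```python
from dataclasses import dataclass, field

@dataclass
class ContextStack:
    """Three timescales of context, kept separate so each can be
    cleared independently."""
    persistent: list = field(default_factory=list)  # CLAUDE.md-style rules
    session: list = field(default_factory=list)     # this conversation's findings
    ephemeral: list = field(default_factory=list)   # this turn's tool output

    def assemble(self) -> str:
        # persistent first: instructions should lead the context
        return "\n".join(self.persistent + self.session + self.ephemeral)

    def end_turn(self):
        self.ephemeral.clear()

    def end_session(self):
        self.session.clear()
        self.ephemeral.clear()
```

the design point: clearing a turn never touches session knowledge, and clearing a session never touches project conventions.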


practical techniques

instruction design

write CLAUDE.md files that are structured, scannable, and hierarchical. put the most important rules first. use headers and lists, not prose. be specific — “use TypeScript strict mode” not “write good code.”

see the CLAUDE.md guide for patterns.
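for illustration, a hypothetical fragment in that style (the project details are invented):

```markdown
# project rules

## non-negotiable
- use TypeScript strict mode; no `any`
- run `npm test` before every commit

## conventions
- components live in `src/components/`, one per file
- prefer named exports over default exports
```

headers first, rules as lists, hard constraints before soft preferences.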

memory architecture

choose what to remember and what to forget. not all information has equal shelf life. project conventions last months. today’s debugging context lasts hours. a specific error message lasts minutes.

see agent memory systems and memory consolidation.
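a minimal sketch of shelf-life-based forgetting, with invented memory kinds and TTLs roughly matching the timescales above:

```python
import time

# rough shelf lives per memory kind (illustrative assumptions)
TTL = {
    "convention": 90 * 86400,    # project conventions: months
    "debug_context": 8 * 3600,   # today's debugging state: hours
    "error_message": 10 * 60,    # a specific error: minutes
}

def prune(memories, now=None):
    """Drop memories that have outlived their kind's shelf life."""
    now = now or time.time()
    return [m for m in memories
            if now - m["created"] < TTL[m["kind"]]]
```

real systems refine this with access-based reinforcement, but TTL-by-kind is the baseline: deciding what to forget is as much a design choice as what to store.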

retrieval tuning

when you use RAG or memory retrieval, the quality of what comes back determines the quality of what the agent produces. chunk size, embedding model, retrieval strategy (dense, sparse, hybrid) — these are context engineering decisions.
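hybrid retrieval can be sketched as a weighted blend of the two strategies. a toy version, using raw keyword overlap as a stand-in for a real sparse scorer like BM25:

```python
def hybrid_score(query_terms, doc_terms, dense_sim, alpha=0.5):
    """Blend a dense embedding similarity with a sparse keyword-overlap
    score (a stand-in for BM25). alpha weights dense vs sparse."""
    overlap = len(set(query_terms) & set(doc_terms))
    sparse = overlap / max(len(set(query_terms)), 1)
    return alpha * dense_sim + (1 - alpha) * sparse
```

alpha is itself a tuning decision: lean dense for paraphrased queries, lean sparse for exact identifiers like error codes and function names.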

compression

long conversations need compression. memory compression techniques — summarization, key-fact extraction, importance ranking — keep the essential information while discarding the noise.
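importance ranking can be sketched as a budget-constrained pass, assuming each message carries a precomputed importance score and token count:

```python
def keep_important(messages, budget):
    """Importance-ranked compression: greedily keep the highest-scoring
    messages that fit the token budget, then restore original order."""
    scored = sorted(enumerate(messages),
                    key=lambda im: im[1]["importance"], reverse=True)
    kept, used = [], 0
    for idx, m in scored:
        if used + m["tokens"] <= budget:
            kept.append((idx, m))
            used += m["tokens"]
    # re-sort by original index so the surviving conversation reads in order
    return [m for _, m in sorted(kept)]
```

the importance scores themselves are the hard part; heuristics (decisions > facts > chatter) or a scoring model both appear in practice.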

tool routing

don’t load every MCP server for every task. route tools based on the task at hand. writing code? load the GitHub and filesystem tools. doing research? load search and web fetch. managing infrastructure? load AWS and Docker.
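a minimal sketch of that routing, with invented task and server names:

```python
# hypothetical mapping from task type to MCP servers (names invented)
ROUTES = {
    "code":     ["github", "filesystem"],
    "research": ["search", "web_fetch"],
    "infra":    ["aws", "docker"],
}

def route_tools(task_type, registry):
    """Load only the tool descriptions relevant to this task,
    instead of pushing every server's tools into context."""
    servers = ROUTES.get(task_type, [])
    return [tool for s in servers for tool in registry.get(s, [])]
```

the win is direct: a coding task carries two servers' worth of tool descriptions instead of ten.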


context engineering vs prompt engineering

prompt engineering | context engineering
craft one prompt | design entire information architecture
per-query optimization | system-level optimization
text in → text out | memory + retrieval + tools + instructions
session-scoped | cross-session, persistent
individual skill | team discipline

prompt engineering is a subset of context engineering. a necessary one — good prompts still matter. but optimizing prompts without optimizing the surrounding context is like tuning the engine while ignoring the fuel quality.


the meta-skill

context engineering is becoming the core competency for anyone building with AI agents. it’s not about knowing the model’s quirks or finding magic phrases. it’s about information architecture: what does the model need to know, when does it need to know it, and how do you keep the signal clean as complexity grows?

the best CLAUDE.md file, the best memory system, the best MCP configuration — they’re all context engineering artifacts. the discipline is the same across all of them: give the model exactly the right information at exactly the right time.

nothing more. nothing less. and that’s much harder than it sounds.


Topics: context-engineering ai-agents memory architecture