claude code context window optimization
by Ray Svitla
claude code’s context window is not infinite. it’s 200k tokens, which sounds like a lot until you realize how fast it fills up. your CLAUDE.md, the conversation history, every file the agent reads, every command output it captures — all of it eats context.
when the window fills up, claude code compacts the conversation. it summarizes older messages and drops details. information you mentioned ten messages ago might be gone. the agent gets dumber as the session gets longer.
this is the single biggest practical problem in daily claude code usage, and most people just ignore it until things start going wrong.
where your tokens actually go
a rough breakdown of a typical claude code session:
→ system prompt + CLAUDE.md: 2,000-8,000 tokens (depends on your CLAUDE.md size)
→ conversation history: grows with every message — your prompts + claude’s responses
→ file reads: each file claude reads gets added. a 500-line file is ~2,000-4,000 tokens
→ command output: npm test output, grep results, build logs — all in context
→ tool calls: each tool use adds request + response tokens
the worst offender: long command outputs. one npm test with verbose logging can dump 10,000+ tokens into context. a cat on a large file: same story. these are one-time reads that permanently occupy space in your session.
the compact wall
when your conversation approaches the context limit, claude code triggers “auto-compact.” it summarizes the conversation to free up space. this is better than crashing, but it’s lossy — nuance disappears, specific code snippets get summarized into vague descriptions, and the agent loses track of details.
you can manually trigger compaction with /compact. but compaction is always a loss. the goal is to not need it.
practical strategies
keep your CLAUDE.md lean
your CLAUDE.md loads on every single session. every word in it is a permanent tax. a 3,000-word CLAUDE.md essay about your project’s philosophy costs tokens for the entire session, even when you’re doing something unrelated.
audit your CLAUDE.md ruthlessly. if a line doesn’t change claude’s behavior, delete it. aim for under 1,000 tokens — that’s roughly 400-500 words.
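a quick way to sanity-check that budget: estimate token count from word count. a rough heuristic is ~1.3 tokens per english word (actual tokenization varies). a sketch using a throwaway file; point it at your real CLAUDE.md in practice:

```shell
# sample CLAUDE.md to measure (substitute your real file)
printf 'use pnpm, not npm.\nrun tests with pnpm test before committing.\n' > /tmp/CLAUDE.md

# rough heuristic: ~1.3 tokens per english word
words=$(wc -w < /tmp/CLAUDE.md)
echo "words: $words"
echo "estimated tokens: $(( words * 13 / 10 ))"
```

if the estimate is well over 1,000, that's your cue to start cutting.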
start fresh sessions often
don’t run one session for an entire day. each task should get its own session. finish the authentication feature? exit, start a new session for the API endpoint work.
long sessions accumulate context debris — old file reads, irrelevant conversations, stale command outputs. a fresh session is a clean context.
the /clear command resets conversation history without restarting. use it between distinct tasks in the same session.
be specific about file reads
“look at the auth module” is vague. claude might read every file in src/auth/ to figure out what you mean — six files, 20,000 tokens. “look at src/auth/middleware.ts” reads one file, 3,000 tokens.
the more specific you are about what to read, the less context gets wasted on files that aren’t relevant.
redirect command output
instead of letting claude run npm test and capture all the output, tell it: “run npm test 2>&1 | tail -20 and show me the last 20 lines.” most of the time, the last few lines of output are all you need. the first 500 lines of passing tests are noise.
same for grep: “grep for X in src/ and show the first 10 results” is better than unbounded grep that returns 200 matches.
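the pattern is easy to see with a simulated log. here's a 501-line test run, the kind an unbounded capture would dump into context wholesale, truncated to the tail where the failure actually lives (file path and messages are made up for illustration):

```shell
# simulate a verbose test run: 500 lines of passing noise, one failure at the end
{ yes 'PASS: unit test' | head -500; echo 'FAIL: auth test (expected 200, got 500)'; } > /tmp/test.log

# unbounded capture would put all 501 lines into context;
# tail -20 keeps only the end, where failures usually show up
tail -20 /tmp/test.log | wc -l   # 20 lines instead of 501
tail -1 /tmp/test.log            # the failure still survives truncation
```

the same trick works for any chatty command: build logs, linters, migration scripts. pipe through tail, head, or grep before the output ever reaches the conversation.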
use plan mode for exploration
when you’re not sure how to approach a task, use plan mode (press shift+tab to cycle into it). plan mode is read-only: claude can explore and reason, but it won’t edit files or run commands until you approve a plan. exploration still costs context, but it gets front-loaded into producing one focused plan instead of scattered across the whole session.
plan first, then execute with a clear understanding of which files matter.
chunk large tasks
“refactor the entire API layer” is one task that will fill your context window halfway through. break it into: “refactor the users endpoint,” “refactor the orders endpoint,” “refactor the auth endpoints.” each gets its own session or at least a /clear between them.
smaller tasks = less context accumulation = better quality responses throughout.
the CLAUDE.md token budget
here’s a framework for thinking about your CLAUDE.md size:
| CLAUDE.md size | token cost | verdict |
|---|---|---|
| under 300 words | ~500 tokens | ideal for most projects |
| 300-600 words | ~1,000 tokens | fine if every word earns its place |
| 600-1,500 words | ~2,500 tokens | probably has filler. audit it |
| 1,500+ words | ~4,000+ tokens | you’re writing docs, not instructions |
every token in CLAUDE.md is multiplied by every session you run. 2,000 extra tokens × 20 sessions/day × 20 work days = 800,000 wasted tokens per month.
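the arithmetic is worth internalizing. here it is as a snippet you can re-run with your own numbers:

```shell
extra_tokens=2000      # CLAUDE.md bloat beyond a lean baseline
sessions_per_day=20
work_days=20
echo "wasted tokens/month: $(( extra_tokens * sessions_per_day * work_days ))"
# prints: wasted tokens/month: 800000
```

plug in your actual session count; even at a few sessions a day, bloat compounds fast.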
monitoring context usage
claude code shows context usage in the bottom bar. watch it. when you see it climbing past 50%, start thinking about whether to compact or start fresh.
some patterns that burn context fast:
→ asking claude to “explore the codebase” (reads many files)
→ running tests without output limits (captures everything)
→ debugging loops where the agent tries multiple approaches (each attempt adds context)
→ pasting large code blocks into the conversation
patterns that preserve context:
→ pointing to specific files
→ limiting command output
→ using plan mode before execution
→ starting fresh sessions between tasks
→ keeping CLAUDE.md minimal
the deeper principle
context management is attention management. the model pays attention to everything in its context window. noise dilutes signal. the files it read three tasks ago are still sitting there, slightly confusing every subsequent response.
this isn’t unique to claude code. every LLM-based tool has this problem. the skills you build here — being precise about what the model sees, managing information flow, keeping context clean — transfer to any AI tool you’ll ever use.
treat context like RAM. don’t load what you don’t need. free what you’re done with. keep the working set small and relevant.
→ your CLAUDE.md sucks — write better instructions
→ reduce claude code costs — broader cost optimization
→ ultrathink — when to spend tokens on deep reasoning