# Context Window Management
Every AI conversation has a hidden limit: the context window. Exceed it and your AI forgets, hallucinates, or gives garbage output. Managing context is the difference between a useful session and a frustrating one.
## Why context matters
The context window is your AI’s working memory. Everything goes in:
| What counts | Example size |
|---|---|
| System prompt | 2-5K tokens |
| Your messages | Variable |
| AI responses | Often 2-3x your input |
| File contents | 1K tokens per ~750 words |
| Tool calls and results | Adds up fast |
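To see how these pieces add up, here is a minimal sketch of a usage estimate. It assumes the common rough heuristic of ~4 characters per token for English text; the function names and the 200K window default are illustrative, not part of any real API. Use your provider's tokenizer for accurate counts.

```python
# Rough token estimate using the ~4 characters-per-token heuristic for
# English text. This is an illustration, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~1 token per 4 characters."""
    return max(1, len(text) // 4)

def context_usage(parts: dict[str, str], window: int = 200_000) -> float:
    """Fraction of the context window consumed by the given parts."""
    total = sum(estimate_tokens(t) for t in parts.values())
    return total / window

usage = context_usage({
    "system_prompt": "x" * 12_000,   # ~3K tokens
    "file_contents": "y" * 40_000,   # ~10K tokens
})
print(f"{usage:.1%}")  # 6.5%
```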
When context fills up:
- AI “forgets” early instructions
- Responses become generic
- Reasoning quality degrades
- Hallucinations increase
## The 60% rule
Never exceed 60% context usage. Quality degrades before you hit the limit.
```
[============================                    ]  60% - Sweet spot
[======================================          ]  80% - Quality dropping
[================================================] 100% - Broken
```
| Context level | What happens |
|---|---|
| Under 40% | Peak performance |
| 40-60% | Still good, monitor |
| 60-80% | Noticeable degradation |
| 80%+ | Start new conversation |
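The table above maps directly to a simple decision function. This is a sketch of those thresholds; the function name and return strings are illustrative, not part of any tool.

```python
# Encode the context-level table as a lookup: given a usage fraction,
# return the recommended action. Thresholds match the table above.

def context_advice(usage: float) -> str:
    """Map a context-usage fraction (0.0-1.0) to the recommended action."""
    if usage < 0.40:
        return "peak performance"
    if usage < 0.60:
        return "still good, monitor"
    if usage < 0.80:
        return "noticeable degradation"
    return "start new conversation"

print(context_advice(0.55))  # still good, monitor
print(context_advice(0.85))  # start new conversation
```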
## Signs of context overflow
Your AI is struggling when:
- It forgets instructions you gave earlier
- Responses become vague or generic
- It repeats itself
- It contradicts previous statements
- Code quality drops
- It stops following your CLAUDE.md rules
## Strategies

### Phase-based work
Split complex work into discrete phases. Clear context between each.
```
Phase 1: Research
├── Explore codebase
├── Save findings to thoughts/research.md
└── End conversation

Phase 2: Plan
├── Read thoughts/research.md
├── Create PLAN.md
└── End conversation

Phase 3: Implement
├── Read PLAN.md
├── Execute tasks
└── End conversation

Phase 4: Validate
├── Run tests
├── Fix issues
└── End conversation
```
Each phase starts fresh with only the context it needs.
### Save to files
Offload context to persistent storage:
```shell
# Create a thoughts directory
mkdir -p thoughts/

# During work, save important context
claude "Save your current understanding to thoughts/project-state.md"

# In a new conversation, load it back
claude "Read thoughts/project-state.md and continue"
```
What to save:
- Key decisions and rationale
- Current progress
- Blockers encountered
- Next steps
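The four items above make a natural file template. Here is a hypothetical snapshot writer that persists them to `thoughts/project-state.md` so a fresh conversation can reload them; the function name and section layout are assumptions, not a standard format.

```python
# Hypothetical state-snapshot writer: saves key decisions, progress,
# blockers, and next steps as a markdown file a new session can read.
from pathlib import Path

def save_state(decisions, progress, blockers, next_steps,
               path="thoughts/project-state.md"):
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    sections = {
        "Key decisions": decisions,
        "Current progress": progress,
        "Blockers": blockers,
        "Next steps": next_steps,
    }
    lines = ["# Project state"]
    for title, items in sections.items():
        lines.append(f"\n## {title}")
        lines.extend(f"- {item}" for item in items)
    p.write_text("\n".join(lines) + "\n")
    return p

save_state(
    decisions=["Auth uses JWT with 24h expiry"],
    progress=["Login endpoint done"],
    blockers=[],
    next_steps=["Add refresh tokens"],
)
```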
### Use Memory MCP
For cross-session persistence:
```shell
# Save important context
claude "Remember: the auth system uses JWT with 24h expiry"

# In any future session
claude "What do you remember about our auth system?"
```
See Building a Memory System for setup.
### Start fresh strategically
Starting a new conversation is not failure. It's a tool.
| Situation | Action |
|---|---|
| Task complete | New conversation |
| Context over 60% | Save state, new conversation |
| AI seems confused | New conversation |
| Switching tasks | New conversation |
## Monitoring context

### Claude Code
```
# Check current usage
/context

# Compact to reduce usage
/compact
```
### API usage
Track token counts in responses:
```json
{
  "usage": {
    "input_tokens": 15234,
    "output_tokens": 892
  }
}
```
Compare against model limits:
| Model | Context window |
|---|---|
| Claude Sonnet | 200K tokens |
| Claude Opus | 200K tokens |
| GPT-4 Turbo | 128K tokens |
60% of 200K = 120K tokens max recommended.
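Putting the `usage` block and the model table together, a sketch of an over-budget check might look like this. The model names in the lookup table are illustrative labels, not official API identifiers.

```python
# Sketch: compare a response's token usage against 60% of the model's
# context window. Limits follow the table above; model keys are assumed.

MODEL_LIMITS = {
    "claude-sonnet": 200_000,
    "claude-opus": 200_000,
}

def over_budget(usage: dict, model: str, threshold: float = 0.60) -> bool:
    """True when cumulative tokens exceed the recommended budget."""
    used = usage["input_tokens"] + usage["output_tokens"]
    return used > threshold * MODEL_LIMITS[model]

response_usage = {"input_tokens": 15_234, "output_tokens": 892}
print(over_budget(response_usage, "claude-sonnet"))  # False: ~16K of a 120K budget
```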
## When to start new
Trigger a new conversation when:
- Context exceeds 60%
- AI forgets earlier instructions
- Response quality drops
- Task is complete
- Switching to unrelated work
- You’ve been going for 30+ messages
Before starting fresh:
- Save current state to file
- Note any unfinished tasks
- Capture key decisions
## The workflow
```
Start task
    ↓
Work (monitor context)
    ↓
Context > 60%? ──Yes──→ Save state to file
    ↓ No                     ↓
Continue             Start new conversation
    ↓                        ↓
Task complete?       Load state from file
    ↓ Yes                    ↓
Save learnings       Continue work
    ↓
End
```
Long conversations feel productive. Fresh conversations are productive.
Next: Building a Memory System