how to reduce claude code costs by 50%
by Ray Svitla
claude code costs money. whether you’re on a subscription hitting rate limits or on API billing watching the meter tick, every token has a price. most people waste 30-60% of their tokens on things that don’t improve their output.
here’s where the waste lives and how to kill it.
1. trim your CLAUDE.md
your CLAUDE.md is loaded every single session. a 2,000-word CLAUDE.md costs ~4,000 tokens per session. if you run 20 sessions a day, that’s 80,000 tokens daily just for instructions — many of which aren’t relevant to the current task.
the fix: cut your CLAUDE.md to under 500 words. keep: stack, commands, critical conventions. delete: backstory, philosophy, aspirational guidelines, anything that doesn’t change behavior.
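as a sketch, a trimmed CLAUDE.md fits in a dozen lines. the stack, commands, and paths below are hypothetical placeholders:

```markdown
# project

- stack: Next.js 14, TypeScript, Postgres
- commands: `npm run dev` (dev server), `npm test` (tests), `npm run lint` (lint)

## conventions

- server components by default; mark client components with "use client"
- all database access goes through src/db/queries.ts
- never edit generated files in src/db/migrations/
```

that's about 50 words: everything the agent needs on every task, nothing it doesn't.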
savings: 2,000-6,000 tokens per session. across a month of daily use: 1-3 million tokens.
2. start fresh sessions
the longer a session runs, the more context accumulates. old file reads, previous conversations, command outputs — all sitting in context, all costing tokens on every subsequent interaction.
the fix: one task, one session. finish the auth feature? exit claude. start a new session with a clean context.
savings: a fresh session might use 5,000 tokens for context. a stale session: 50,000-100,000 tokens of accumulated history. that’s a 10-20x difference in baseline cost.
3. limit command output
npm test with verbose output can dump 10,000+ tokens into context. cat on a large file: same. every character of command output goes into the context window.
the fix: tell the agent to pipe through tail or head:
"run npm test 2>&1 | tail -30"
"grep 'error' in src/ | head -15"
"show me the first 50 lines of src/config.ts"
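the difference is easy to see with a stand-in for a chatty command. here `seq` plays the role of verbose test output:

```shell
# untrimmed: all 10,000 lines would land in the context window
seq 1 10000 | wc -l    # prints 10000

# trimmed: only the last 3 lines survive the pipe
seq 1 10000 | tail -3  # prints 9998, 9999, 10000
```

the same shape works for any noisy command: `npm test 2>&1 | tail -30` keeps the failure summary and drops the scroll.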
savings: 5,000-20,000 tokens per command. across a session with 10 commands: 50,000-200,000 tokens.
4. be specific about what to read
“look at the auth module” makes the agent read every file in the auth directory. “look at src/auth/middleware.ts” reads one file.
the fix: always point to specific files when you know which ones matter. save the broad exploration for when you genuinely don’t know where to look.
# expensive: reads 8 files to find the right one
"fix the bug in the auth module"
# cheap: reads 1 file
"fix the null check in src/auth/middleware.ts line 47"
savings: 5-10x fewer file read tokens per task.
5. use plan mode before execution
a wrong approach that has to be undone costs 2-3x the tokens of a correct one. plan mode keeps the agent read-only while it works out a strategy, so no tokens are burned on edits and tool calls that later get reverted.
the fix: for anything non-trivial, start with:
"/plan how should I approach adding search to the product list?"
review the plan. then execute.
savings: hard to quantify per-interaction, but eliminating one failed-and-retried approach per day saves 50,000-100,000 tokens.
6. use sonnet, not opus (usually)
opus is roughly 5x more expensive than sonnet per token. for most coding tasks — writing functions, fixing bugs, running tests — sonnet produces equivalent results.
the fix: save opus for architecture decisions, complex debugging, and tasks where you’ve tried sonnet and the output wasn’t good enough. default to sonnet for everything else.
savings: if 80% of your work uses sonnet instead of opus: 60-70% cost reduction on those interactions.
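the 60-70% figure checks out with back-of-the-envelope arithmetic, assuming the ~5x price ratio above and a hypothetical unit price of 1.0 for opus:

```shell
awk 'BEGIN {
  opus = 1.0          # relative per-token price (hypothetical unit)
  sonnet = opus / 5   # ~5x cheaper

  all_opus = 1.0 * opus              # baseline: everything on opus
  mixed = 0.8 * sonnet + 0.2 * opus  # 80% of work moved to sonnet

  printf "relative cost: %.2f (%.0f%% reduction)\n",
         mixed / all_opus, (1 - mixed / all_opus) * 100
}'
# prints: relative cost: 0.36 (64% reduction)
```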
7. don’t ultrathink everything
ultrathink generates 10,000-30,000 thinking tokens per response. most tasks don’t benefit from this level of reasoning.
the fix: start without extended thinking. escalate to “think hard” or “ultrathink” only when normal responses are insufficient. most coding tasks need zero extended thinking.
savings: 10,000-30,000 tokens per interaction where you’d have unnecessarily used ultrathink.
8. batch related changes
five separate sessions for five related changes means five CLAUDE.md loads, five context buildups, five session overheads. one session with five batched changes means one overhead.
"make these changes to the user dashboard:
1. add pagination (20 items per page)
2. add sort by date/name toggle
3. show user avatar next to each entry
4. add loading skeleton
5. fix the layout shift on initial load"
savings: one session overhead instead of five — a 5x reduction in per-change overhead tokens.
9. compact strategically
when context gets large mid-session, use /compact with a focus:
"/compact — keep only the current task: fixing the payment webhook"
this tells the compaction what to preserve and what to discard. blind compaction might drop important context; focused compaction keeps what matters.
savings: extends session usefulness without starting fresh, avoiding the cost of re-establishing context.
10. audit your MCP servers
MCP tools that return large payloads (full database query results, long API responses) dump that data into context. design your MCP tools to return summaries, not raw data.
```js
// bad: returns 1000 rows
server.tool("query", ..., async ({ sql }) => {
  const rows = db.prepare(sql).all();
  return { content: [{ type: "text", text: JSON.stringify(rows) }] };
});

// good: returns count + first 10 rows
server.tool("query", ..., async ({ sql }) => {
  const rows = db.prepare(sql).all();
  const summary = {
    total: rows.length,
    sample: rows.slice(0, 10),
  };
  return { content: [{ type: "text", text: JSON.stringify(summary) }] };
});
```
savings: depends on your data, but can be 10,000-100,000 tokens per MCP call.
the combined effect
none of these individually cuts costs in half. combined, they easily do:
| technique | token savings |
|---|---|
| lean CLAUDE.md | 5-10% |
| fresh sessions | 10-20% |
| limited command output | 5-15% |
| specific file reads | 5-10% |
| plan mode | 5-10% |
| sonnet over opus | 10-20% |
| no unnecessary ultrathink | 5-10% |
| batched changes | 5-10% |
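these percentages don't simply add, because each technique shrinks the spend left over by the previous ones. compounding the table's midpoints (an assumption: the savings apply independently) still lands past the halfway mark:

```shell
awk 'BEGIN {
  # midpoints of the ranges in the table above, in row order
  split("0.075 0.15 0.10 0.075 0.075 0.15 0.075 0.075", s, " ")
  remaining = 1.0
  for (i = 1; i <= 8; i++) remaining *= (1 - s[i])
  printf "combined reduction: %.0f%%\n", (1 - remaining) * 100
}'
# prints: combined reduction: 56%
```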
total: stacked together, these cut token waste by half or more. that's real money — $50-200/month for heavy users.
the principle underneath: treat tokens like you treat memory, bandwidth, or any other finite resource. don’t allocate what you don’t need. free what you’re done with. the cheapest token is the one you never generate.
→ claude code pricing guide — understand what you're paying
→ context optimization — deep dive on context management
→ ultrathink — when expensive reasoning is worth it