ultrathink: deep reasoning in claude code
by Ray Svitla
claude code has a feature most people either don’t know about or use wrong. in your prompt, add the word “ultrathink” and the model switches to extended thinking mode — burning more tokens to reason through complex problems before responding.
it’s powerful. it’s also expensive. here’s when it’s worth it and when it’s a waste.
what extended thinking actually does
normally, claude generates a response token by token, left to right. extended thinking gives the model a scratchpad — a chain-of-thought process that happens before the visible response. the model can explore approaches, backtrack, reconsider, and build up a plan before committing to an answer.
think of it as the difference between someone answering a question immediately and someone who says “let me think about that for a minute” and then gives you a better answer.
the trigger words in claude code, from lightest to heaviest:
→ “think” — light reasoning pass
→ “think hard” — moderate extended thinking
→ “think harder” — more budget for reasoning
→ “ultrathink” — maximum reasoning budget
each level allocates more tokens to the thinking process. more tokens = more reasoning depth = higher cost.
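claude code maps these words to thinking budgets internally. in the raw Anthropic Messages API, the same knob is an explicit `thinking.budget_tokens` parameter — here’s a sketch of the request shape. the API parameter names are real; the per-word budgets and the model name are illustrative guesses, not documented values:

```python
# sketch: trigger words mapped to illustrative thinking-token budgets.
# the budgets below are guesses for illustration, not documented numbers.
THINKING_BUDGETS = {
    "think": 4_000,
    "think hard": 10_000,
    "think harder": 20_000,
    "ultrathink": 32_000,
}

def build_request(prompt: str, level: str) -> dict:
    """build a Messages API payload with an extended-thinking budget."""
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model id
        "max_tokens": 64_000,
        "thinking": {"type": "enabled", "budget_tokens": THINKING_BUDGETS[level]},
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("redesign the auth schema for multi-tenancy", "ultrathink")
print(req["thinking"]["budget_tokens"])  # 32000
```

the point of the sketch: the trigger words aren’t magic, they’re a convenience layer over a token budget you could set yourself.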
what it costs
extended thinking tokens aren’t free. the thinking process generates internal tokens that count toward your usage, even though you never see them directly. a normal response might use 1,000-3,000 tokens. an ultrathink response on a complex problem can use 10,000-30,000+ tokens on thinking alone, before the actual response.
on API pricing: claude sonnet input is $3/million tokens, output is $15/million. thinking tokens are billed as output tokens. so an ultrathink session that generates 20,000 thinking tokens costs roughly $0.30 just for the thinking — before the actual response.
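that arithmetic, as a one-liner you can sanity-check (pricing is the rate quoted above and may change):

```python
# cost of the thinking tokens alone, at the Sonnet output rate
# quoted above ($15 per million tokens; thinking bills as output).
OUTPUT_PRICE_PER_MILLION = 15.00  # USD, rate from this post

def thinking_cost(thinking_tokens: int) -> float:
    """USD cost of the thinking pass, before the visible response."""
    return thinking_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION

print(round(thinking_cost(20_000), 2))  # a typical ultrathink pass: $0.30
print(round(thinking_cost(2_000), 2))   # a normal response’s worth: $0.03
```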
on pro/max plans, this eats into your rate limits faster. a session that would normally let you make 20 requests before hitting the limit might only allow 5-8 with ultrathink running on each.
the math is simple: ultrathink costs 3-10x more per interaction. use it when the quality improvement justifies that multiplier.
when ultrathink is worth it
architecture decisions. “I need to redesign the auth system to support multi-tenancy. here’s the current schema. ultrathink.” — this is exactly the right use case. the model needs to consider multiple approaches, tradeoffs, migration paths. shallow thinking produces shallow architecture.
complex debugging. the test fails in CI but passes locally. the error message is unhelpful. you’ve been stuck for an hour. “ultrathink: here’s the test, here’s the CI config, here’s the error. what’s going wrong?” — extended thinking can trace through execution paths you haven’t considered.
cross-file refactoring. “I need to change how we handle errors across the entire API layer. ultrathink about the approach first, then execute.” — planning before acting, especially when the blast radius is large.
algorithm design. anything involving data structures, performance optimization, or complex logic. these problems have multiple valid approaches and non-obvious tradeoffs. extended thinking actually explores the space.
when ultrathink is a waste
simple edits. “add a loading spinner to the submit button. ultrathink.” — no. this doesn’t need deep reasoning. you’re burning 10x tokens for a three-line change.
routine code generation. “write a CRUD endpoint for users. ultrathink.” — the model has written ten thousand CRUD endpoints. it doesn’t need to think hard about this one.
formatting and style changes. “convert these callbacks to async/await. ultrathink.” — mechanical transformation, not reasoning.
questions with obvious answers. “what’s the syntax for a TypeScript interface? ultrathink.” — you just spent $0.30 on something that needed $0.01.
the tactical approach
start without extended thinking. if the response is shallow, vague, or misses something important, retry with “think hard.” if that’s still not enough, escalate to “ultrathink.”
don’t start at ultrathink and work down. start cheap and escalate. most tasks don’t need deep reasoning. the ones that do will be obvious because the normal response will feel unsatisfying.
a good pattern:
- describe the problem normally
- review the response
- if it’s missing depth: “think harder about the edge cases here”
- if it’s still insufficient: “ultrathink — I need you to really work through this”
this way, 80% of your interactions stay cheap, and the expensive ones are targeted where they matter.
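the escalation ladder above can be written down as a tiny helper. purely illustrative — the trigger words are real, the helper is not part of claude code:

```python
# escalation ladder: start with no trigger word, step up only when
# the response feels shallow. illustrative helper, not a real API.
LADDER = ["", "think", "think hard", "think harder", "ultrathink"]

def escalate(current: str = "") -> str:
    """return the next-heavier trigger word, capping at ultrathink."""
    i = LADDER.index(current)
    return LADDER[min(i + 1, len(LADDER) - 1)]

print(escalate())              # start cheap: think
print(escalate("think hard"))  # still shallow? think harder
print(escalate("ultrathink"))  # already at the ceiling
```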
plan mode + ultrathink
the combination of plan mode and ultrathink is where things get interesting. plan mode tells claude code to think and plan without executing. ultrathink tells it to think deeply.
/plan ultrathink: we need to migrate from REST to GraphQL.
here's the current API structure. what's the approach?
you get a detailed migration plan with considerations you wouldn’t have thought of, and you pay only for thinking tokens, not thinking tokens plus the execution tokens burned on false starts.
plan first, ultrathink on the plan, then execute the plan with normal reasoning. this is the most cost-effective way to handle complex tasks.
the honest take
ultrathink is real. the quality difference on complex tasks is noticeable. I’ve seen it catch race conditions, suggest migration strategies I hadn’t considered, and debug problems in minutes that had stumped me for hours.
but it’s not magic. it’s a tradeoff: more tokens for more thinking depth. on simple tasks, those extra tokens are pure waste. on complex tasks, they’re the difference between a mediocre answer and a genuinely good one.
the skill isn’t knowing that ultrathink exists. it’s knowing which problems deserve it.
→ context optimization — manage token costs overall
→ plan mode mastery — think before executing
→ reduce claude code costs — practical cost optimization