Agentic Design Patterns: ReAct, Reflection, Planning, Tool Use
Four patterns separate effective AI agents from expensive experiments. Knowing when to use each one determines whether your agent solves problems or spins in circles.
The Four Core Patterns
| Pattern | What It Does | When to Use |
|---|---|---|
| ReAct | Interleaves reasoning and action | Adaptive problem-solving |
| Reflection | Self-critique and revision | Quality improvement |
| Planning | Task decomposition upfront | Complex multi-step work |
| Tool Use | External capability access | Information retrieval, execution |
These patterns are documented in research and engineering guidance from Google, AWS, and Anthropic. They show up repeatedly in production agents because they work.
ReAct: Reason-Act Loops
ReAct (Reasoning and Acting) alternates between thinking and doing. The agent reasons about what to do next, does it, observes the result, then reasons again.
The ReAct paper (Yao et al., 2022) tested this on question answering, fact verification, and interactive tasks. It beat both pure reasoning (chain-of-thought) and pure acting (action-only) approaches because the thinking grounds the action, and the action grounds the thinking.
The Loop
Thought: Analyze current state, identify what's missing
Action: Take a specific step (search, compute, call API)
Observation: Receive feedback from the action
... repeat until done ...
Example: Research Task
Thought: User wants to know the current price of Bitcoin.
My knowledge has a cutoff date. I need live data.
Action: Search web for "Bitcoin price USD"
Observation: $67,432 as of 10:23 AM EST
Thought: I have current data. I can answer the question.
Final Answer: Bitcoin is currently trading at $67,432 USD.
When to Use ReAct
- Yes: Tasks where the path isn’t clear upfront
- Yes: Problems requiring external information
- Yes: Debugging where each step reveals new information
- No: Simple tasks with obvious solutions
- No: Cost-sensitive operations (each loop = more tokens)
Implementation
Most agent frameworks default to ReAct-style loops. In Claude Code, the agent naturally alternates between reasoning (visible in thinking) and action (tool calls).
```python
# Pseudo-code for a ReAct loop
observation = task      # seed the first thought with the user's request
done = False
while not done:
    thought = llm.reason(observation)      # Thought: analyze the latest observation
    action = llm.select_action(thought)    # Action: pick the next step
    observation = execute(action)          # Observation: capture the result
    done = llm.is_complete(observation)
```
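For something closer to runnable code, here is a minimal sketch of the same loop in Python. The message format, the `call_model` parser, and the `search_web` stub are placeholder assumptions rather than any particular framework's API; only the control flow is the point.

```python
import json

def search_web(query: str) -> str:
    """Stub tool; replace with a real search call."""
    return f"(stub) results for {query!r}"

TOOLS = {"search_web": search_web}

def call_model(messages: list[dict]) -> dict:
    """Placeholder: send `messages` to an LLM and parse the reply into
    {"thought": str, "action": str | None, "args": dict, "final": str | None}."""
    raise NotImplementedError("plug in your LLM client here")

def react_loop(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_model(messages)                    # Thought + proposed Action
        if step.get("final"):                          # model decided it is done
            return step["final"]
        observation = TOOLS[step["action"]](**step.get("args", {}))  # Action -> Observation
        messages.append({"role": "assistant", "content": json.dumps(step)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: step budget exhausted"            # guard against endless loops
```

The explicit step budget matters in practice: without it, a ReAct loop on an unsolvable task keeps spending tokens until something external stops it.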
Limitations
Each reasoning step adds latency and cost. Errors in early observations can propagate through the entire chain. Cheap models struggle here because the whole pattern depends on the model actually reasoning well, not just generating plausible text.
Reflection: Self-Critique
Reflection adds a review step. The agent generates output, critiques it, then revises based on the critique. This catches errors that slip through single-pass generation.
The Pattern
Generate → Critique → Revise
The critique can be the same model reviewing its own work, a different model acting as reviewer, or explicit criteria the output must meet.
Example: Code Generation
[First attempt]

```python
def calculate_average(numbers):
    return sum(numbers) / len(numbers)
```

[Critique]
- No handling for empty list (division by zero)
- No type hints
- No docstring

[Revision]

```python
def calculate_average(numbers: list[float]) -> float:
    """Calculate arithmetic mean of a list of numbers."""
    if not numbers:
        raise ValueError("Cannot calculate average of empty list")
    return sum(numbers) / len(numbers)
```
When to Use Reflection
- Yes: Writing that needs polish (articles, documentation)
- Yes: Code that needs error handling and edge cases
- Yes: Important outputs where quality matters more than speed
- No: Simple lookups or factual responses
- No: Time-critical operations
Implementation Approaches
Self-reflection: Same model, different prompt asking for critique.
Review this code for:
- Edge cases
- Error handling
- Performance issues
- Readability
Then provide an improved version.
External reflection: Second model or explicit rubric.
```python
# Separate critic prompt
critic_response = llm.call(
    f"Rate this output 1-10 on these criteria: {criteria}\n\n{output}"
)
if critic_response.score < 8:
    revised = llm.call(f"Improve based on this feedback: {critic_response}")
```
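Put together, generate → critique → revise is a short loop. A minimal sketch, assuming the same generic `llm.call` text-in/text-out client as above; the "Score: N" convention and the regex that parses it are made up for illustration.

```python
import re

def reflect(llm, task: str, criteria: str, threshold: int = 8, max_rounds: int = 2) -> str:
    """Generate a draft, critique it against explicit criteria, revise until it scores well."""
    draft = llm.call(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        critique = llm.call(
            f"Rate this output 1-10 against these criteria: {criteria}\n"
            f"Start your reply with 'Score: N', then list concrete problems.\n\n{draft}"
        )
        match = re.search(r"Score:\s*(\d+)", critique)
        score = int(match.group(1)) if match else 0   # unparseable critique counts as failing
        if score >= threshold:
            break
        draft = llm.call(
            f"Task: {task}\n\nDraft:\n{draft}\n\nRevise the draft to fix these problems:\n{critique}"
        )
    return draft
```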
Cost Trade-off
Reflection at least doubles your API bill. Use it selectively on outputs that matter, not on every response. A quick lookup doesn’t need self-critique. Code going to production probably does.
Planning: Task Decomposition
Planning breaks complex goals into manageable steps before execution. Instead of figuring things out as you go, you create a roadmap first.
When Planning Beats ReAct
ReAct works through problems step-by-step. Planning works out the steps in advance. Planning wins when:
- The task has many interdependent parts
- Early mistakes are expensive to undo
- You need to allocate resources across steps
- Parallel execution is possible
Decomposition Approaches
Full decomposition: Plan all steps before any execution.
Goal: Migrate database from MySQL to PostgreSQL
Steps:
1. Audit current schema
2. Map MySQL types to PostgreSQL equivalents
3. Generate migration scripts
4. Set up test environment
5. Run migration on test data
6. Validate data integrity
7. Plan production cutover
8. Execute migration
9. Verify data integrity, with the rollback plan ready
Interleaved decomposition: Plan some steps, execute, plan more based on results.
Goal: Research and implement caching layer
Phase 1 (research):
1. Identify slow queries
2. Measure current response times
3. Evaluate Redis vs Memcached
[Execute phase 1, then plan phase 2 based on findings]
Phase 2 (implementation):
4. Set up chosen solution
5. Implement cache-aside pattern
6. Add cache invalidation
7. Load test
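In code, plan-then-execute is one planning call followed by a loop over the steps. A minimal sketch with the same hypothetical `llm.call` client; the numbered-list prompt and the regex that parses it are assumptions, and real planners usually validate the plan before executing.

```python
import re

def plan_then_execute(llm, goal: str) -> list[str]:
    """Ask for a numbered plan up front, then execute each step in order."""
    plan_text = llm.call(f"Break this goal into at most 8 numbered steps, one per line:\n{goal}")
    steps = [m.group(1).strip()
             for m in (re.match(r"\s*\d+[.)]\s*(.+)", line) for line in plan_text.splitlines())
             if m]
    results: list[str] = []
    for i, step in enumerate(steps, start=1):
        context = "\n\n".join(results[-3:])           # carry only recent results to bound prompt size
        results.append(llm.call(
            f"Goal: {goal}\nCurrent step ({i}/{len(steps)}): {step}\n"
            f"Results from earlier steps:\n{context}\n\nDo this step and report the result."
        ))
    return results
```

Interleaved decomposition is the same shape with a re-planning call inserted between phases, fed by the results gathered so far.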
Planning Prompts
For Claude Code, explicit planning prompts work well:
Before implementing, create a plan:
1. List all files that will need changes
2. Order changes by dependency
3. Identify any unknowns that need research first
4. Estimate complexity of each step
Then execute the plan step by step.
This aligns with the three-layer workflow: brainstorm spec, plan implementation, then execute.
When to Skip Planning
- Task is obvious and well-understood
- Exploration is the goal (you don’t know what you’re looking for)
- Single-step operations
Tool Use: External Capabilities
Tool use extends what the agent can do. Instead of generating everything from memory, the agent calls external functions: search APIs, databases, calculators, code interpreters.
How Tool Use Works
1. Agent receives task
2. Agent decides which tool would help
3. Agent generates structured call (function name + arguments)
4. System executes the call
5. Agent receives result
6. Agent continues (possibly using more tools)
The agent doesn’t execute tools directly. It produces structured output describing the desired call, and the system executes it.
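That split between "the agent proposes a call" and "the system executes it" is a small piece of code. The tool-call shape below is invented for illustration; real APIs each define their own structured format, but the dispatch logic looks the same.

```python
def dispatch(tool_call: dict, registry: dict) -> dict:
    """Execute a model-proposed tool call and return a structured result."""
    name, args = tool_call["name"], tool_call.get("arguments", {})
    if name not in registry:
        return {"success": False, "error": f"Unknown tool: {name}"}
    try:
        return {"success": True, "result": registry[name](**args)}
    except Exception as exc:      # surface failures to the model instead of crashing the loop
        return {"success": False, "error": str(exc)}

registry = {"add": lambda a, b: a + b}
print(dispatch({"name": "add", "arguments": {"a": 2, "b": 3}}, registry))
# {'success': True, 'result': 5}
```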
Tool Categories
| Category | Examples | Use Case |
|---|---|---|
| Information | Web search, database queries | Current data, external knowledge |
| Computation | Calculator, code execution | Math, data processing |
| Actions | Send email, create file | Side effects |
| Integration | API calls, MCP servers | External services |
Effective Tool Design
Clear descriptions: The agent selects tools based on descriptions. Vague descriptions lead to wrong tool choices.
```json
{
  "name": "search_codebase",
  "description": "Search for code patterns across all files. Use for finding implementations, usages, or examples of specific functions, classes, or patterns.",
  "parameters": {
    "query": "regex pattern to search for",
    "file_type": "optional file extension filter"
  }
}
```
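Most tool-use APIs want the parameters expressed as JSON Schema rather than the shorthand above. As one hedged example, here is how the same tool might look as a Python dict in the `input_schema` shape the Anthropic Messages API uses; the descriptions and the required list are illustrative choices, not an official example.

```python
# Hypothetical translation of the search_codebase tool into JSON Schema form.
search_codebase_tool = {
    "name": "search_codebase",
    "description": (
        "Search for code patterns across all files. Use for finding implementations, "
        "usages, or examples of specific functions, classes, or patterns."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Regex pattern to search for"},
            "file_type": {"type": "string", "description": "Optional file extension filter, e.g. 'py'"},
        },
        "required": ["query"],
    },
}
```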
Specific parameters: Well-defined parameters reduce errors.
```
// Bad
"parameters": { "data": "the data to process" }

// Good
"parameters": {
  "amount": "numeric value in USD",
  "currency": "ISO 4217 currency code",
  "date": "YYYY-MM-DD format"
}
```
Error handling: Tools should return structured errors the agent can act on.
```json
{
  "success": false,
  "error": "Rate limit exceeded",
  "retry_after_seconds": 60
}
```
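A structured error gives the calling code something to act on. A minimal sketch of a retry wrapper, assuming the tool returns a dict in the shape shown above:

```python
import time

def call_with_retry(tool, args: dict, max_attempts: int = 3) -> dict:
    """Retry a tool call when it reports a rate limit, honoring retry_after_seconds."""
    for _ in range(max_attempts):
        result = tool(**args)
        if result.get("success"):
            return result
        wait = result.get("retry_after_seconds")
        if wait is None:          # not retryable: hand the structured error back to the agent
            return result
        time.sleep(wait)
    return {"success": False, "error": "Gave up after repeated rate limits"}
```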
Tool Use in Personal AI OS
For your personal AI setup, tool use connects Claude Code to your data and systems:
- MCP servers expose databases, APIs, and services
- Custom commands wrap complex operations
- Browser agents interact with web applications
With enough tools connected, you stop asking for information and start getting things done.
Combining Patterns
Real agents mix patterns. A typical agent might:
- Plan the overall approach
- ReAct through execution, using tools at each step
- Reflect on the final output before returning
Example: Research Report
[Planning]
1. Search for recent papers on topic
2. Summarize key findings from top 5
3. Identify common themes
4. Write synthesis
[ReAct + Tool Use]
Thought: Need recent papers
Action: search_arxiv("agentic AI patterns 2024-2025")
Observation: 12 results...
Thought: These three look most relevant
Action: fetch_paper(arxiv_id="2210.03629")
Observation: ReAct paper content...
[...more iterations...]
[Reflection]
Draft complete. Reviewing for:
- Accuracy of citations
- Logical flow
- Missing perspectives
Revision: Added counter-argument in section 3...
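Stitched together in code, the combination stays compact. The sketch below reuses the hypothetical helpers from the earlier sketches (`react_loop` and a generic `llm.call` client); none of them are a real framework's API.

```python
def research_report(llm, topic: str) -> str:
    """Plan the work, ReAct through it with tools, then reflect on the draft."""
    # Planning: fix the step list before doing any work
    plan = llm.call(f"List 4 numbered steps to produce a short research report on: {topic}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # ReAct + tool use: each step runs through the react_loop sketched earlier,
    # which can call search/fetch tools as it goes
    notes = [react_loop(f"Topic: {topic}\nCarry out this step and report findings: {step}")
             for step in steps]

    # Reflection: critique the assembled draft against explicit criteria, then revise once
    draft = llm.call(f"Write a report on {topic} from these notes:\n\n" + "\n\n".join(notes))
    critique = llm.call(
        f"Review this draft for citation accuracy, logical flow, and missing perspectives. "
        f"List concrete fixes.\n\n{draft}"
    )
    return llm.call(f"Revise the draft to address this feedback.\n\nDraft:\n{draft}\n\nFeedback:\n{critique}")
```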
Pattern Selection Guide
| Situation | Primary Pattern | Supporting Patterns |
|---|---|---|
| Unknown solution path | ReAct | Tool use |
| Quality-critical output | Reflection | - |
| Complex multi-step task | Planning | ReAct, Tool use |
| Data retrieval needed | Tool use | ReAct |
| Code generation | Reflection | Planning |
| Research synthesis | Planning | ReAct, Tool use, Reflection |
Making Patterns Explicit in Claude Code
Claude Code already uses these patterns under the hood. You can force specific patterns through prompting:
Force Planning
Before making any changes:
1. List all files involved
2. Describe the change for each file
3. Identify the order of changes
4. Note any risks
Only proceed after I approve the plan.
Force Reflection
After completing the implementation:
1. Review the code for edge cases
2. Check error handling
3. Verify it matches the original requirements
4. Suggest any improvements
Make the improvements before marking complete.
Explicit ReAct
Think step by step. After each action:
1. State what you learned
2. Decide if you have enough information
3. If not, identify what's missing and how to get it
Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| ReAct for simple tasks | Unnecessary cost and latency | Direct execution |
| No reflection on important outputs | Preventable errors ship | Add review step |
| Planning too granularly | Rigid plans break on contact | Plan phases, not every step |
| Too many tools | Agent confused about which to use | Fewer, well-described tools |
| Reflection without criteria | Vague self-critique | Explicit rubric |
Further Reading
- ReAct paper (Yao et al., 2022)
- Machine Learning Mastery: 7 Agentic AI Design Patterns
- ByteByteGo: Top AI Agentic Workflow Patterns
- Subagent Patterns for dispatch strategies
Next: Three-Layer Workflow