Tool Use Patterns: How LLMs Call External Tools

An LLM without tools is a librarian locked inside the library. Smart, well-read, but can’t check current stock prices or send emails. Tools unlock real capability.

Three patterns dominate how models call external functions: function calling (direct), MCP (standardized), and ReAct (reasoning-driven). Each solves different problems.

The Three Approaches

Pattern          | How It Works                                        | Best For
Function Calling | Model outputs structured JSON to invoke functions   | Single-turn tool use, APIs
MCP              | Protocol standard for tool discovery and execution  | Multi-tool ecosystems
ReAct            | Reasoning traces interleaved with tool calls        | Complex problem-solving

Function Calling: The Foundation

Function calling lets models output structured tool invocations instead of plain text. The model receives tool definitions, decides when to call them, and formats the parameters.

How It Works

  1. You define available tools with JSON schemas
  2. Model decides whether to call a tool or respond directly
  3. If calling a tool, model outputs function name + arguments
  4. Your code executes the function, returns result
  5. Model incorporates result into final response

Example: Weather Tool

Define the tool:

{
  "name": "get_weather",
  "description": "Get current weather for a location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City and country, e.g. 'London, UK'"
      }
    },
    "required": ["location"]
  }
}

The model might respond:

{
  "tool_calls": [{
    "name": "get_weather",
    "arguments": {"location": "Tokyo, Japan"}
  }]
}

Your code runs the function, sends result back, model synthesizes answer.
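
The same loop in code, as a minimal sketch. Here, model stands for a hypothetical provider client with a chat(messages, tools=...) method (the same placeholder interface used later in this post); real SDKs differ in field names, but the shape of the round trip is identical.

import json

def get_weather(location: str) -> dict:
    # Stub: replace with a real weather API call.
    return {"location": location, "temp_c": 21, "conditions": "clear"}

def run_tool_turn(model, messages, tools):
    """One round trip: the model may call a tool; we execute it and ask again."""
    response = model.chat(messages, tools=tools)

    if not response.tool_calls:            # model answered directly
        return response.text

    for call in response.tool_calls:       # execute each requested tool
        if call.name == "get_weather":
            result = get_weather(**call.arguments)
        else:
            result = {"error": "unknown tool: " + call.name}
        messages.append({"role": "tool", "name": call.name,
                         "content": json.dumps(result)})

    # Second pass: the model sees the tool output and writes the final answer.
    return model.chat(messages, tools=tools).text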

When Function Calling Works

Good matches: single-turn lookups against well-defined APIs, where the model only needs to pick a function and fill in its parameters.

Poor matches: multi-step problems that need reasoning between calls (see ReAct below), or sprawling multi-tool ecosystems where every integration becomes bespoke (see MCP below).

Provider Implementations

Every major provider supports function calling with slight variations: OpenAI and Google Gemini expose it as function/tool calling with a JSON-Schema-style description of the parameters, while Anthropic calls it tool use and expects the schema under input_schema.

Core mechanic is identical. Schemas differ in edge cases.
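
Concretely, the same weather tool shaped for two providers looks roughly like this (a sketch; check each provider's current docs for the exact format):

# The same tool definition, shaped for two providers.

weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string", "description": "City and country"}
    },
    "required": ["location"],
}

# OpenAI-style: wrapped in a "function" object, schema under "parameters".
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": weather_schema,
    },
}

# Anthropic-style: flat object, schema under "input_schema".
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": weather_schema,
}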

MCP: The USB-C of AI Tools

Model Context Protocol (MCP) standardizes how AI apps connect to external systems. Instead of each integration being custom, MCP provides a common interface.

From the MCP spec:

Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect devices, MCP provides a standardized way to connect AI applications to external systems.

Architecture

┌─────────────┐     MCP Protocol     ┌─────────────┐
│   AI App    │ ◄──────────────────► │  MCP Server │
│  (Claude)   │                      │  (Your DB)  │
└─────────────┘                      └─────────────┘

The AI app is the client. External systems expose themselves as servers. The protocol defines how servers advertise tools, resources, and prompts, and how clients discover and invoke them.

Why MCP Matters

Without MCP, connecting Claude to your database requires custom code. Connecting to Slack requires different custom code. Each integration is bespoke.

With MCP, one integration pattern works everywhere. Build an MCP server once, any MCP client can use it. The ecosystem grows through composition.

Example Server

A minimal MCP server exposing a search tool:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("search-server")

@mcp.tool()
async def search_docs(query: str) -> str:
    """Search documentation for relevant content."""
    results = my_search_function(query)   # your existing search backend
    return format_results(results)        # your existing formatter

mcp.run()  # serves over stdio by default

Any MCP-compatible client can now discover and call this tool.
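
On the client side, discovery and invocation look roughly like this with the official Python SDK's stdio transport, assuming the server above is saved as search_server.py (a sketch; the SDK is evolving alongside the spec):

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server above as a subprocess and talk to it over stdio.
server_params = StdioServerParameters(command="python", args=["search_server.py"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discovery: the client learns what tools exist at runtime.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

            # Invocation: call a tool by name with structured arguments.
            result = await session.call_tool("search_docs", {"query": "tool use patterns"})
            print(result.content)

asyncio.run(main())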

Current State

MCP is new (late 2024). Anthropic created it, but it’s open source. Claude Desktop supports it natively. Other clients are adopting it. The spec continues to evolve.

For building tools today, MCP support is a reasonable default if more than one client might consume your integration; since the spec is still evolving, expect some churn.

ReAct: Thinking While Doing

ReAct (Reasoning and Acting) differs from pure function calling. Instead of just invoking tools, the model explicitly reasons about what to do and why before each action.

The original paper from Yao et al. showed this interleaving beats both pure reasoning and pure acting:

Thought: I need to find the current Bitcoin price.
         My training data is outdated. I'll search for live data.

Action: search("Bitcoin price USD current")

Observation: Results show $67,432 as of today.

Thought: I have current data. The user asked for the price.
         I can now answer directly.

Answer: Bitcoin is currently at $67,432 USD.
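
A bare-bones ReAct loop is straightforward to sketch: prompt the model to emit Thought/Action lines, parse the Action, run the tool, append the Observation, and repeat. The snippet below uses the same hypothetical model.chat placeholder as the rest of this post and a naive regex parser; production agents use structured tool calls instead.

import re

REACT_PROMPT = (
    "Answer the question. Use this exact format:\n"
    "Thought: your reasoning\n"
    'Action: tool_name("argument")\n'
    "or, when you have enough information:\n"
    "Answer: the final answer\n"
)

def react_loop(model, question, tools, max_steps=5):
    # tools: dict mapping tool name -> callable taking one string argument
    transcript = REACT_PROMPT + "\nQuestion: " + question + "\n"

    for _ in range(max_steps):
        step = model.chat([{"role": "user", "content": transcript}])  # hypothetical client
        transcript += step.text + "\n"

        done = re.search(r"Answer:\s*(.+)", step.text)
        if done:
            return done.group(1)                 # model produced a final answer

        action = re.search(r'Action:\s*(\w+)\("(.+?)"\)', step.text)
        if action:
            name, arg = action.groups()
            observation = tools[name](arg) if name in tools else "Unknown tool: " + name
            transcript += "Observation: " + observation + "\n"

    return "Stopped after max_steps without an answer."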

Why Reasoning Traces Help

The explicit thought step:

  1. Grounds decisions: Model explains why it’s calling a tool
  2. Catches errors: “Wait, I already have this information” prevents wasted calls
  3. Enables recovery: Wrong tool? Reasoning reveals the mistake
  4. Improves interpretability: You see the agent’s logic

ReAct vs Function Calling

They’re not mutually exclusive. ReAct describes the pattern of reasoning between actions. Function calling is the mechanism for executing actions.

In practice:

# Pure function calling: model just outputs tool calls
response = model.chat(messages, tools=tools)
# Model decides to call search() - but why?

# ReAct style: reasoning visible
response = model.chat(
    messages + [{"role": "user", "content": 
        "Think step by step about what tool to use and why."}],
    tools=tools
)
# Model: "I need current data because... so I'll call search()"

Modern agents blend both. Claude Code, for instance, reasons in its thinking blocks then executes tool calls—ReAct under the hood.

Trade-offs

ReAct costs more tokens (reasoning traces aren’t free). For simple tasks, direct function calling suffices. For complex multi-step problems, the reasoning overhead pays for itself in fewer mistakes.

Implementing Tool Use

Define Clear Schemas

Vague descriptions lead to wrong tool calls:

// Bad
{"name": "search", "description": "Search things"}

// Good  
{
  "name": "search_knowledge_base",
  "description": "Search internal documentation. Use for questions about company policies, product specs, or procedures. Returns top 5 relevant snippets.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language search query. Be specific."
      }
    },
    "required": ["query"]
  }
}

Handle Failures Gracefully

Tools fail. APIs time out. Results come back empty.

def call_tool_safely(tool_name, args):
    try:
        result = execute_tool(tool_name, args)
        return {"status": "success", "data": result}
    except TimeoutError:
        return {"status": "error", "message": "Tool timed out. Try again or use alternative."}
    except Exception as e:
        return {"status": "error", "message": str(e)}

Return structured errors. The model can adapt—retry, try different tool, or explain the limitation.

Limit Tool Count

More tools ≠ better. Models get confused with 50+ options.

For large toolsets, use a “meta-tool” that selects which specialized tools to expose:

@tool
def select_tools(task_category: str) -> list:
    """Based on task type, return the relevant subset of tools."""
    # @tool is your framework's registration decorator; query_db, send_email,
    # web_search, etc. are tool functions registered elsewhere.
    tool_sets = {
        "data_analysis": [query_db, run_python, create_chart],
        "communication": [send_email, post_slack, schedule_meeting],
        "research": [web_search, read_pdf, summarize],
    }
    return tool_sets.get(task_category, [])

Test Tool Interactions

Unit test tools independently. Integration test tool chains.

def test_search_returns_results():
    result = search_knowledge_base("refund policy")
    assert result["status"] == "success"
    assert len(result["data"]) > 0

def test_model_uses_search_for_policy_questions():
    response = model.chat([
        {"role": "user", "content": "What's our refund policy?"}
    ], tools=[search_knowledge_base])
    assert response.tool_calls[0].name == "search_knowledge_base"

Emerging Patterns

Multi-Agent Tool Sharing

Instead of one model with many tools, multiple specialized agents share a tool pool. A coordinator routes tasks to the right agent, who accesses relevant tools.

User Query
┌─────────────┐
│ Coordinator │
└─────────────┘
    │         │
    ▼         ▼
┌───────┐ ┌───────┐
│ Agent │ │ Agent │
│ (DB)  │ │ (Web) │
└───────┘ └───────┘
    │         │
    ▼         ▼
[DB Tools] [Web Tools]
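
A toy sketch of the coordinator idea, with stub agents and keyword routing standing in for real agents and an LLM-based classifier:

from typing import Callable, Dict

# Stub tools; in practice these wrap your real integrations.
def query_db(query: str) -> str:
    return "[database results for: " + query + "]"

def web_search(query: str) -> str:
    return "[web results for: " + query + "]"

def make_agent(tools: Dict[str, Callable[[str], str]]) -> Callable[[str], str]:
    """Stand-in for a real tool-using agent; here it just calls its first tool."""
    def agent(query: str) -> str:
        first_tool = next(iter(tools.values()))
        return first_tool(query)
    return agent

db_agent = make_agent({"query_db": query_db})
web_agent = make_agent({"web_search": web_search})

def coordinator(query: str) -> str:
    """Route the query to the agent whose tool pool fits the task.
    Real coordinators typically classify the task with an LLM call."""
    if any(word in query.lower() for word in ("sql", "table", "revenue")):
        return db_agent(query)
    return web_agent(query)

print(coordinator("What does the revenue table say about Q3?"))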

Tool Learning

Some systems let models learn new tools from examples or documentation, rather than requiring explicit schemas. Still experimental, but reduces integration burden.

Sandboxed Execution

For code execution tools, sandboxing matters. E2B, Modal, and similar platforms provide isolated environments where agent-generated code runs safely.
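
The tool contract is the same whatever the backend: code goes in, captured output and errors come back as a structured result, with a hard timeout. The sketch below shows only that contract using a local subprocess, which is not real isolation; in production, delegate execution to a sandboxing platform like E2B or Modal.

import subprocess
import sys

def run_code_tool(code: str, timeout: int = 10) -> dict:
    """Run agent-generated Python and return a structured result.

    NOTE: a plain subprocess is NOT a sandbox; this only illustrates the
    tool's input/output contract. Use an isolated environment in production.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"status": "success", "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"status": "error", "message": "Execution exceeded %ds." % timeout}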

What You Can Steal

Start simple: One well-defined tool beats ten vague ones. Add tools as you discover real needs.

Make errors informative: “Tool failed” is useless. “Search returned no results for ‘xylophone policy’—try broader terms” helps the model recover.

Log tool calls: Every invocation should be traceable. You’ll need this for debugging and cost tracking.
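
A small decorator gets you most of the way; this sketch logs tool name, arguments, latency, and outcome with the standard logging module:

import functools
import logging
import time

logger = logging.getLogger("tool_calls")

def logged_tool(fn):
    """Wrap a tool function so every invocation is traceable."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            logger.info("tool=%s args=%s kwargs=%s ms=%.1f status=ok",
                        fn.__name__, args, kwargs,
                        (time.perf_counter() - start) * 1000)
            return result
        except Exception:
            logger.exception("tool=%s args=%s kwargs=%s status=error",
                             fn.__name__, args, kwargs)
            raise
    return wrapper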

Consider MCP early: If you’re building tools that others might use, MCP compatibility future-proofs your work.

Use ReAct for hard problems: Simple lookups don’t need reasoning traces. Multi-step research does. Match pattern to problem.


Next: Agentic Design Patterns covers the broader context of how tools fit into agent architectures.

Topics: ai-agents tools function-calling mcp