Tool Routing: How AI Agents Pick Which Function to Call

Modern agents don’t just call tools. They route between dozens of options, deciding which function handles a request based on context, capability, and cost.

The Routing Problem

When an agent has access to 50+ tools, every request becomes a classification problem. The agent must:

  1. Parse user intent
  2. Match intent to available capabilities
  3. Select the best tool (not just a valid one)
  4. Handle failures by falling back to alternatives

A naive approach treats this as a single LLM call. Better systems decompose routing into distinct phases.
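
The code sketches in this post assume some minimal Tool record that carries a description plus a little scoring metadata. Something like the following (field names are illustrative, not taken from any particular framework):

from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str
    description: str                            # natural-language summary used for matching
    schema: dict = field(default_factory=dict)  # parameter schema shown to an LLM router
    cost_efficiency: float = 1.0                # higher = cheaper to run (used under budget pressure)
    speed_rating: float = 1.0                   # higher = faster (used under deadline pressure)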

Routing Patterns

Semantic Matching

The simplest approach: embed tool descriptions and user queries, then pick the closest match.

def route_by_similarity(query: str, tools: list[Tool]) -> Tool:
    # Embed the query once, then compare it against each tool's description embedding
    query_embedding = embed(query)
    scores = [
        cosine_similarity(query_embedding, embed(t.description))
        for t in tools
    ]
    # Highest cosine similarity wins
    return tools[argmax(scores)]

Limitations: embedding similarity compares surface wording only. It ignores cost, latency, and current state; it struggles when several tools have near-identical descriptions; and it picks exactly one tool, so multi-step requests fall through.

LLM-as-Router

Use the model itself to pick tools. The model sees tool schemas and decides which to call.

def route_with_llm(query: str, tools: list[Tool]) -> Tool:
    tool_schemas = [t.schema for t in tools]
    response = llm.complete(
        system="Select the best tool for this request.",
        user=query,
        tools=tool_schemas
    )
    return response.tool_choice

This is how most agent frameworks work today. The model’s training includes tool-use examples, so it learns patterns like “search queries go to web_search” and “file operations go to file_manager.”

Trade-offs: the model handles paraphrase and ambiguity far better than embedding similarity, but every tool schema consumes context window, routing quality depends on how well the descriptions are written, and each routing decision adds a model call's worth of latency and cost.

Hierarchical Routing

Group tools into categories. Route to category first, then to specific tool.

User request
    ↓
Category Router → [search, file, code, web]
    ↓
Tool Router → specific tool within category

This scales better. Instead of comparing against 50 tools, you compare against 5 categories, then 10 tools within a category.

Azure Architecture Center documents this as the hierarchical orchestration pattern: a manager agent receives requests and delegates to specialist agents, each with their own tool sets.
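
A sketch of the two-level lookup, reusing route_by_similarity from above and assuming a classify_category helper (a small LLM call or another embedding comparison both work; the helper is an assumption, not a library function):

def route_hierarchical(query: str, categories: dict[str, list[Tool]]) -> Tool:
    # Level 1: pick a category -- a handful of options instead of dozens of tools
    category = classify_category(query, list(categories))
    # Level 2: pick the best tool inside that category
    return route_by_similarity(query, categories[category])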

Planning-Based Routing

For complex requests, plan first, route later.

User: "Research competitor pricing and update our spreadsheet"

Plan:
1. web_search("competitor pricing {company}")
2. extract_data(search_results)
3. sheets_api.update(spreadsheet_id, data)

Execute: Run tools in sequence

This separates “what to do” from “how to do it.” The planner generates a DAG of operations. The executor routes each step to the appropriate tool.
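
A minimal sketch of that separation, assuming the planner returns an ordered list of (tool_name, args) steps and each step may consume the previous step's output. A real DAG executor would track dependencies explicitly; this linear version only shows the routing idea:

from typing import Any, Callable

def execute_plan(
    plan: list[tuple[str, dict[str, Any]]],
    registry: dict[str, Callable[..., Any]],
) -> Any:
    """Run planned steps in order, threading each result into the next step."""
    result = None
    for tool_name, args in plan:
        fn = registry[tool_name]              # per-step routing is a dictionary lookup
        if result is not None:
            args = {**args, "input": result}  # pass the prior step's output forward
        result = fn(**args)
    return result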

LangGraph builds on this pattern with graph-based workflows where nodes represent tools or agents and edges represent control flow.

Scoring and Selection

When multiple tools could handle a request, scoring determines the winner.

Capability Scoring

Rate each tool’s ability to handle the specific request:

Factor                 Weight   Example
Semantic match         0.4      Does the tool description match?
Input compatibility    0.3      Can the tool accept these parameters?
Historical success     0.2      How often has this pairing worked?
Resource cost          0.1      Token usage, API costs, latency
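
A hedged sketch of combining those factors into one number; the component functions (semantic_match, input_compatibility, historical_success, normalized_cost) are assumptions here, each expected to return a value in [0, 1]:

WEIGHTS = {"semantic": 0.4, "compatibility": 0.3, "history": 0.2, "cost": 0.1}

def capability_score(tool: Tool, query: str, history: dict) -> float:
    return (
        WEIGHTS["semantic"] * semantic_match(tool, query)
        + WEIGHTS["compatibility"] * input_compatibility(tool, query)
        + WEIGHTS["history"] * historical_success(tool, history)
        + WEIGHTS["cost"] * (1.0 - normalized_cost(tool))  # cheaper scores higher
    )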

Dynamic Weighting

Weights change based on context. In a cost-constrained environment, resource cost dominates. In a time-critical flow, latency matters more than thoroughness.

def score_tool(tool: Tool, query: str, context: Context) -> float:
    base_score = semantic_match(tool, query)

    if context.budget_remaining < 100:
        base_score *= tool.cost_efficiency

    if context.deadline_seconds < 30:
        base_score *= tool.speed_rating

    return base_score

Fallback Patterns

Routing fails. When it does, you need a recovery plan. Three patterns:

Retry with Backoff

Same tool, different parameters:

web_search("pricing") → timeout
web_search("pricing", timeout=30) → timeout
web_search("pricing", timeout=60) → success

Fallback Chain

Different tools, same intent:

primary_search → rate_limited
secondary_search → success

Define fallback chains upfront:

fallbacks = {
    "web_search": ["bing_search", "duckduckgo_search"],
    "code_interpreter": ["local_python", "cloud_sandbox"],
}
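
Executing against that table is a small loop. A sketch, assuming each tool name maps to a callable in a registry and that failures surface as exceptions:

from typing import Any, Callable

def call_with_fallbacks(
    tool_name: str,
    args: dict,
    registry: dict[str, Callable[..., Any]],
    fallbacks: dict[str, list[str]],
) -> Any:
    """Try the primary tool, then walk its fallback chain in order."""
    for name in [tool_name, *fallbacks.get(tool_name, [])]:
        try:
            return registry[name](**args)
        except Exception:
            continue  # same intent, next tool
    raise RuntimeError(f"All tools failed for intent: {tool_name}")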

Graceful Degradation

When no tool can satisfy the full request, satisfy part of it:

User: "Get live stock price for AAPL"

Tool available: historical_data (no live prices)

Response: "I can't get live prices, but here's the last close: $XXX"

MCP Sampling

The Model Context Protocol (MCP) introduces sampling: servers can request completions from the client’s LLM rather than embedding their own.

This inverts the typical pattern. Instead of the client routing to tools, tools can request AI capabilities from the client:

Traditional:   Client → picks tool → calls tool
MCP Sampling:  Tool → requests completion → client's LLM responds

Why this matters for personal systems: you control which model handles each request. A tool server doesn’t need its own API keys or model access. It asks your client, and your client decides which model to use based on your preferences and budget.
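
A sketch of that client-side decision. It assumes the server's sampling request carries model preference hints along the lines the MCP spec describes (treat the field names as illustrative and check the spec for the authoritative schema); the model names are placeholders:

def pick_model_for_sampling(request: dict, budget_remaining: float) -> str:
    """Client-side policy: decide which model serves a server's sampling request."""
    prefs = request.get("modelPreferences", {})
    # Cost pressure wins: route to a cheap local model when the budget is tight
    if budget_remaining < 1.00 or prefs.get("costPriority", 0.0) > 0.7:
        return "local-small-model"
    # Otherwise honor a request for more capability
    if prefs.get("intelligencePriority", 0.0) > 0.7:
        return "large-cloud-model"
    return "default-mid-tier-model"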

Multi-Agent Routing

In multi-agent systems, routing happens at two levels:

  1. Agent selection - Which agent handles this request?
  2. Tool selection - Which tool does that agent use?

The agent handoff pattern handles cases where you don’t know which agent fits best until runtime. An orchestrator routes to specialists as requirements emerge.

CrewAI uses role-based routing: define agents with specific roles (Researcher, Writer, Analyst), and the framework routes tasks based on role fit.

LangGraph uses graph-based routing: define nodes (agents or tools) and edges (conditional transitions), then traverse the graph based on state.

Both hit production scale. LangGraph sees 6+ million monthly downloads as of its 1.0 release. CrewAI has 33k GitHub stars and explicit enterprise support, including HIPAA and SOC 2 compliance.

Implementing Routing

Minimal Example

Start simple. A router that handles three categories:

def route(query: str, tools: dict[str, list[Tool]]) -> tuple[Tool, Tool | None]:
    # Step 1: Classify intent
    category = classify(query, categories=["search", "file", "code"])

    # Step 2: Score tools in category
    candidates = tools[category]
    scores = [score_tool(t, query) for t in candidates]

    # Step 3: Select best, with fallback
    primary = candidates[argmax(scores)]
    fallback = candidates[argsort(scores)[-2]] if len(candidates) > 1 else None

    return primary, fallback
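
The classify step can be as simple as reusing the embedding similarity from the semantic-matching section, or a call to a small, cheap model. A sketch of the embedding version, with embed, cosine_similarity, and argmax as before:

def classify(query: str, categories: list[str]) -> str:
    """Pick the category whose label is closest to the query in embedding space."""
    query_embedding = embed(query)
    scores = [cosine_similarity(query_embedding, embed(c)) for c in categories]
    return categories[argmax(scores)]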

Production Considerations

Concern          Solution
Latency          Cache embeddings, pre-compute tool scores for common queries
Cost             Use smaller models for routing, reserve large models for execution
Observability    Log every routing decision for debugging
Drift            Monitor tool selection accuracy over time
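
The latency row is mostly about not re-embedding tool descriptions on every request. One minimal sketch, assuming descriptions change rarely enough that an in-process cache is acceptable (embed is the same assumed helper as above):

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_embedding(text: str) -> tuple[float, ...]:
    # Memoize per description; tuples are immutable, so cached results stay safe
    return tuple(embed(text))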

Routing in Personal AI Systems

For a personal OS, tool routing determines which capabilities you can access and how much they cost.

The default strategy: local first, cloud fallback. Local tools have lower latency, no API costs, and work offline. Cloud tools handle what local can’t.

Over time, your router should learn your patterns. Weight tools that work well for you. Route expensive operations only when cheap ones fail.

Your personal tool inventory might include:

Category   Local                Cloud Fallback
Search     local RAG            Perplexity API
Code       local interpreter    cloud sandbox
Files      filesystem           cloud storage
Web        cached pages         live fetch
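
Expressed as a fallback table that the call_with_fallbacks loop from earlier can consume; the tool names are placeholders for whatever you actually run:

PERSONAL_FALLBACKS = {
    "local_rag":         ["perplexity_search"],
    "local_interpreter": ["cloud_sandbox"],
    "filesystem":        ["cloud_storage"],
    "cached_pages":      ["live_fetch"],
}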

Local first. Cloud when you need it.


Next: Subagent Patterns

Topics: ai-agents mcp architecture