Tool Routing: How AI Agents Pick Which Function to Call
Modern agents don’t just call tools. They route between dozens of options, deciding which function handles a request based on context, capability, and cost.
The Routing Problem
When an agent has access to 50+ tools, every request becomes a classification problem. The agent must:
- Parse user intent
- Match intent to available capabilities
- Select the best tool (not just a valid one)
- Handle failures by falling back to alternatives
A naive approach treats this as a single LLM call. Better systems decompose routing into distinct phases.
Routing Patterns
Semantic Matching
The simplest approach: embed tool descriptions and user queries, then pick the closest match.
import numpy as np

def route_by_similarity(query: str, tools: list[Tool]) -> Tool:
    # embed() and cosine_similarity() are assumed helpers from your embedding stack;
    # in practice, pre-compute tool-description embeddings instead of embedding per call.
    query_embedding = embed(query)
    scores = [
        cosine_similarity(query_embedding, embed(t.description))
        for t in tools
    ]
    return tools[int(np.argmax(scores))]
Limitations:
- Descriptions must be carefully written
- Fails when multiple tools have similar descriptions
- No consideration of tool state or availability
LLM-as-Router
Use the model itself to pick tools. The model sees tool schemas and decides which to call.
def route_with_llm(query: str, tools: list[Tool]) -> Tool:
    # Pseudocode: llm.complete stands in for your provider's tool-calling API.
    tool_schemas = [t.schema for t in tools]
    response = llm.complete(
        system="Select the best tool for this request.",
        user=query,
        tools=tool_schemas,
    )
    return response.tool_choice
This is how most agent frameworks work today. The model’s training includes tool-use examples, so it learns patterns like “search queries go to web_search” and “file operations go to file_manager.”
Trade-offs:
- More flexible than embedding similarity
- Burns tokens on every routing decision
- Model may hallucinate tools that don’t exist
Hierarchical Routing
Group tools into categories. Route to category first, then to specific tool.
User request
↓
Category Router → [search, file, code, web]
↓
Tool Router → specific tool within category
This scales better. Instead of comparing against 50 tools, you compare against 5 categories, then 10 tools within a category.
Azure Architecture Center documents this as the hierarchical orchestration pattern: a manager agent receives requests and delegates to specialist agents, each with their own tool sets.
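A rough sketch of the two-stage version, assuming classify_category and classify_tool are placeholder classifiers (an LLM call, embedding similarity, whatever you prefer):

def route_hierarchically(query: str, registry: dict[str, list[Tool]]) -> Tool:
    # Stage 1: pick a category from a small, fixed set.
    category = classify_category(query, categories=list(registry.keys()))
    # Stage 2: rank only the tools inside that category.
    candidates = registry[category]
    return classify_tool(query, candidates)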
Planning-Based Routing
For complex requests, plan first, route later.
User: "Research competitor pricing and update our spreadsheet"
Plan:
1. web_search("competitor pricing {company}")
2. extract_data(search_results)
3. sheets_api.update(spreadsheet_id, data)
Execute: Run tools in sequence
This separates “what to do” from “how to do it.” The planner generates a DAG of operations. The executor routes each step to the appropriate tool.
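A minimal sketch of that split, assuming a plan_request helper that returns ordered (tool_name, args) steps (the names are illustrative, not any particular framework's API):

def execute_plan(query: str, tools: dict[str, Tool]) -> list:
    # Planner: turn the request into an ordered list of (tool_name, args) steps.
    steps = plan_request(query)
    results = []
    for tool_name, args in steps:
        # Executor: route each step to the named tool and collect its output.
        results.append(tools[tool_name].run(**args))
    return results

A real executor would also thread earlier outputs into later steps' arguments (step 3 in the plan above consumes step 1's results); this sketch shows only the routing half.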
LangGraph builds on this pattern with graph-based workflows where nodes represent tools or agents and edges represent control flow.
Scoring and Selection
When multiple tools could handle a request, scoring determines the winner.
Capability Scoring
Rate each tool’s ability to handle the specific request:
| Factor | Weight | What it measures |
|---|---|---|
| Semantic match | 0.4 | Does the tool description match? |
| Input compatibility | 0.3 | Can the tool accept these parameters? |
| Historical success | 0.2 | How often has this pairing worked? |
| Resource cost | 0.1 | Token usage, API costs, latency |
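A weighted sum over those factors might look like this; the weights mirror the table, and the individual scoring helpers (semantic_match, input_compatibility, success_rate, normalized_cost) are assumed to return values in [0, 1]:

WEIGHTS = {
    "semantic_match": 0.4,
    "input_compat": 0.3,
    "historical_success": 0.2,
    "resource_cost": 0.1,
}

def capability_score(tool: Tool, query: str) -> float:
    # Resource cost is inverted so cheaper tools score higher.
    factors = {
        "semantic_match": semantic_match(tool, query),
        "input_compat": input_compatibility(tool, query),
        "historical_success": tool.success_rate,
        "resource_cost": 1.0 - tool.normalized_cost,
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())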
Dynamic Weighting
Weights change based on context. In a cost-constrained environment, resource cost dominates. In a time-critical flow, latency matters more than thoroughness.
def score_tool(tool: Tool, query: str, context: Context) -> float:
    base_score = semantic_match(tool, query)
    # Favor cheap tools when the budget is nearly spent.
    if context.budget_remaining < 100:
        base_score *= tool.cost_efficiency
    # Favor fast tools when the deadline is tight.
    if context.deadline_seconds < 30:
        base_score *= tool.speed_rating
    return base_score
Fallback Patterns
Routing fails. When it does, you need a recovery plan. Three patterns:
Retry with Backoff
Same tool, different parameters:
web_search("pricing") → timeout
web_search("pricing", timeout=30) → timeout
web_search("pricing", timeout=60) → success
Fallback Chain
Different tools, same intent:
primary_search → rate_limited
secondary_search → success
Define fallback chains upfront:
fallbacks = {
    "web_search": ["bing_search", "duckduckgo_search"],
    "code_interpreter": ["local_python", "cloud_sandbox"],
}
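Walking the chain is a short loop; this sketch assumes each tool raises on failure (ToolError is a placeholder exception type):

def call_with_fallbacks(name: str, args: dict, tools: dict[str, Tool]):
    # Try the primary tool first, then each fallback in order.
    for candidate in [name, *fallbacks.get(name, [])]:
        try:
            return tools[candidate].run(**args)
        except ToolError:
            continue  # move on to the next tool in the chain
    raise ToolError(f"All tools failed for intent: {name}")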
Graceful Degradation
When no tool can satisfy the full request, satisfy part of it:
User: "Get live stock price for AAPL"
Tool available: historical_data (no live prices)
Response: "I can't get live prices, but here's the last close: $XXX"
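In code, that means returning a partial answer with an explicit caveat when the ideal tool isn't available (a sketch; the tool names and fields are illustrative):

def get_stock_price(symbol: str, tools: dict[str, Tool]) -> str:
    live = tools.get("live_quotes")
    if live is not None:
        return f"Live price for {symbol}: {live.run(symbol=symbol)}"
    # Degrade: no live source, so answer partially and say so.
    last_close = tools["historical_data"].run(symbol=symbol, field="close")
    return f"I can't get live prices, but here's the last close: {last_close}"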
MCP Sampling
The Model Context Protocol (MCP) introduces sampling: servers can request completions from the client’s LLM rather than embedding their own.
This inverts the typical pattern. Instead of the client routing to tools, tools can request AI capabilities from the client:
Traditional: Client → picks tool → calls tool
MCP Sampling: Tool → requests completion → client's LLM responds
Why this matters for personal systems: you control which model handles each request. A tool server doesn’t need its own API keys or model access. It asks your client, and your client decides which model to use based on your preferences and budget.
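A conceptual sketch of the client side of that decision (this is not the MCP SDK's actual API; the request fields, model names, and helpers are illustrative):

def handle_sampling_request(request: dict) -> str:
    # A tool server asked for a completion; the client picks the model.
    prompt = request["prompt"]
    max_tokens = request.get("max_tokens", 512)
    if request.get("sensitive") or budget_remaining() < 100:
        model = "local-small"   # keep it cheap and on-device
    else:
        model = "cloud-large"   # spend on quality when it's worth it
    return complete(model=model, prompt=prompt, max_tokens=max_tokens)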
Multi-Agent Routing
In multi-agent systems, routing happens at two levels:
- Agent selection - Which agent handles this request?
- Tool selection - Which tool does that agent use?
The agent handoff pattern handles cases where you don’t know which agent fits best until runtime. An orchestrator routes to specialists as requirements emerge.
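The two levels compose naturally: pick an agent, then let that agent route among only its own tools. A sketch (select_agent can reuse any of the routing strategies above):

def route_request(query: str, agents: list[Agent]):
    # Level 1: which specialist agent owns this request?
    agent = select_agent(query, agents)
    # Level 2: the chosen agent routes among its own tool set.
    tool = agent.route(query)
    return agent, tool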
CrewAI uses role-based routing: define agents with specific roles (Researcher, Writer, Analyst), and the framework routes tasks based on role fit.
LangGraph uses graph-based routing: define nodes (agents or tools) and edges (conditional transitions), then traverse the graph based on state.
Both operate at production scale: LangGraph reports over 6 million monthly downloads as of its 1.0 release, and CrewAI has roughly 33k GitHub stars plus explicit enterprise support, including HIPAA and SOC 2 compliance.
Implementing Routing
Minimal Example
Start simple. A router that handles three categories:
import numpy as np

def route(query: str, tools: dict[str, list[Tool]]) -> tuple[Tool, Tool | None]:
    # Step 1: Classify intent into a coarse category
    category = classify(query, categories=["search", "file", "code"])
    # Step 2: Score tools in that category
    candidates = tools[category]
    scores = [score_tool(t, query) for t in candidates]
    # Step 3: Select the best tool, keeping the runner-up as a fallback
    order = np.argsort(scores)
    primary = candidates[order[-1]]
    fallback = candidates[order[-2]] if len(candidates) > 1 else None
    return primary, fallback
Production Considerations
| Concern | Solution |
|---|---|
| Latency | Cache embeddings, pre-compute tool scores for common queries |
| Cost | Use smaller models for routing, reserve large models for execution |
| Observability | Log every routing decision for debugging |
| Drift | Monitor tool selection accuracy over time |
Routing in Personal AI Systems
For a personal OS, tool routing determines which capabilities you can access and how much they cost.
The default strategy: local first, cloud fallback. Local tools have lower latency, no API costs, and work offline. Cloud tools handle what local can’t.
Over time, your router should learn your patterns. Weight tools that work well for you. Route expensive operations only when cheap ones fail.
Your personal tool inventory might include:
| Category | Local | Cloud Fallback |
|---|---|---|
| Search | local RAG | Perplexity API |
| Code | local interpreter | cloud sandbox |
| Files | filesystem | cloud storage |
| Web | cached pages | live fetch |
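A local-first router can be a simple try/except over that table (a sketch; the tool names are the illustrative ones listed above, and ToolError is the same placeholder as before):

LOCAL_FIRST = {
    "search": ("local_rag", "perplexity_api"),
    "code": ("local_interpreter", "cloud_sandbox"),
    "files": ("filesystem", "cloud_storage"),
    "web": ("cached_pages", "live_fetch"),
}

def run_local_first(category: str, args: dict, tools: dict[str, Tool]):
    local_name, cloud_name = LOCAL_FIRST[category]
    try:
        # Prefer local: lower latency, no API cost, works offline.
        return tools[local_name].run(**args)
    except ToolError:
        # Fall back to the cloud only when local can't handle it.
        return tools[cloud_name].run(**args)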
Local first. Cloud when you need it.
Next: Subagent Patterns