Tool Routing: How AI Agents Pick Which Function to Call
Modern agents don’t just call tools. They route between dozens of options, deciding which function handles a request based on context, capability, and cost.
The Routing Problem
When an agent has access to 50+ tools, every request becomes a classification problem. The agent must:
- Parse user intent
- Match intent to available capabilities
- Select the best tool (not just a valid one)
- Handle failures by falling back to alternatives
A naive approach treats this as a single LLM call. Better systems decompose routing into distinct phases.
Routing Patterns
Semantic Matching
The simplest approach: embed tool descriptions and user queries, then pick the closest match.
import numpy as np

def route_by_similarity(query: str, tools: list[Tool]) -> Tool:
    # embed() and cosine_similarity() are assumed helpers from your embedding stack;
    # in practice, pre-compute tool-description embeddings instead of embedding per call.
    query_embedding = embed(query)
    scores = [
        cosine_similarity(query_embedding, embed(t.description))
        for t in tools
    ]
    return tools[int(np.argmax(scores))]
Limitations:
- Descriptions must be carefully written
- Fails when multiple tools have similar descriptions
- No consideration of tool state or availability
LLM-as-Router
Use the model itself to pick tools. The model sees tool schemas and decides which to call.
def route_with_llm(query: str, tools: list[Tool]) -> Tool:
    # Pseudocode: llm.complete stands in for your provider's tool-calling API.
    tool_schemas = [t.schema for t in tools]
    response = llm.complete(
        system="Select the best tool for this request.",
        user=query,
        tools=tool_schemas,
    )
    return response.tool_choice
This is how most agent frameworks work today. The model’s training includes tool-use examples, so it learns patterns like “search queries go to web_search” and “file operations go to file_manager.”
Trade-offs:
- More flexible than embedding similarity
- Burns tokens on every routing decision
- Model may hallucinate tools that don’t exist
Hierarchical Routing
Group tools into categories. Route to category first, then to specific tool.
User request
↓
Category Router → [search, file, code, web]
↓
Tool Router → specific tool within category
This scales better. Instead of comparing against 50 tools, you compare against 5 categories, then 10 tools within a category.
Azure Architecture Center documents this as the hierarchical orchestration pattern: a manager agent receives requests and delegates to specialist agents, each with their own tool sets.
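A rough sketch of the two-stage version, assuming classify_category and classify_tool are placeholder classifiers (an LLM call, embedding similarity, whatever you prefer):

def route_hierarchically(query: str, registry: dict[str, list[Tool]]) -> Tool:
    # Stage 1: pick a category from a small, fixed set.
    category = classify_category(query, categories=list(registry.keys()))
    # Stage 2: rank only the tools inside that category.
    candidates = registry[category]
    return classify_tool(query, candidates)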
Planning-Based Routing
For complex requests, plan first, route later.
User: "Research competitor pricing and update our spreadsheet"
Plan:
1. web_search("competitor pricing {company}")
2. extract_data(search_results)
3. sheets_api.update(spreadsheet_id, data)
Execute: Run tools in sequence
This separates “what to do” from “how to do it.” The planner generates a DAG of operations. The executor routes each step to the appropriate tool.
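A minimal sketch of that split, assuming a plan_request helper that returns ordered (tool_name, args) steps (the names are illustrative, not any particular framework's API):

def execute_plan(query: str, tools: dict[str, Tool]) -> list:
    # Planner: turn the request into an ordered list of (tool_name, args) steps.
    steps = plan_request(query)
    results = []
    for tool_name, args in steps:
        # Executor: route each step to the named tool and collect its output.
        results.append(tools[tool_name].run(**args))
    return results

A real executor would also thread earlier outputs into later steps' arguments (step 3 in the plan above consumes step 1's results); this sketch shows only the routing half.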
LangGraph builds on this pattern with graph-based workflows where nodes represent tools or agents and edges represent control flow.
Scoring and Selection
When multiple tools could handle a request, scoring determines the winner.
Capability Scoring
Rate each tool’s ability to handle the specific request:
| Factor | Weight | What it measures |
|---|---|---|
| Semantic match | 0.4 | Does the tool description match? |
| Input compatibility | 0.3 | Can the tool accept these parameters? |
| Historical success | 0.2 | How often has this pairing worked? |
| Resource cost | 0.1 | Token usage, API costs, latency |
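A weighted sum over those factors might look like this; the weights mirror the table, and the individual scoring helpers (semantic_match, input_compatibility, success_rate, normalized_cost) are assumed to return values in [0, 1]:

WEIGHTS = {
    "semantic_match": 0.4,
    "input_compat": 0.3,
    "historical_success": 0.2,
    "resource_cost": 0.1,
}

def capability_score(tool: Tool, query: str) -> float:
    # Resource cost is inverted so cheaper tools score higher.
    factors = {
        "semantic_match": semantic_match(tool, query),
        "input_compat": input_compatibility(tool, query),
        "historical_success": tool.success_rate,
        "resource_cost": 1.0 - tool.normalized_cost,
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())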
Dynamic Weighting
Weights change based on context. In a cost-constrained environment, resource cost dominates. In a time-critical flow, latency matters more than thoroughness.
def score_tool(tool: Tool, query: str, context: Context) -> float:
    base_score = semantic_match(tool, query)
    # Favor cheap tools when the budget is nearly spent.
    if context.budget_remaining < 100:
        base_score *= tool.cost_efficiency
    # Favor fast tools when the deadline is tight.
    if context.deadline_seconds < 30:
        base_score *= tool.speed_rating
    return base_score
Fallback Patterns
Routing fails. When it does, you need a recovery plan. Three patterns:
Retry with Backoff
Same tool, different parameters:
web_search("pricing") → timeout
web_search("pricing", timeout=30) → timeout
web_search("pricing", timeout=60) → success
Fallback Chain
Different tools, same intent:
primary_search → rate_limited
secondary_search → success
Define fallback chains upfront:
fallbacks = {
    "web_search": ["bing_search", "duckduckgo_search"],
    "code_interpreter": ["local_python", "cloud_sandbox"],
}
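Walking the chain is a short loop; this sketch assumes each tool raises on failure (ToolError is a placeholder exception type):

def call_with_fallbacks(name: str, args: dict, tools: dict[str, Tool]):
    # Try the primary tool first, then each fallback in order.
    for candidate in [name, *fallbacks.get(name, [])]:
        try:
            return tools[candidate].run(**args)
        except ToolError:
            continue  # move on to the next tool in the chain
    raise ToolError(f"All tools failed for intent: {name}")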
Graceful Degradation
When no tool can satisfy the full request, satisfy part of it:
User: "Get live stock price for AAPL"
Tool available: historical_data (no live prices)
Response: "I can't get live prices, but here's the last close: $XXX"
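In code, that means returning a partial answer with an explicit caveat when the ideal tool isn't available (a sketch; the tool names and fields are illustrative):

def get_stock_price(symbol: str, tools: dict[str, Tool]) -> str:
    live = tools.get("live_quotes")
    if live is not None:
        return f"Live price for {symbol}: {live.run(symbol=symbol)}"
    # Degrade: no live source, so answer partially and say so.
    last_close = tools["historical_data"].run(symbol=symbol, field="close")
    return f"I can't get live prices, but here's the last close: {last_close}"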
MCP Sampling
The Model Context Protocol (MCP) introduces sampling: servers can request completions from the client’s LLM rather than embedding their own.
This inverts the typical pattern. Instead of the client routing to tools, tools can request AI capabilities from the client:
Traditional: Client → picks tool → calls tool
MCP Sampling: Tool → requests completion → client's LLM responds
Why this matters for personal systems: you control which model handles each request. A tool server doesn’t need its own API keys or model access. It asks your client, and your client decides which model to use based on your preferences and budget.
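A conceptual sketch of the client side of that decision (this is not the MCP SDK's actual API; the request fields, model names, and helpers are illustrative):

def handle_sampling_request(request: dict) -> str:
    # A tool server asked for a completion; the client picks the model.
    prompt = request["prompt"]
    max_tokens = request.get("max_tokens", 512)
    if request.get("sensitive") or budget_remaining() < 100:
        model = "local-small"   # keep it cheap and on-device
    else:
        model = "cloud-large"   # spend on quality when it's worth it
    return complete(model=model, prompt=prompt, max_tokens=max_tokens)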
Multi-Agent Routing
In multi-agent systems, routing happens at two levels:
- Agent selection - Which agent handles this request?
- Tool selection - Which tool does that agent use?
The agent handoff pattern handles cases where you don’t know which agent fits best until runtime. An orchestrator routes to specialists as requirements emerge.
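The two levels compose naturally: pick an agent, then let that agent route among only its own tools. A sketch (select_agent can reuse any of the routing strategies above):

def route_request(query: str, agents: list[Agent]):
    # Level 1: which specialist agent owns this request?
    agent = select_agent(query, agents)
    # Level 2: the chosen agent routes among its own tool set.
    tool = agent.route(query)
    return agent, tool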
CrewAI uses role-based routing: define agents with specific roles (Researcher, Writer, Analyst), and the framework routes tasks based on role fit.
LangGraph uses graph-based routing: define nodes (agents or tools) and edges (conditional transitions), then traverse the graph based on state.
Both operate at production scale: LangGraph reports over 6 million monthly downloads as of its 1.0 release, and CrewAI has roughly 33k GitHub stars plus explicit enterprise support, including HIPAA and SOC 2 compliance.
Implementing Routing
Minimal Example
Start simple. A router that handles three categories:
import numpy as np

def route(query: str, tools: dict[str, list[Tool]]) -> tuple[Tool, Tool | None]:
    # Step 1: Classify intent into a coarse category
    category = classify(query, categories=["search", "file", "code"])
    # Step 2: Score tools in that category
    candidates = tools[category]
    scores = [score_tool(t, query) for t in candidates]
    # Step 3: Select the best tool, keeping the runner-up as a fallback
    order = np.argsort(scores)
    primary = candidates[order[-1]]
    fallback = candidates[order[-2]] if len(candidates) > 1 else None
    return primary, fallback
Production Considerations
| Concern | Solution |
|---|---|
| Latency | Cache embeddings, pre-compute tool scores for common queries |
| Cost | Use smaller models for routing, reserve large models for execution |
| Observability | Log every routing decision for debugging |
| Drift | Monitor tool selection accuracy over time |
Routing in Personal AI Systems
For a personal OS, tool routing determines which capabilities you can access and how much they cost.
The default strategy: local first, cloud fallback. Local tools have lower latency, no API costs, and work offline. Cloud tools handle what local can’t.
Over time, your router should learn your patterns. Weight tools that work well for you. Route expensive operations only when cheap ones fail.
Your personal tool inventory might include:
| Category | Local | Cloud Fallback |
|---|---|---|
| Search | local RAG | Perplexity API |
| Code | local interpreter | cloud sandbox |
| Files | filesystem | cloud storage |
| Web | cached pages | live fetch |
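A local-first router can be a simple try/except over that table (a sketch; the tool names are the illustrative ones listed above, and ToolError is the same placeholder as before):

LOCAL_FIRST = {
    "search": ("local_rag", "perplexity_api"),
    "code": ("local_interpreter", "cloud_sandbox"),
    "files": ("filesystem", "cloud_storage"),
    "web": ("cached_pages", "live_fetch"),
}

def run_local_first(category: str, args: dict, tools: dict[str, Tool]):
    local_name, cloud_name = LOCAL_FIRST[category]
    try:
        # Prefer local: lower latency, no API cost, works offline.
        return tools[local_name].run(**args)
    except ToolError:
        # Fall back to the cloud only when local can't handle it.
        return tools[cloud_name].run(**args)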
Local first. Cloud when you need it.
Next: Subagent Patterns