How to Automatically Route AI Requests to the Cheapest Model
The cheapest model that produces acceptable output is the right model. Not the most expensive. Not the most popular. The cheapest one that works. Here's how to route every request to that model automatically.
The Principle: Minimum Viable Model
For any given request, there's a "minimum viable model" — the cheapest model that produces output of acceptable quality. Everything above that is waste.
A simple classification task has a minimum viable model around GPT-5-nano or Gemini 2.5 Flash. A complex legal analysis might require Claude Sonnet 4.5 or GPT-5. The key is matching each request to its minimum viable model dynamically.
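The idea is a filter-then-minimize: keep only models that clear the quality bar, then take the cheapest. A minimal sketch, where the model names, prices, and quality scores are illustrative placeholders, not real benchmarks:

```python
# Sketch: pick the minimum viable model for a request.
# Prices (per 1K tokens) and quality scores are illustrative.
MODELS = [
    {"name": "gpt-5-nano",        "cost_per_1k": 0.0001, "quality": 0.60},
    {"name": "gemini-2.5-flash",  "cost_per_1k": 0.0003, "quality": 0.70},
    {"name": "claude-sonnet-4.5", "cost_per_1k": 0.0150, "quality": 0.92},
    {"name": "gpt-5",             "cost_per_1k": 0.0125, "quality": 0.95},
]

def minimum_viable_model(required_quality: float) -> dict:
    """Cheapest model whose quality meets the requirement."""
    viable = [m for m in MODELS if m["quality"] >= required_quality]
    return min(viable, key=lambda m: m["cost_per_1k"])

print(minimum_viable_model(0.65)["name"])  # simple task  -> gemini-2.5-flash
print(minimum_viable_model(0.90)["name"])  # complex task -> gpt-5
```

Everything above the returned model would do the job too; it would just cost more.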
Why You Can't Do This Manually
Manual model selection fails because:
- You can't predict complexity. The same agent gets simple and complex requests. You'd need per-request logic.
- Models and prices change. Today's cheapest option might not be cheapest next month.
- You'd need to maintain scoring logic. Which model handles classification best? Which is cheapest for summarization? This is a full-time job.
- Fallbacks add complexity. What if the cheap model is rate-limited? You need a secondary pick.
A router handles all of this automatically.
Setup: Cost-Optimized Routing in 5 Minutes
Step 1: Create a Cost-First Router
Go to ClawPane → Create Router and set weights:
Cost: 0.60 ← strongly prefer cheaper models
Quality: 0.25 ← but don't sacrifice too much quality
Latency: 0.10 ← speed is a secondary concern
Carbon: 0.05 ← minor weight for sustainability
Or simply choose the Economy preset, which uses similar weights.
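ClawPane doesn't publish its exact scoring formula, but a plausible reading of these weights is a weighted sum over normalized per-model scores, with cost and latency framed as "savings" so that cheaper and faster score higher. The scores below are made-up illustrations:

```python
# Assumed scoring model: weighted sum of normalized scores in [0, 1].
# For cost and latency, higher means cheaper / faster.
WEIGHTS = {"cost": 0.60, "quality": 0.25, "latency": 0.10, "carbon": 0.05}

def route_score(model: dict) -> float:
    return sum(WEIGHTS[k] * model[k] for k in WEIGHTS)

cheap_model = {"cost": 0.95, "quality": 0.60, "latency": 0.90, "carbon": 0.90}
frontier    = {"cost": 0.20, "quality": 0.95, "latency": 0.50, "carbon": 0.40}

print(route_score(cheap_model) > route_score(frontier))  # True: cost dominates
```

With cost at 0.60, a big price advantage outweighs a moderate quality gap; the quality floor (covered below) is what stops this from selecting unusable models.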
Step 2: Add to OpenClaw
In OpenClaw Settings → Model Providers → Add Provider:
Provider Name: ClawPane
Provider URL: https://clawpane.co/route
API Key: mp_xxxxxxxxxxxxxxxx
Model ID: economy # or your custom router ID
SDK: OpenAI
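Because the provider uses the OpenAI SDK shape, a request to the router is an ordinary chat-completions call with the router ID in the `model` field. A sketch of the request body (the key is the placeholder from the settings above):

```python
import json

# Sketch of the body an OpenAI-compatible client sends to the router.
# "economy" is the router ID, not a concrete model name — ClawPane
# resolves it to an actual model per request.
payload = {
    "model": "economy",
    "messages": [
        {"role": "user", "content": "Classify this ticket: 'Can't log in'"}
    ],
}
headers = {
    "Authorization": "Bearer mp_xxxxxxxxxxxxxxxx",  # placeholder key
    "Content-Type": "application/json",
}

body = json.dumps(payload)
print(body)
```

Nothing else in your calling code changes: swap the base URL and model ID, and the router decides which model actually serves each request.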
Step 3: Assign to Agents
Point your agents at the ClawPane provider. For cost-first routing, use economy as the model ID. For different workloads, create separate routers:
- Support agents → economy
- Code agents → quality
- Triage agents → fast
How the Router Picks the Cheapest Model
For each request, the router:
1. Calculates expected cost for each available model (based on input tokens and estimated output)
2. Checks quality threshold — models below a minimum quality score are excluded
3. Checks availability — rate-limited or down models are skipped
4. Selects the winner — the cheapest model that passes the quality and availability filters
5. Prepares fallback — the next cheapest model as backup
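The steps above can be sketched end to end. Prices, quality scores, and the token estimates are illustrative assumptions, not ClawPane's actual figures:

```python
# Sketch of the selection loop: estimate cost per model, filter by
# quality floor and availability, pick the cheapest, keep a fallback.
MODELS = [
    {"name": "gpt-5-nano", "in_per_1k": 0.00005, "out_per_1k": 0.0004,
     "quality": 0.60, "available": True},
    {"name": "gpt-5-mini", "in_per_1k": 0.00025, "out_per_1k": 0.002,
     "quality": 0.75, "available": True},
    {"name": "gpt-5",      "in_per_1k": 0.00125, "out_per_1k": 0.010,
     "quality": 0.95, "available": True},
]

def route(input_tokens: int, est_output_tokens: int, quality_floor: float):
    def expected_cost(m):
        return (input_tokens / 1000) * m["in_per_1k"] + \
               (est_output_tokens / 1000) * m["out_per_1k"]
    candidates = [m for m in MODELS
                  if m["quality"] >= quality_floor and m["available"]]
    ranked = sorted(candidates, key=expected_cost)      # cheapest first
    winner = ranked[0]
    fallback = ranked[1] if len(ranked) > 1 else None   # next cheapest
    return winner["name"], fallback["name"] if fallback else None

print(route(500, 200, quality_floor=0.70))  # nano fails the quality floor
```

Here a 500-in / 200-out request with a 0.70 quality floor routes to gpt-5-mini, with gpt-5 held as the fallback.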
With cost weight at 0.60, the router heavily favors cheap models but won't select a model that scores below the quality floor. You get the cheapest acceptable model, not the cheapest bad model.
What the Routing Looks Like in Practice
For a support agent handling 1,000 requests/day:
| Request Type | Count | Routed To | Cost/Request |
|---|---|---|---|
| Greetings / small talk | 200 | GPT-5-nano | $0.00003 |
| FAQ answers | 250 | GPT-5-mini | $0.0005 |
| Account lookups | 200 | Gemini 2.5 Flash | $0.0002 |
| Complex troubleshooting | 150 | GPT-5 | $0.006 |
| Escalation summaries | 100 | Claude Haiku 4.5 | $0.0008 |
| Multi-turn reasoning | 100 | Claude Sonnet 4.5 | $0.012 |
Daily cost: ~$2.35 vs. ~$5.00 if everything went through GPT-5. That's over 50% savings.
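The daily total follows directly from the table; a quick check using its per-request figures:

```python
# Daily counts and per-request costs from the table above.
traffic = [
    (200, 0.00003),  # greetings / small talk   -> GPT-5-nano
    (250, 0.0005),   # FAQ answers              -> GPT-5-mini
    (200, 0.0002),   # account lookups          -> Gemini 2.5 Flash
    (150, 0.006),    # complex troubleshooting  -> GPT-5
    (100, 0.0008),   # escalation summaries     -> Claude Haiku 4.5
    (100, 0.012),    # multi-turn reasoning     -> Claude Sonnet 4.5
]
daily = sum(count * cost for count, cost in traffic)
print(f"${daily:.2f}")  # prints $2.35
```

Note that the two expensive request types (25% of traffic) account for nearly 90% of the spend, which is exactly why routing the easy 75% to cheap models pays off.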
Monitoring Cost-Optimized Routing
After enabling routing, check:
- Model distribution chart — are cheap models handling the majority of traffic?
- Cost per request trend — is average cost declining?
- Quality feedback — are users reporting worse responses? (They usually aren't.)
- Fallback rate — if fallbacks are high, a provider might be having issues
If quality complaints increase, bump the quality weight from 0.25 to 0.30. The router will shift more traffic to mid-tier models while still optimizing for cost.
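Under the assumed weighted-sum scoring, even a 0.05 bump can flip the winner for borderline requests. The sketch below moves the 0.05 from cost to quality (one reasonable way to rebalance; ClawPane may handle renormalization differently), with illustrative model scores:

```python
# Illustrative: bumping quality weight 0.25 -> 0.30 (taken out of cost)
# flips which model wins for a borderline request.
def score(model: dict, weights: dict) -> float:
    return sum(weights[k] * model[k] for k in weights)

cheap = {"cost": 0.95, "quality": 0.50, "latency": 0.8, "carbon": 0.8}
mid   = {"cost": 0.75, "quality": 0.90, "latency": 0.8, "carbon": 0.8}

before = {"cost": 0.60, "quality": 0.25, "latency": 0.10, "carbon": 0.05}
after  = {"cost": 0.55, "quality": 0.30, "latency": 0.10, "carbon": 0.05}

print(score(cheap, before) > score(mid, before))  # True: cheap model wins
print(score(cheap, after)  > score(mid, after))   # False: mid-tier wins
```

Clear-cut requests keep routing as before; only the marginal ones shift up a tier, so cost stays mostly optimized.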
The Savings Are Immediate
Unlike most infrastructure optimizations that require weeks of development, cost-optimized routing delivers savings from the first request. The setup takes 5 minutes. The ROI is instant.