ClawPaneClawPane

How to Automatically Route AI Requests to the Cheapest Model

The cheapest model that produces acceptable output is the right model. Not the most expensive. Not the most popular. The cheapest one that works. Here's how to route every request to that model automatically.

The Principle: Minimum Viable Model

For any given request, there's a "minimum viable model" — the cheapest model that produces output of acceptable quality. Everything above that is waste.

A simple classification task has a minimum viable model around GPT-5-nano or Gemini 2.5 Flash. A complex legal analysis might require Claude Sonnet 4.5 or GPT-5. The key is matching each request to its minimum viable model dynamically.

Why You Can't Do This Manually

Manual model selection fails because:

  1. You can't predict complexity. The same agent gets simple and complex requests. You'd need per-request logic.
  2. Models and prices change. Today's cheapest option might not be cheapest next month.
  3. You'd need to maintain scoring logic. Which model handles classification best? Which is cheapest for summarization? This is a full-time job.
  4. Fallbacks add complexity. What if the cheap model is rate-limited? You need a secondary pick.

A router handles all of this automatically.

Setup: Cost-Optimized Routing in 5 Minutes

Step 1: Create a Cost-First Router

Go to ClawPane → Create Router and set weights:

Cost:    0.60   ← strongly prefer cheaper models
Quality: 0.25   ← but don't sacrifice too much quality
Latency: 0.10   ← speed is a secondary concern
Carbon:  0.05   ← minor weight for sustainability

Or simply choose the Economy preset, which uses similar weights.

Step 2: Add to OpenClaw

In OpenClaw Settings → Model Providers → Add Provider:

Provider Name:   ClawPane
Provider URL:    https://clawpane.co/route
API Key:         mp_xxxxxxxxxxxxxxxx
Model ID:        economy        # or your custom router ID
SDK:             OpenAI

Step 3: Assign to Agents

Point your agents at the ClawPane provider. For cost-first routing, use economy as the model ID. For different workloads, create separate routers:

  • Support agents → economy
  • Code agents → quality
  • Triage agents → fast

How the Router Picks the Cheapest Model

For each request, the router:

  1. Calculates expected cost for each available model (based on input tokens and estimated output)
  2. Checks quality threshold — models below a minimum quality score are excluded
  3. Checks availability — rate-limited or down models are skipped
  4. Selects the winner — cheapest model that passes quality and availability filters
  5. Prepares fallback — next cheapest model as backup

With cost weight at 0.60, the router heavily favors cheap models but won't select a model that scores below the quality floor. You get the cheapest acceptable model, not the cheapest bad model.

What the Routing Looks Like in Practice

For a support agent handling 1,000 requests/day:

Request TypeCountRouted ToCost/Request
Greetings / small talk200GPT-5-nano$0.00003
FAQ answers250GPT-5-mini$0.0005
Account lookups200Gemini 2.5 Flash$0.0002
Complex troubleshooting150GPT-5$0.006
Escalation summaries100Claude Haiku 4.5$0.0008
Multi-turn reasoning100Claude Sonnet 4.5$0.012

Daily cost: ~$2.25 vs. ~$5.00 if everything went through GPT-5. That's 55% savings.

Monitoring Cost-Optimized Routing

After enabling routing, check:

  • Model distribution chart — are cheap models handling the majority of traffic?
  • Cost per request trend — is average cost declining?
  • Quality feedback — are users reporting worse responses? (They usually aren't.)
  • Fallback rate — if fallbacks are high, a provider might be having issues

If quality complaints increase, bump the quality weight from 0.25 to 0.30. The router will shift more traffic to mid-tier models while still optimizing for cost.

The Savings Are Immediate

Unlike most infrastructure optimizations that require weeks of development, cost-optimized routing delivers savings from the first request. The setup takes 5 minutes. The ROI is instant.

Create a cost-optimized router →