ADR 0022 · Model routing architecture
- Status: accepted
- Date: 2026-05-23
Context
The agent makes provider calls for several distinct purposes — the main
conversational loop, summarization for compaction, embeddings for memory,
fast classification for routing decisions, sub-agent loops, etc. Today
those all run through the single Arc<dyn Provider> handed to the
Agent. Operators who want to use Sonnet for the main loop, Haiku for
summarization, and a local Ollama model for fast classification have no
clean way to express that.
Claude Code solves this with hardcoded getMainLoopModel /
getSmallFastModel helpers. That's fine for a single-vendor harness;
it's wrong for caliban, which is provider-agnostic by design. Operators
should be able to compose any model from any provider for any purpose
without recompiling.
A model router also turns out to be the natural home for several already-deferred concerns: per-route fallback chains, hedged requests, circuit breakers, cost/usage aggregation, and unification of the divergent prompt-cache surfaces across Anthropic, OpenAI, and Gemini.
This is signature differentiation for caliban; it deserves its own layer.
Decision
- Add a new Layer-3 crate
caliban-model-router. It sits betweencaliban-agent-coreand the fourcaliban-provider-*adapter crates. No agent-core code changes shape; the agent continues to take a singleArc<dyn Provider>. - The router IS a
Provider. It implements the same trait the adapters implement, so the agent sees one provider — the router — and the router internally dispatches eachcomplete/streamcall to the right downstreamProvider+ model based on the request's purpose, the operator's policy, and the capabilities the request needs. - Routes are matched by
RequestMetadata.purpose. A new field on the existingRequestMetadatastruct:purpose: Option<RequestPurpose>with variantsMainLoop | Summarization | Embedding | FastClassifier | SubAgent | Custom(String). Callers that don't set a purpose route through a default configured by the operator (likelyMainLoop). - Routing policy is operator-defined. A TOML config file plus a builder API. No auto-learning, no automatic cost optimization, no hidden behavior. The operator owns the cost / latency / capability trade-offs explicitly. This is a deliberate differentiator from Claude Code's hardcoded paths.
- Capability filtering is mandatory. Each route declares its
provider + model; the router consults
Provider::capabilities(model)before dispatch and skips a route whose capabilities don't satisfy the request (e.g. request needsToolUseCapability::ParallelCallsbut the route's model only supportsBasic). - Per-route fallback is opt-in and ordered. When the same
purposeappears in multiple[[route]]entries, the entries form a fallback chain in declaration order. The router tries them in sequence on a retryable failure of the previous entry (rate-limit, model unavailable, transient network error). Implementation is deferred to v2 — this ADR commits to the design. - Cost / usage aggregation is a router responsibility. The router
sees every call and every
Usage. It maintains a per-(provider, model)accumulator and exposes aRouterStatssnapshot for the TUI's existing/usageoverlay (ADR 0013) to render. - Hedging and circuit-breakers are router responsibilities. Both are sketched in the design spec but deferred to v2.
Consequences
- Agent constructor unchanged.
AgentBuilder::provider(...)takes the router as itsArc<dyn Provider>exactly like any adapter. No code incaliban-agent-coreknows the router exists. - Adapters stay simple. Per-adapter retry policy (existing
RetryPolicyfor transient errors) remains in the adapter. The router handles route-level fallback. The two layers compose: adapter retries within a route; router moves to the next route only if the adapter exhausts its retries with a fatal-for-this-route error. - Prompt-cache unification lands here. Anthropic's
cache_controlmarkers, OpenAI'scache_read_input_tokens, and Gemini's context-caching all surface as the sameUsage.cache_read_input_tokens/cache_creation_input_tokensvalues once they reach the router; the router is the natural place to normalize the bookkeeping. before_turnhook needs a way to see the resolved route. The agent'sTurnCtxcurrently exposesconfig.model, which is the caller's request, not the route's actual choice. A new optional field (or a router-supplied hook surface) is required so the TUI status line can display "Sonnet via Anthropic, fallback gpt-4o" instead of just the requested logical name. Detailed in the spec.- Sessions become route-history-aware. If a session was started on route A and resumes on route B (because the config changed, or the primary route is unavailable), prompt-cache markers from the prior provider are inert. The router documents this and falls back to no-cache for the transition turn.
- Forward links: hedged requests, circuit breakers, and adaptive
retry budgets were listed as non-goals in
2026-05-23-perf-baseline-design.md. This ADR pulls them under the router's umbrella for v2. - Revisit if: the operator-defined policy turns out to be a meaningful UX burden in practice (consider a "balanced" default policy), or if hedged requests prove valuable enough to promote from v2 to v1.
References
- Design spec:
docs/superpowers/specs/2026-05-23-model-router-design.md - Provider trait:
crates/caliban-provider/src/lib.rs - Capabilities:
crates/caliban-provider/src/capabilities.rs - Per-adapter retry: ADR 0009 (RetryPolicy)
- Usage overlay: ADR 0013 (TUI overlays)
- Perf-baseline non-goals:
docs/superpowers/specs/2026-05-23-perf-baseline-design.md