Caliban User Guide
Caliban is a Rust-native, provider-agnostic AI agent harness that puts you in control
of model routing, memory, permissions, and prompt context.
This guide is for users who run caliban day-to-day and operators who deploy
and configure it for a team or homelab; it describes behavior and workflows, not Rust internals.
How this guide is organized
| Part | What it covers |
|---|---|
| Introduction | What Caliban is, why it exists, and current project status |
| Getting Started | Installation & building, your first session, the interactive TUI, and headless basics |
| Providers & Models | Supported providers, API key setup, model selection, and the model router |
| Configuration | Settings layering across four scopes, file locations, and the full settings reference |
| Permissions | Core concepts, the pattern grammar, permission modes, and rule management |
| Reference | CLI flags, settings schema, slash command index, environment variables, and file paths |
Caliban v0.1.0 is a pre-release. The core feature set is daily-usable on main under
AGPL-3.0. See Project Status for what is
shipped versus planned.
What Is Caliban?
Caliban is an AI agent harness: a CLI that drives one or more language models through a structured loop of prompts, tool calls, and responses while managing sessions, permissions, memory, and extensibility around that loop. It is provider-agnostic — the same harness works with Anthropic Claude (direct, Bedrock, Vertex), OpenAI (direct, Azure), Google Gemini (AI Studio, Vertex), and local Ollama, all through a common internal representation.
Capabilities at a glance
| Capability | What it gives you | Where to learn more |
|---|---|---|
| Interactive TUI | Full-screen terminal UI with transcript, status bar, slash-menu, file picker, and permission modals | The Interactive TUI |
| Headless / print mode | One-shot -p flag for scripting; stream-json protocol for machine-readable output | Headless Basics |
| Persistent sessions | Named sessions saved to disk; resume across invocations with --resume or --continue | Sessions & Persistence |
| Permissions | Rule-based gate on every tool call; six modes from default to bypassPermissions; audit log | Permissions Concepts |
| Built-in tools | Read, Write, Edit, MultiEdit, Glob, Grep, Bash, BashBg, WebFetch, WebSearch, NotebookEdit, TodoWrite, and more | Built-in Tools |
| MCP client | Connect external tool servers over stdio or HTTP; OAuth; per-server permission scoping | MCP Servers |
| Sub-agents | In-process agent calls, background agents via caliband, git-worktree isolation | Sub-agents |
| Memory tiers | Global, project, and auto-memory via CLAUDE.md ancestry and @-imports | Memory Tiers |
| Model router | Declarative routes per purpose (MainLoop, Compaction, …); fallback chains; circuit breakers | The Model Router |
| Plugins, hooks & skills | Bundle capabilities as plugins; hook lifecycle events; load skill files for slash commands | Extending Caliban |
Because Caliban normalizes all providers to a single internal representation, you can switch models or
providers with a single flag (--provider, --model) or a caliban.toml router config,
without changing your workflow.
The agent loop
At its core, Caliban runs a streaming agent loop:
flowchart LR
U[User prompt] --> A[Agent loop]
A --> M[Model — streams response]
M -->|tool_use blocks| T[Tool dispatch]
T -->|tool results| M
M -->|stop| O[Response shown to user]
Each turn streams from the model as it arrives; tool calls are dispatched as they appear and their results fed back until the model produces a final text response. The loop runs identically in TUI, headless, and library contexts.
Philosophy
Caliban exists because the dominant AI agent CLIs are tightly coupled to a single provider and leave operators with little control over what the model sees, what it can do, or where state is stored. The design is a direct response to those constraints.
Operator control
You decide what model handles each task, what context goes into the prompt, and which tools
the model is allowed to call. Routing is declarative (caliban.toml); settings layer at
four scopes (managed, user, project, local) with deep-merge semantics; permissions are
first-class and auditable. Nothing is hardwired to a service the operator does not control.
Provider-agnostic
No SDK lock-in. Anthropic Claude, OpenAI, Google Gemini, and local Ollama all speak the same internal representation inside Caliban. Cloud transports (AWS Bedrock, Google Vertex, Azure OpenAI) are cargo-feature-gated and additive — the core binary has no mandatory cloud dependency. Switching providers is a flag, not a rewrite.
Local-first and data sovereignty
Sessions, checkpoints, auto-memory, and tool-result overflows live on your disk by default.
Caliban is designed to run in a self-hosted homelab: no required cloud account, no telemetry
unless you opt in (CALIBAN_ENABLE_TELEMETRY=1), no state sent anywhere you do not control.
AGPL-3.0 transparency
Caliban is licensed under AGPL-3.0-only. If you modify Caliban and run it as a network service or distribute the binary, you must release your changes under the same license. This closes the "SaaS loophole" that GPL-3.0 leaves open, aligning with projects like Mastodon and Nextcloud that use AGPL to keep improvements in the commons. Personal use is unaffected. The full rationale is in ADR 0003.
Rust performance
Harness overhead should be negligible compared to model latency. The time-to-result you experience is dominated by the model, not the runtime. This is not a feature worth advertising loudly — it is a baseline expectation for a tool that runs constantly in the background.
Caliban is a terminal agent harness, not an IDE extension, a cloud service, or a mobile app. IDE integration, GitHub App, and remote-control surfaces are tracked in the parity matrix (theme N) but are explicitly parked until the terminal/CLI feature set reaches parity with Claude Code. The guide does not document planned features as if they were shipped.
Project Status
Caliban v0.1.0 is a pre-release. The binary (caliban) is daily-usable from main; the
core agent loop, TUI, headless mode, sessions, permissions, tools, MCP, sub-agents, memory,
sandbox, and telemetry are all shipped. A number of parity gaps with Claude Code remain.
What is shipped
The table below summarizes the major shipped areas. All items marked ✅ are available on
main today.
| Area | Status |
|---|---|
| Interactive TUI (ratatui, transcript, status bar, slash menu, @file picker) | ✅ |
| Headless --print / stream-json I/O protocol | ✅ |
| Persistent named sessions (--session, --resume, --continue) | ✅ |
| Permissions: rule grammar, six modes, caliban perms CLI, audit log | ✅ |
| Built-in tools (Read, Write, Edit, MultiEdit, Glob, Grep, Bash, BashBg, WebFetch, WebSearch, NotebookEdit, TodoWrite, AgentTool, Memory, Plan) | ✅ |
| MCP client (stdio + HTTP, OAuth, elicitation, per-server permissions) | ✅ |
| Sub-agents (in-process, background fleet via caliband, worktree isolation) | ✅ |
| Memory tiers: CLAUDE.md ancestry, @-imports, auto-memory | ✅ |
| Settings layering (Managed > User > Project > Local, deep-merge, live reload) | ✅ |
| Model router v2 (declarative routes, fallback chains, circuit breakers, capability filters) | ✅ |
| Providers: Anthropic, OpenAI, Google Gemini, Ollama, Bedrock, Vertex | ✅ |
| Checkpoints + /rewind | ✅ |
| Plugins, hooks, skills | ✅ |
| OS sandbox (Seatbelt on macOS, bubblewrap on Linux) | ✅ |
| OpenTelemetry + per-request cost tracking | ✅ |
What is partial or backlog
Some rows in the parity matrix are 🟡 (partial / experimental) or 🔴 (not yet shipped):
| Area | State |
|---|---|
| Slash-menu typeahead | 🟡 partial |
| Multi-line input (Shift+Enter native) | 🟡 partial |
| Vim editing mode in TUI | 🔴 not yet |
| Cost surfacing in TUI (/cost display) | 🟡 backlog |
| GitHub Actions workflow / devcontainer feature | 🔴 planned |
| IDE extensions, GitHub App, remote control, mobile (theme N) | 🔴 parked until CLI parity |
IDE extensions, the GitHub App, claude.ai/code, iOS, Slack integration, Remote Control, Channels, Routines, Deep links, and Teleport are all tracked in the parity matrix under theme N. They are explicitly parked until the terminal/CLI feature set reaches full parity with Claude Code. Do not rely on any of these surfaces being available in the near term.
For the full up-to-date breakdown, see Parity vs Claude Code. If you hit something unexpected, see Troubleshooting.
License
Caliban is licensed under AGPL-3.0-only. See Philosophy for the rationale.
Installation & Building
Caliban is distributed as source. You build it with Cargo and install the resulting binary yourself. There are no pre-built releases yet.
Requirements
| Requirement | Details |
|---|---|
| Rust toolchain | 1.95.0, pinned in rust-toolchain.toml |
| rustup | Installs the pinned toolchain automatically on first cargo invocation |
| Git | To clone the repository |
rustup detects rust-toolchain.toml and downloads the exact channel automatically — no manual rustup install step required.
Clone
git clone https://github.com/caliban-ai/caliban.git
cd caliban
Build
Release binary
cargo build --release --bin caliban
The binary lands at target/release/caliban. Build time on a modern machine is a few minutes on a cold cache.
Development build
cargo build --workspace # all crates, debug symbols
cargo test --workspace # full test suite
Put the binary on your PATH
# Option A — copy to a directory already on your PATH
cp target/release/caliban ~/.local/bin/caliban
# Option B — add target/release to PATH (in your shell profile)
export PATH="$PWD/target/release:$PATH"
Smoke test
caliban --version
You should see a version string. If you get a "command not found" error, confirm that the directory holding the binary (~/.local/bin for Option A, target/release/ for Option B) is on your PATH.
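If the smoke test fails, a quick check like the one below can confirm whether a directory is actually on your PATH. This is a minimal POSIX shell sketch; the helper name on_path is hypothetical and not part of caliban.

```shell
# on_path DIR: succeed if DIR is one of the entries in $PATH.
# Hypothetical helper for illustration; not shipped with caliban.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example: report whether the Option A install directory is on PATH.
if on_path "$HOME/.local/bin"; then
  echo "$HOME/.local/bin is on PATH"
else
  echo "$HOME/.local/bin is NOT on PATH"
fi
```

Wrapping both PATH and the candidate in colons lets one pattern match the first, middle, and last entries without special cases.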
Optional: cloud transport feature flags
By default, caliban connects to providers over their public HTTPS APIs. Cloud-managed transports (AWS Bedrock, Google Vertex AI, Azure OpenAI) require optional Cargo feature flags. The exact flag names per crate are:
| Transport | Feature flag |
|---|---|
| Anthropic via AWS Bedrock | caliban-provider-anthropic/bedrock |
| Anthropic via Google Vertex AI | caliban-provider-anthropic/vertex |
| OpenAI via Azure | caliban-provider-openai/azure |
| Gemini via Google Vertex AI | caliban-provider-google/vertex |
To build a binary with multiple cloud transports enabled at once:
cargo build --release --bin caliban \
--features caliban-provider-anthropic/bedrock,caliban-provider-anthropic/vertex,\
caliban-provider-openai/azure,caliban-provider-google/vertex
Cloud transport features are not built in default CI runs. They are exercised by a weekly cron job and by manual dispatch of the ci-cloud workflow.
Helper scripts
The scripts/ directory contains these helpers:
| Script | Purpose |
|---|---|
| scripts/check.sh | Mirrors the full PR CI suite locally: cargo fmt --check, cargo clippy, cargo build, cargo test. Accepts --cloud to additionally run the cloud-features build, and --no-test to skip the test step. |
| scripts/coverage.sh | Measures workspace line coverage with cargo-llvm-cov and fails below the COVERAGE_MIN floor, the same gate CI enforces. Accepts --html/--open to render an HTML report and --no-fail to report without gating. Writes lcov.info + coverage.json under target/llvm-cov/. |
| scripts/coverage-report.py | Renders target/llvm-cov/coverage.json into the Markdown coverage report CI posts as a sticky PR comment (overall stats, per-crate breakdown, notable gaps). Run after coverage.sh to preview it locally. |
Run scripts/check.sh --help or scripts/coverage.sh --help for the full usage summary.
The default binary features include clipboard (the arboard crate). On headless Linux hosts, such as CI images that lack the X11/Wayland clipboard libraries, build with --no-default-features to avoid the link-time dependency.
Your First Session
Get caliban answering questions in under five minutes.
Set an API key
Caliban needs credentials for at least one provider before it can call a model. The quickest path is an environment variable. For Anthropic (the default provider):
export ANTHROPIC_API_KEY=sk-ant-...
For other providers, see Configuring Providers & API Keys.
Run a one-shot prompt
The -p / --print flag runs caliban non-interactively: it sends your prompt, streams the response to stdout, then exits.
caliban -p "What is the capital of France?"
That's it. The assistant's reply prints to stdout.
When no --provider or --model flag is given, caliban defaults to Anthropic with model claude-sonnet-4-6. You can override either default on the command line:
caliban --provider openai --model gpt-5.5 -p "Hello"
Work in a directory
Caliban uses the current working directory as the workspace root for file and shell tools. Just run it from your project:
cd ~/dev/my-project
caliban -p "Summarise README.md"
Enter interactive mode
Drop the -p flag (and any prompt) to enter the interactive TUI instead:
caliban
Caliban detects that stdin is a TTY and launches the ratatui interface. Type your message and press Enter. To quit, press Ctrl-C or Ctrl-D at an empty prompt.
For a tour of the TUI, see The Interactive TUI.
Named sessions
Every conversation can be saved to a named session and resumed later:
# First run — creates a session called "research"
caliban --session research "Read README.md and summarise it"
# Later — resume the same conversation
caliban --resume research
Sessions are stored on disk under the platform's data directory (for example ~/.local/share/caliban/sessions/ on Linux). See Sessions & Persistence for details.
The Interactive TUI
Invoking caliban with no prompt on a TTY launches the ratatui-based terminal interface. This is the primary mode for open-ended, conversational work.
Launching
caliban
Caliban detects that stdin is a TTY and enters the TUI. If you prefer to start from a specific session, pass --resume <name> or --continue (resumes the most recently updated session).
Basic flow
The screen is divided into three areas:
┌──────────────────────────────────────────────┐
│ assistant: Ready. What would you like to do? │
│ │
│ 🔧 Read({"path":"src/main.rs"}) │
│ → Read src/main.rs, lines 1-42 of 42 │
│ │
│ assistant: The entry point is… │
├──────────────────────────────────────────────┤
│ > █ │
├──────────────────────────────────────────────┤
│ ~/dev/my-project · anthropic claude-sonnet-4-6 · session: work │
└──────────────────────────────────────────────┘
| Area | Purpose |
|---|---|
| Transcript pane (top) | Conversation history, tool calls, and tool results |
| Input bar (middle) | Type your message here |
| Status line (bottom) | Working directory, active provider/model, session name |
Type your message and press Enter to send. For multi-line composition, use Shift+Enter on terminals that support the kitty keyboard protocol (kitty, iTerm2, Ghostty, WezTerm, foot) or Alt+Enter as a portable fallback.
Press Ctrl-C during a turn to cancel it. Press Ctrl-C or Ctrl-D at an empty prompt to exit.
Tool calls and the permission modal
When the model wants to invoke a tool (read a file, run a shell command, etc.) caliban checks its permission rules before executing. Depending on the matching rule, the call is:
- allowed automatically — executes silently; a status line appears in the transcript.
- denied automatically — the model is told the call was refused.
- asked — a modal dialog appears:
┌─ Permission required ────────────────────────────────┐
│ Bash: git commit -am "fix typo" │
│ │
│ [y] Allow once [Y] Always allow │
│ [n] Deny once [N] Always deny │
└──────────────────────────────────────────────────────┘
Pressing y or n handles the call once. Pressing Y or N opens a sub-prompt that lets you write a permanent allow or deny rule to a config scope, so you are not asked again for the same pattern.
Cycling permission modes
Shift+Tab cycles the session-wide permission mode through the available values. The current mode is shown as a chip in the status line (the default mode hides the chip). Modes in order:
| Mode | What happens to Ask-class calls |
|---|---|
| default | The modal appears |
| acceptEdits | Write/Edit/MultiEdit/NotebookEdit are auto-allowed; Bash still asks |
| plan | All tool execution is paused; the model can only plan |
| auto | An auto-classifier decides; uncertain calls fall back to Ask |
| dontAsk | All Ask-class calls are allowed without prompting |
bypassPermissions (rules ignored entirely) is only reachable when the session was started with --allow-dangerously-skip-permissions.
For a full explanation of each mode, see Permission Modes.
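The mode table above can be read as a small decision function. The sketch below restates it in POSIX shell purely for illustration; resolve_ask and its output labels (prompt, allow, paused, classify) are hypothetical names, and the assumption that every non-edit tool still prompts under acceptEdits is inferred from the table, not from caliban's code.

```shell
# resolve_ask MODE TOOL: print what happens to an Ask-class tool call.
# Labels are illustrative only; caliban does not expose this function.
resolve_ask() {
  mode=$1
  tool=$2
  case $mode in
    default) echo prompt ;;                       # the modal appears
    acceptEdits)
      case $tool in
        Write|Edit|MultiEdit|NotebookEdit) echo allow ;;
        *) echo prompt ;;                         # e.g. Bash still asks
      esac ;;
    plan)    echo paused ;;                       # tool execution is paused
    auto)    echo classify ;;                     # auto-classifier decides
    dontAsk) echo allow ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

bypassPermissions is left out on purpose: it ignores rules entirely, so there is no Ask-class decision to model.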
The slash menu
Typing / at the input bar opens a fuzzy-search menu of slash commands:
> /
/clear Clear the transcript
/compact Summarise and compress context
/model Switch the active model
/rewind Restore a checkpoint
…
Continue typing to filter the list; press Enter to run the selected command. See Slash Commands for the full index.
Type @ followed by a path prefix to open a live file picker. Selecting a file inlines its contents into the outgoing message — the model sees the file without a separate Read tool round-trip.
For a deeper look at transcript navigation, keyboard shortcuts, and the @-attachment picker, see The TUI in Depth.
Headless Basics
Headless mode runs caliban non-interactively: prompt in, output to stdout, exit. It is the right entry point for scripts, CI pipelines, and any context where there is no TTY.
The -p / --print flag
Pass -p (or the long form --print) with your prompt to run headlessly:
caliban -p "List the files in this directory"
Without -p, caliban checks whether stdin is a TTY. If stdin is not a TTY (i.e. you are in a pipe) or stdout is piped, caliban enters headless mode automatically. Pass --no-auto-print to suppress the automatic fallback if you need to control this explicitly.
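To predict which mode a wrapper script will get, you can apply the same TTY test yourself. This is a minimal sketch of the documented rule, not caliban's own detection code; the helper name would_run_headless is hypothetical, and it treats any non-TTY stdout (pipe or file) as "piped".

```shell
# would_run_headless: succeed when stdin or stdout is not a terminal,
# i.e. when the documented automatic fallback to headless mode would apply.
would_run_headless() {
  [ ! -t 0 ] || [ ! -t 1 ]
}

# Report on stderr so the check does not disturb piped stdout.
if would_run_headless; then
  echo "no TTY on stdin/stdout: expect headless mode" >&2
else
  echo "TTY detected: expect the interactive TUI" >&2
fi
```

The helper returns a status instead of printing, because calling it inside $(...) would itself replace stdout with a pipe and always report headless.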
Output formats
The --output-format flag selects what caliban writes to stdout:
| Format | What you get |
|---|---|
| text (default) | The assistant's final reply as plain text |
| json | A single JSON object (the final result frame), suitable for jq |
| stream-json | NDJSON: one event frame per line as the run progresses |
# Plain text
caliban -p "Explain tokio" --output-format text
# Single JSON result
caliban -p "Explain tokio" --output-format json | jq '.result'
# NDJSON stream (tool calls, partial messages, final result)
caliban -p "Explain tokio" --output-format stream-json
The stream-json format is the richest: it includes a system/init frame first (active model, tools, MCP servers, settings sources), per-call tool_use and tool_result frames during the run, and a final type: result frame with token counts and cost. For full details see The stream-json Protocol.
Reading the prompt from stdin
Pass - as the prompt to read from stdin instead of the command line:
echo "What does this error mean?" | caliban -p -
cat error.log | caliban -p -
This is useful when the prompt is too long for a shell argument or is generated by another command.
Exit codes
Caliban follows sysexits.h conventions where one applies (64, 66, 75, 78) and adds codes of its own for cancellation, interruption, and budget exhaustion:
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Generic runtime error |
| 2 | Tool or assistant error |
| 64 | Bad flags (EX_USAGE) or malformed stream-json input |
| 66 | Missing input (EX_NOINPUT), e.g. --resume names a non-existent session |
| 75 | --max-turns exceeded (EX_TEMPFAIL) |
| 78 | Configuration error: stdin over 10 MB, settings parse failure |
| 124 | Cancelled: SIGTERM or Ctrl-C from the agent loop |
| 130 | SIGINT reached the harness (second Ctrl-C) |
| 137 | --max-budget-usd exceeded |
CI scripts can distinguish budget exhaustion (137) from a real failure (1/2) without parsing stdout.
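A CI wrapper might branch on these codes as sketched below. The mapping comes straight from the table above; the function name describe_exit and its labels are hypothetical, chosen only for this example.

```shell
# describe_exit CODE: map a caliban exit code to a short label.
# Labels are illustrative; only the numeric codes come from the table.
describe_exit() {
  case $1 in
    0)   echo success ;;
    1)   echo runtime-error ;;
    2)   echo tool-or-assistant-error ;;
    64)  echo usage-error ;;
    66)  echo missing-input ;;
    75)  echo max-turns-exceeded ;;
    78)  echo config-error ;;
    124) echo cancelled ;;
    130) echo interrupted ;;
    137) echo budget-exceeded ;;
    *)   echo unknown ;;
  esac
}

# Typical use in a pipeline step (not executed here):
#   caliban -p "Review the latest commit"
#   echo "caliban finished: $(describe_exit $?)"
```

Keeping the mapping in one function lets a pipeline treat 75 and 137 as soft limits while still failing the job on 1 or 2.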
When to use headless
Use -p when you know the task up front and want a single answer: one-shot summaries, code review scripts, CI checks, shell pipelines. Use the interactive TUI when you want a back-and-forth conversation, need to inspect tool calls as they run, or want to adjust the permission mode mid-session.
Permissions in headless mode
There is no modal in headless mode. Any rule that would normally show the Ask dialog instead becomes a hard deny. Read-only tools (Read, Glob, Grep) are allowed by default, but write and shell tools are not. To grant write access, pick one:
# Auto-allow file edits
caliban -p "Fix the typo in README.md" --permission-mode acceptEdits
# Narrow allow rule (repeatable)
caliban -p "Run tests" --allow 'Bash:cargo test*'
# Allow everything that would normally Ask (use sparingly)
caliban -p "..." --auto-allow
For depth on permission modes and rule syntax, see Print Mode.
Sessions & Persistence
Every conversation caliban has with a model is a session: a named, timestamped record of messages, token usage, and active todos. Sessions persist automatically so you can stop at any point and pick up exactly where you left off.
Starting a named session
caliban --session my-project
If my-project already exists on disk, caliban resumes it. If not, a new empty session is created. Session names must match [a-zA-Z0-9_-]+ and be between 1 and 64 characters.
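Scripts that generate session names (for example from branch names) can check them against this rule before invoking caliban. The following is a minimal POSIX shell sketch of the documented grammar; is_valid_session_name is a hypothetical helper, not a caliban command.

```shell
# is_valid_session_name NAME: succeed if NAME matches [a-zA-Z0-9_-]+
# and is 1 to 64 characters long. Runs in a subshell so LC_ALL stays local.
is_valid_session_name() (
  LC_ALL=C                      # keep a-z / A-Z ranges byte-exact
  name=$1
  if [ "${#name}" -lt 1 ] || [ "${#name}" -gt 64 ]; then
    exit 1
  fi
  case $name in
    *[!a-zA-Z0-9_-]*) exit 1 ;; # any character outside the allowed set
  esac
  exit 0
)

# Example: validate before use.
if is_valid_session_name "my-project"; then
  echo "ok: my-project"
fi
```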
Resuming a previous session
Three flags handle resume:
| Flag | Meaning |
|---|---|
| --session NAME | Load or create the session named NAME. |
| -c / --continue | Resume the most recently updated session. |
| -r NAME / --resume NAME | Resume a named session (alias for --session with load semantics). |
-c is the fastest way back into your last conversation:
caliban -c
-r accepts the same name grammar as --session:
caliban -r my-project
Resume semantics
When caliban opens an existing session it restores the full message history and accumulated token usage. The model and provider recorded in the session file are used unless overridden by --model or --provider on the command line. Plan-mode state and the todo list are also restored.
Two caliban processes writing to the same session file concurrently will race. Caliban does not lock session files — run one interactive instance per session name at a time.
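Because caliban does not lock session files itself, a wrapper can enforce the one-instance rule from the outside. The sketch below uses an atomic mkdir as an advisory lock; with_session_lock and the lock path under $TMPDIR are assumptions made for this example, and a crashed run would leave a stale lock directory to remove by hand.

```shell
# with_session_lock NAME CMD...: run CMD only if no other wrapper holds
# the lock for session NAME. mkdir is atomic, so two racers cannot both win.
with_session_lock() {
  name=$1
  shift
  lock="${TMPDIR:-/tmp}/caliban-session-$name.lock"   # hypothetical location
  if ! mkdir "$lock" 2>/dev/null; then
    echo "session '$name' appears to be in use" >&2
    return 1
  fi
  status=0
  "$@" || status=$?      # run the command, remember its exit code
  rmdir "$lock"          # release the lock
  return "$status"
}

# Typical use (not executed here):
#   with_session_lock my-project caliban --session my-project
```

The lock only protects runs that go through the wrapper; a direct caliban invocation on the same session name bypasses it.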
Suppressing persistence
To run a session entirely in memory without writing to disk, pass --no-save:
caliban --no-save
The session still functions normally for the duration of the run; nothing is written when it ends.
Overriding the sessions directory
By default, sessions are stored under your platform's data directory (see Files & Directories for the per-OS table). You can point caliban at a different directory for the duration of a run:
caliban --sessions-dir /path/to/sessions --session my-project
CALIBAN_SESSIONS_DIR is not a recognized env var for this flag — use --sessions-dir directly.
Session file format
Each session is a pretty-printed JSON file at <sessions-dir>/<NAME>.json. Fields include name, provider, model, messages, total_usage, created_at, updated_at, todos, and plan_mode. Files are written atomically (via a debounced background writer with a 250 ms window) to prevent corruption from crashes mid-save.
You can inspect, diff, or even git-track session files directly — the format is intentionally human-readable.
Listing sessions from the TUI
Inside the TUI, /resume lists all known sessions sorted by last-modified date. An optional substring filter narrows the list:
/resume # show all sessions
/resume my-proj # show sessions whose name contains "my-proj"
Each row shows the session name, turn count, total token usage, and last-modified time. To open a listed session, exit and re-launch with caliban --session <NAME>.
The TUI in Depth
Caliban's interactive mode is a full-screen terminal UI built on ratatui + crossterm. This chapter covers everything that goes beyond the basics introduced in The Interactive TUI.
Layout
The screen is divided into three regions from top to bottom:
┌─────────────────────────────────────────────────────┐
│ │
│ Transcript / output region (flex-grow) │
│ │
├─────────────────────────────────────────────────────┤
│ Input area (2 rows) │
├─────────────────────────────────────────────────────┤
│ Status bar (1 row) │
└─────────────────────────────────────────────────────┘
The input area sits between the transcript and the status bar, placing the prompt visually close to the context information below it.
Status line
The status bar shows cwd · provider model · session (turns) · running… during a live turn. When caliban is idle the spinner disappears and the elapsed-turn time is shown instead.
A custom prefix segment can be prepended by configuring a shell script in settings (see Settings Reference for the statusLine key). The script runs off-thread after each turn completes; its output is cached so it never blocks rendering. Use /statusline to inspect the active configuration.
Keybindings
| Key | Action |
|---|---|
| Enter | Submit prompt |
| \ + Enter | Insert a literal newline (multi-line input) |
| PageUp / PageDown | Scroll transcript |
| Ctrl+R | Reverse history search (session scope) |
| Ctrl+S | Cycle history scope → project → all projects |
| Ctrl+G | Open prompt in $VISUAL / $EDITOR / vi |
| Ctrl+O | Open transcript viewer overlay |
| Ctrl+B | Launch or follow a background bash process |
| Shift+Tab | Cycle permission mode chip |
| Esc | Close overlay / cancel input |
| Esc Esc | Open checkpoint rewind overlay (on empty input) |
Overlays
Overlays are modal popups rendered centered (approximately 80% × 80%) over the main view. Press Esc or q to close any overlay. The active input bar is suppressed while an overlay is open.
Available overlays and how to reach them:
| Overlay | How to open |
|---|---|
| Help | /help |
| Configuration | /config |
| MCP server status | /mcp |
| Skills | /skills |
| Permissions editor | /permissions |
| Transcript viewer | Ctrl+O |
| Checkpoint rewind | /rewind or Esc Esc (on empty input) |
| System prompt | /system |
Editor modes
Caliban's input bar uses emacs-style key bindings by default (Ctrl+A / Ctrl+E for line start/end, Ctrl+K to kill to end-of-line, etc.).
Vim editing mode is listed as a gap in the parity matrix (status: 🔴 planned). The InputMode enum is designed to accommodate a vim layer, but it has not shipped. Emacs bindings are the only editor mode in the current release.
External editor handoff
Ctrl+G writes your current input buffer to a temp file, suspends the TUI (leaving the alternate screen), execs $VISUAL / $EDITOR / vi with the file as the argument, then reads the result back and re-enters the TUI. Multi-word editor values like EDITOR='code --wait' work because the value is split on whitespace without shell parsing.
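The editor lookup and whitespace split described above behave roughly like the following POSIX shell sketch. It is an approximation for reasoning about your own EDITOR value, not caliban's implementation; resolve_editor and open_in_editor are hypothetical names, and treating an empty variable as unset is an assumption.

```shell
# resolve_editor: print the first of $VISUAL, $EDITOR, or vi that is set.
resolve_editor() {
  printf '%s\n' "${VISUAL:-${EDITOR:-vi}}"
}

# open_in_editor FILE: split the editor value on whitespace only
# (no quote or escape handling) and run it with FILE as the last argument.
open_in_editor() {
  file=$1
  set -f                     # disable globbing so only word splitting applies
  set -- $(resolve_editor)   # unquoted on purpose: split into words
  set +f
  "$@" "$file"
}

# With EDITOR='code --wait' this runs: code --wait FILE
```

Since no shell parsing happens, quotes inside the value are passed through literally; an editor path that contains spaces will not work with this scheme.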
Transcript viewer
Ctrl+O opens the transcript viewer overlay. It renders every ContentBlock in the conversation history — text, tool calls, tool results, thinking blocks, and images — as the model sees them.
| Key | Action |
|---|---|
| [ | Dump the current viewport to scrollback (leave + re-enter alt-screen) |
| v | Open the full transcript in $VISUAL |
| q / Esc | Close the viewer |
| ? | Show key reference |
Following background bash (Ctrl+B)
Background bash lets caliban run a shell command in the background while you continue interacting with the agent. Press Ctrl+B inside the TUI to open or follow the background bash output panel. The agent can launch background bash tasks via Bash{background:true}; the TUI surfaces their output through the same panel.
Reverse history search
Ctrl+R opens inline reverse search over the current session's prompt history, showing matches as you type. Ctrl+S cycles the scope outward:
Ctrl+R → session scope
Ctrl+S → project scope → all-projects scope
Wider scopes are loaded lazily in a background task (budget: 2 s). History is persisted per project.
All TUI-relevant settings — the status line script, output style, and context-window thresholds — live in the settings hierarchy. See Settings Reference and Output Styles for details.
Prompts, Attachments & Images
This chapter covers how to compose prompts, reference files, and send images to the model — whether you are working interactively in the TUI or driving caliban from the command line.
Writing prompts
In the TUI, type your prompt in the input area and press Enter to submit. For a multi-line prompt, press \ followed by Enter to insert a newline, then Enter alone on a blank line to submit.
For longer drafts, press Ctrl+G to open the current input buffer in $VISUAL / $EDITOR / vi. Caliban reads the saved file back when the editor exits.
In headless mode, pass the prompt via a positional argument, --prompt TEXT, or pipe from stdin using -:
caliban "Explain the diff"
caliban --prompt "Explain the diff"
git diff | caliban -p -
@path file references
Type @ in the TUI input bar to open the file suggestion menu (gitignore-aware). Continue typing to narrow by path. The selected file is read and attached to your prompt as a text block at submit time.
You can also type @path/to/file directly without the menu. Any @-reference that resolves to an image-like extension (.png, .jpg, .jpeg, .gif, .webp) is handled by the image pipeline rather than as text — see Images below.
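The extension rule can be sketched as a simple classifier. This is an illustration only: attachment_kind is a hypothetical helper, and matching lowercase extensions exactly is an assumption, since the guide does not say whether caliban compares case-insensitively.

```shell
# attachment_kind REF: print "image" if an @-reference has an image-like
# extension, otherwise "text". A leading @ is stripped if present.
attachment_kind() {
  path=${1#@}
  case $path in
    *.png|*.jpg|*.jpeg|*.gif|*.webp) echo image ;;
    *)                               echo text ;;
  esac
}

attachment_kind "@docs/screenshot.png"   # prints: image
attachment_kind "@src/main.rs"           # prints: text
```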
Leading ! at the start of the input bar runs the rest of the line as a shell command via the Bash tool (subject to permission rules). The result is not added to the conversation history.
Attachment size limits
Two flags control how large @-attachments can be:
| Flag | Env var | Default | Meaning |
|---|---|---|---|
--max-attach-bytes | CALIBAN_MAX_ATTACH_BYTES | 262144 (256 KB) | Maximum size of a single @-attachment |
--attach-budget-bytes | CALIBAN_ATTACH_BUDGET_BYTES | 1048576 (1 MB) | Aggregate cap across all attachments in one message |
If a single file exceeds --max-attach-bytes or the total across all files exceeds --attach-budget-bytes, caliban rejects the attachment with a clear error before sending anything to the model.
# Raise limits for a large codebase session
caliban \
--max-attach-bytes 524288 \
--attach-budget-bytes 4194304 \
--session big-project
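To find out ahead of time whether a set of files fits, you can apply the same two limits in a pre-flight check. The sketch below mirrors the documented defaults; check_attachments is a hypothetical helper, and caliban's own error text and exact comparison (strictly greater than the limit is assumed here) may differ.

```shell
# Limits default to the documented values; the env vars override them.
MAX_ATTACH_BYTES=${CALIBAN_MAX_ATTACH_BYTES:-262144}        # 256 KB per file
ATTACH_BUDGET_BYTES=${CALIBAN_ATTACH_BUDGET_BYTES:-1048576} # 1 MB per message

# check_attachments FILE...: fail if any file exceeds the per-file cap
# or the combined size exceeds the per-message budget.
check_attachments() {
  total=0
  for f in "$@"; do
    size=$(wc -c < "$f") || return 2
    size=$((size + 0))               # normalize padding some wc builds add
    if [ "$size" -gt "$MAX_ATTACH_BYTES" ]; then
      echo "too large: $f ($size bytes)" >&2
      return 1
    fi
    total=$((total + size))
  done
  if [ "$total" -gt "$ATTACH_BUDGET_BYTES" ]; then
    echo "over budget: $total bytes across $# files" >&2
    return 1
  fi
  return 0
}
```

With the defaults, four 250 KB files pass (about 1.0 MB in total) while a fifth pushes the message over the aggregate budget even though each file is individually under the cap.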
Images
Caliban supports image input via three entry points:
- @path: reference an image file by path in the TUI, or via --prompt "@screenshot.png explain this" in headless mode.
- Clipboard paste: paste an image from the clipboard directly into the TUI input bar (platform clipboard integration required; built with the clipboard feature).
- Drag-and-drop: drag an image file into a supporting terminal emulator; caliban parses the DnD escape sequence and ingests the file.
Supported MIME types: image/png, image/jpeg, image/gif, image/webp.
Ingest pipeline
Before sending an image to a model, caliban runs it through an ingest pipeline:
- MIME sniff: infers type from magic bytes; rejects anything outside the allowlist.
- Decode + dimension check: decodes the image to verify it is not corrupt.
- Downscale: if the file exceeds 5 MiB (pre-base64) or the longest edge exceeds 1568 px, caliban downscales using Lanczos3 resampling. A [downscaled] badge appears in the TUI. The 1568 px target matches Anthropic's recommended longest edge for cost-efficient vision inputs.
- SHA-256 fingerprint: deduplicated images are not re-sent within a session.
The pipeline is configurable via [images] in caliban.toml:
[images]
max_bytes = 5242880 # 5 MiB pre-base64 cap
downscale_target = 1568 # longest-edge px target
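The downscale condition from the pipeline can be written as a small predicate, which helps when deciding whether to resize screenshots before attaching them. This is a sketch of the documented thresholds only; needs_downscale is a hypothetical helper, and treating "exceeds" as strictly greater than is an assumption.

```shell
MAX_BYTES=5242880       # 5 MiB pre-base64 cap (max_bytes)
DOWNSCALE_TARGET=1568   # longest-edge px target (downscale_target)

# needs_downscale BYTES WIDTH HEIGHT: succeed if the image is over the
# byte cap or its longest edge is over the pixel target.
needs_downscale() {
  bytes=$1
  width=$2
  height=$3
  longest=$width
  if [ "$height" -gt "$longest" ]; then
    longest=$height
  fi
  [ "$bytes" -gt "$MAX_BYTES" ] || [ "$longest" -gt "$DOWNSCALE_TARGET" ]
}

# A 400 KB 2560x1440 screenshot trips the edge rule, not the byte cap.
if needs_downscale 409600 2560 1440; then
  echo "would be downscaled"
fi
```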
Capability routing
By default, caliban will refuse to send an image to a model that does not have vision capability, surfacing a clear RouterError::NoCandidate rather than silently dropping the image. Set CALIBAN_STRICT_ROUTING=false to opt into degraded behavior where image content is replaced with a text placeholder and the request proceeds.
Session storage
Images are stored as blobs under <sessions-dir>/<session>/blobs/<sha256>.bin. Session JSON files carry only a BlobRef (the SHA-256), keeping transcripts small and git-diffable.
Graphics protocol detection
When the TUI renders an image inline, it detects the terminal's graphics protocol once at session start using the following cascade:
- CALIBAN_GRAPHICS env var: values kitty, iterm, sixel, none.
- $TERM_PROGRAM: iTerm.app and WezTerm → iTerm2 protocol.
- $TERM: contains kitty → Kitty protocol; contains sixel → DEC sixel.
- Fallback: text placeholder [image: WxH MIME filename].
Override detection explicitly when caliban picks the wrong protocol:
CALIBAN_GRAPHICS=kitty caliban --session vision-work
If you see a RouterError::NoCandidate error when pasting images, confirm that your active provider and model support vision. Check the active route with caliban router debug or /config in the TUI.
Slash Commands
Slash commands are operator-level shortcuts you type directly in the TUI input bar. They are not model-tool calls and are not gated by the permission rule grammar — they run as your direct action.
How the slash system works
Type / in the input bar to open the suggestion menu. A fuzzy typeahead list appears showing all registered commands grouped by category.
The slash-menu fuzzy typeahead is marked 🟡 (partial) in the parity matrix. Basic prefix matching works; full fuzzy ranking and category grouping are in progress.
Continue typing to narrow the list, then press Enter (or Tab) to select a command. Some commands run immediately (immediate: true) and return to the input bar; others open an overlay or emit output to the transcript.
Hooks fire on every slash submission: UserPromptSubmit carries is_slash: true, command, and args, so hooks can audit or veto any slash command.
Plugin-supplied commands
Plugins can register additional slash commands through the same SlashCommandRegistry. Built-in commands take priority; a plugin command with a conflicting name is dropped with a warning at registration time. See Plugins for details.
Common commands
The table below lists the most frequently used built-in commands. The full list — including commands added by plugins — is enumerated at runtime by /help inside the TUI.
| Command | Args | What it does |
|---|---|---|
| /help | — | Open the help overlay listing all visible commands |
| /clear | — | Clear transcript and conversation history; keep todos and system prompt |
| /quit | — | Exit caliban (/exit is an alias) |
| /resume | [query] | List persisted sessions (optional name substring filter) |
| /init | [--force] | Generate CLAUDE.draft.md from AGENTS.md / .cursorrules / git status |
| /model | [id] | Show or switch the active model (same-provider swap in v1) |
| /effort | <level> | Set reasoning effort: low, medium, high, max, or auto |
| /usage | — | Show token usage and cumulative cost for this session |
| /cost | — | Show cumulative USD spend with per-model breakdown |
| /context | — | Show context-window utilization + top-N largest blocks |
| /compact | — | Trigger the configured compactor to summarize history |
| /config | — | Open the configuration overlay (merged settings + scope chain) |
| /mcp | — | Open the MCP server status overlay |
| /hooks | — | List configured hooks per event |
| /plugins | — | List installed plugins with enable/disable status |
| /permissions | — | Open the permissions overlay; cycle mode with Tab, delete rule with d |
| /rewind | — | Open the checkpoint picker (also: Esc Esc on empty input) |
| /recap | — | Summarize the conversation without mutating history |
| /btw | <question> | One-shot ephemeral side query to a fast model; result inlined |
| /export | [path] [--format json] | Export session transcript to markdown (or JSON) |
| /doctor | [--deep] | Run health checks: settings, MCP, skills, hooks, provider auth |
| /status | — | Show provider and auth status |
| /statusline | — | Inspect the active custom status-line configuration |
| /loop | [--n=N] [--interval=S] | Plan repeated turns (execution bounded by --max-turns) |
The complete, up-to-date slash command index — including plugin-supplied commands and hidden aliases — lives in Slash Command Index. The index is generated from the live registry so it always reflects what is actually registered in your build.
Adding your own slash commands
Custom slash commands are defined as skills or plugins. See Custom Slash Commands for the authoring guide.
Supported Providers
Caliban is provider-agnostic: you choose which AI provider and model to use at runtime, and the same agent loop, tool engine, and permission system work regardless of which backend answers the requests.
Provider table
| Provider | --provider value | Transport / access | Notes |
|---|---|---|---|
| Anthropic | anthropic | Direct HTTPS (api.anthropic.com) | Default provider |
| Anthropic via Bedrock | (router only) | AWS Bedrock (bedrock-runtime.*) | Requires caliban-provider-bedrock; configured via caliban.toml |
| Anthropic via Vertex | (router only) | Google Vertex AI | Requires caliban-provider-vertex; configured via caliban.toml |
| OpenAI | openai | Direct HTTPS (api.openai.com/v1) | |
| OpenAI via Azure | (router only) | Azure OpenAI Service | azure feature flag on caliban-provider-openai; configured via caliban.toml |
| Google | google | Google AI Studio (generativelanguage.googleapis.com) | Gemini models |
| Google via Vertex | (router only) | Google Vertex AI | vertex feature flag; configured via caliban.toml |
| Ollama | ollama | Local HTTP (http://localhost:11434) | No API key required |
Bedrock, Vertex, and Azure transports are enabled by Cargo feature flags at build time. Binary distributions built by the project team include all features; self-compiled builds must enable the relevant feature (e.g. --features bedrock). These transports can only be selected through the model router — they are not available via the --provider CLI flag.
Capability matrix
| Provider | Tool use | Vision | Thinking | Prompt caching |
|---|---|---|---|---|
| Anthropic | Parallel | Yes | Yes | Explicit (up to 4 breakpoints) |
| Bedrock | Parallel | Yes | Yes | Explicit (mirrors Anthropic) |
| Vertex (Anthropic) | Parallel | Yes | Yes | Explicit (mirrors Anthropic) |
| OpenAI | Parallel | Yes | Yes (o-series) | Automatic |
| Azure OpenAI | Parallel | Yes | Yes (o-series) | Automatic |
| Google AI Studio | Parallel | Yes | No | None |
| Google Vertex | Parallel | Yes | No | None |
| Ollama | Basic | Model-dependent | Model-dependent | None |
Ollama runs models on your own machine. No API key, no network traffic, no per-token cost. Ideal for fast-classifier routes, offline use, or privacy-sensitive workloads. Capability varies by the specific model you pull.
The model router lets you combine providers: for example, route main-loop turns through Anthropic while using a local Ollama model for fast classification. Each route gets its own provider, model, and resilience policy.
Configuring Providers & API Keys
Caliban needs to know which provider to use and how to authenticate with it. Provider selection happens on the command line; authentication is supplied via environment variables or a dynamic key helper.
Selecting a provider
Pass --provider to select the backend for a session:
caliban --provider anthropic # default
caliban --provider openai
caliban --provider google
caliban --provider ollama # no API key needed
When --provider is omitted, caliban resolves the provider from settings.model (see Model Selection), falling back to anthropic.
API key environment variables
Each provider reads its key from a well-known environment variable:
| Provider | Required env var | Optional env vars |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL, ANTHROPIC_VERSION |
| OpenAI | OPENAI_API_KEY | OPENAI_BASE_URL, OPENAI_ORG_ID, OPENAI_PROJECT |
| Google | GEMINI_API_KEY | GOOGLE_GEMINI_API_KEY (alias), GEMINI_BASE_URL, GEMINI_API_VERSION |
| Ollama | (none) | OLLAMA_BASE_URL (default: http://localhost:11434) |
| Azure OpenAI | AZURE_OPENAI_API_KEY, AZURE_OPENAI_RESOURCE | AZURE_OPENAI_API_VERSION (default: 2024-10-21) |
Set the variable in your shell profile or pass it inline:
export ANTHROPIC_API_KEY="sk-ant-..."
caliban "summarize this file"
Dynamic key helper (api_key_helper)
For secrets stored in a keychain, vault, or SSO-backed credential store, set api_key_helper in your settings file instead of exposing keys in the environment. The helper is a process caliban spawns to retrieve the current key on demand.
Forms
Bare string — a single executable path or command string, used for all providers:
api_key_helper = "/usr/local/bin/get-caliban-key"
Object — one helper with explicit options:
[api_key_helper]
command = "/usr/local/bin/get-caliban-key"
provider = "anthropic" # omit for wildcard ("*")
refreshIntervalMs = 300000 # 5 minutes (default)
slowHelperWarningMs = 10000 # warn if script takes > 10 s (default)
Array — different helpers per provider, with a wildcard fallback:
[[api_key_helper]]
provider = "anthropic"
command = "/usr/local/bin/anthropic-key"
[[api_key_helper]]
provider = "*"
command = "/usr/local/bin/generic-key"
The helper receives two environment variables:
- CALIBAN_PROVIDER — the provider id (e.g. anthropic)
- CALIBAN_API_KEY_HELPER_TTL_MS — the configured refresh interval in milliseconds
It must print the API key to stdout (trailing newline is stripped) and exit 0. Any non-zero exit is treated as an error.
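A helper that follows this contract can be quite small. The sketch below is one possible shape, written in Python and delegating to the OS keychain tools. The keychain entry name caliban-<provider> is a hypothetical naming scheme, and lookup_command and main are illustrative names, not part of caliban.

```python
import os
import subprocess
import sys


def lookup_command(provider: str, platform: str) -> list:
    """Build the keychain lookup command for the given provider and platform."""
    service = f"caliban-{provider}"  # hypothetical keychain entry name
    if platform == "darwin":
        # `security find-generic-password -s <service> -w` prints only the secret.
        return ["security", "find-generic-password", "-s", service, "-w"]
    # `secret-tool lookup <attribute> <value>` prints the stored secret.
    return ["secret-tool", "lookup", "service", service]


def main() -> int:
    """Print the key to stdout and return 0; return non-zero on any failure."""
    provider = os.environ.get("CALIBAN_PROVIDER", "")
    if not provider:
        print("CALIBAN_PROVIDER is not set", file=sys.stderr)
        return 1
    try:
        result = subprocess.run(
            lookup_command(provider, sys.platform),
            capture_output=True,
            text=True,
        )
    except OSError as err:  # e.g. the keychain tool is not installed
        print(f"lookup failed: {err}", file=sys.stderr)
        return 1
    key = result.stdout.strip()
    if result.returncode != 0 or not key:
        return 1
    print(key)
    return 0

# In a real helper script, finish with: sys.exit(main())
```

Diagnostics go to stderr so that stdout carries nothing but the key.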
Caching and refresh
Caliban caches the returned key in memory for refreshIntervalMs (default 5 minutes). On a 401 or 403 from the provider, the cache entry is invalidated and the helper is re-invoked immediately for a fresh key. Override the TTL globally with CALIBAN_API_KEY_HELPER_TTL_MS.
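The caching rule amounts to a TTL cache with one extra invalidation path. The following sketch shows the idea under stated assumptions: KeyCache, get, and on_status are hypothetical names, and the clock is injectable only so the behavior can be demonstrated without waiting.

```python
import time
from typing import Callable, Optional


class KeyCache:
    """Cache a helper's key for ttl_ms; drop it early on a 401 or 403."""

    def __init__(self, helper: Callable[[], str], ttl_ms: int = 300_000,
                 clock: Callable[[], float] = time.monotonic):
        self._helper = helper
        self._ttl_s = ttl_ms / 1000
        self._clock = clock
        self._key: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached key, re-invoking the helper if it is missing or stale."""
        stale = self._clock() - self._fetched_at >= self._ttl_s
        if self._key is None or stale:
            self._key = self._helper()
            self._fetched_at = self._clock()
        return self._key

    def on_status(self, status: int) -> None:
        """Invalidate the entry when the provider rejects the key."""
        if status in (401, 403):
            self._key = None
```

With the default ttl_ms of 300000, the helper runs at most once every five minutes unless the provider rejects the key.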
A one-liner shell wrapper around security find-generic-password (macOS) or secret-tool lookup (Linux/GNOME) makes api_key_helper work with the OS keychain without storing the key in any file.
Bedrock and Vertex configuration
AWS Bedrock and Google Vertex are configured through the model router using [provider.bedrock] and [provider.vertex] blocks in caliban.toml. Authentication follows each platform's standard credential chain:
- Bedrock — AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, instance profiles, ~/.aws/credentials). A background task refreshes credentials on a configurable interval (default 5 minutes).
- Vertex (Anthropic) — Google Application Default Credentials (GOOGLE_APPLICATION_CREDENTIALS, gcloud auth application-default login).
- Vertex (Google) — same GCP ADC path as the Anthropic Vertex transport.
See The Model Router for the full caliban.toml syntax, including [provider.X] blocks that let you override the env var name or base URL per provider.
For a full listing of every setting key, see Settings Reference.
Model Selection
Caliban lets you choose the exact model at the command line, in settings, or via the model router. When multiple sources specify a model, a clear precedence chain resolves the winner.
Selecting a model at the command line
Use --model to name the model you want:
caliban --model claude-opus-4-7 "write a haiku"
caliban --provider openai --model gpt-5.5 "explain monads"
caliban --provider google --model gemini-2.0-flash "summarize this"
caliban --provider ollama --model qwen3.5:9b "local inference"
Per-provider defaults
When --model is omitted and no model is set in settings, caliban uses a built-in default for the chosen provider:
| Provider | Default model |
|---|---|
| anthropic | claude-sonnet-4-6 |
| openai | gpt-5.5 |
| google | gemini-2.0-flash |
| ollama | llama3.1 |
Setting a model in settings
Set model in your project or user settings file to avoid repeating --model on every invocation. Two forms are accepted:
Bare string — the provider is inferred from the model name or taken from --provider:
model = "claude-sonnet-4-6"
Qualified object — explicitly names both the provider and the model:
[model]
provider = "anthropic"
name = "claude-sonnet-4-6"
The qualified form is the safest option in shared project configs because it makes the intended provider unambiguous.
You can also set a fallback_model that caliban uses when the primary model errors:
[model]
provider = "anthropic"
name = "claude-opus-4-7"
[fallback_model]
provider = "anthropic"
name = "claude-sonnet-4-6"
Fallback model (--fallback-model)
Pass --fallback-model on the command line to override the settings fallback for a single run:
caliban --model claude-opus-4-7 --fallback-model claude-sonnet-4-6 "long task"
The fallback is wired through caliban-model-router (ADR 0038) and is also surfaced in the headless system/init frame.
Per-turn limits
Control token usage and sampling with these flags:
| Flag | Default | Description |
|---|---|---|
| --max-tokens N | 8192 | Per-turn output token limit. Must be ≥ 1. |
| --temperature F | (provider default) | Sampling temperature in [0.0, 2.0]. Values outside this range are rejected at startup. |
caliban --max-tokens 8192 --temperature 0.2 "write a long essay"
Per-purpose model overrides (model_overrides)
For finer-grained control without a full router config, set model_overrides in settings to pin specific request purposes to a particular model string:
[model_overrides]
fast_classifier = "claude-haiku-4-5"
summarization = "claude-haiku-4-5"
The keys must match the purpose slugs understood by the router (main_loop, summarization, fast_classifier, sub_agent, embedding). This setting does not support cross-provider routing; use the model router for that.
Precedence
When multiple sources specify a model, this chain resolves the winner (highest priority first):
flowchart LR
A["CLI<br/>--model / --provider"] --> B["settings.model<br/>(project > user > managed)"]
B --> C["Provider default<br/>(built-in table)"]
- CLI flags (--model, --provider) — always win.
- settings.model — merged across the settings scope chain (project > user > managed).
- Provider built-in default — the per-provider fallback in the table above.
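The chain can be expressed as a small resolver. This is a sketch under assumptions: resolve_model is a hypothetical function, settings_model stands for the already-merged settings.model value, and the sketch neither infers a provider from a bare model name nor checks that the chosen model belongs to the chosen provider.

```python
# Built-in defaults copied from the per-provider table above.
PROVIDER_DEFAULTS = {
    "anthropic": "claude-sonnet-4-6",
    "openai": "gpt-5.5",
    "google": "gemini-2.0-flash",
    "ollama": "llama3.1",
}


def resolve_model(cli_model=None, cli_provider=None, settings_model=None):
    """Return (provider, model), applying CLI > settings > built-in default."""
    if isinstance(settings_model, dict):
        # Qualified form: { provider = "...", name = "..." }
        s_provider = settings_model.get("provider")
        s_name = settings_model.get("name")
    else:
        # Bare string (or nothing): no explicit provider in settings.
        s_provider, s_name = None, settings_model
    provider = cli_provider or s_provider or "anthropic"
    model = cli_model or s_name or PROVIDER_DEFAULTS[provider]
    return provider, model
```

For example, resolve_model(cli_provider="ollama") falls through to the built-in default for Ollama because neither the CLI nor settings names a model.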
For the most flexible per-purpose routing, see The Model Router.
The Model Router
The model router is an optional layer. If you only need a single provider and model, the --provider and --model flags are all you need. This chapter is relevant when you want per-purpose model dispatch, fallback chains, hedging, or circuit breakers.
The model router is a purpose-keyed dispatcher that sits between the agent loop and your provider adapters. It lets you assign different models — from the same or different providers — to different kinds of requests, and adds resilience features (fallback, hedging, circuit breakers) on top.
Why a router?
The agent makes provider calls for several distinct purposes: the main conversational loop, summarization for compaction, fast classification for permission decisions, sub-agent loops, and more. The router lets you express policies like:
- Use Claude Opus for main-loop turns, Claude Haiku for summarization.
- Route fast classification to a local Ollama model (zero API cost, low latency).
- Fall back from Anthropic to OpenAI if Anthropic returns a rate-limit error.
Request purposes
Each internal request carries a purpose that the router uses for dispatch:
| Purpose | Slug | Description |
|---|---|---|
| Main loop | main_loop | Primary conversational turns |
| Summarization | summarization | Context compaction summaries |
| Fast classifier | fast_classifier | Auto-mode permission decisions |
| Sub-agent | sub_agent | Spawned sub-agent loops |
| Embedding | embedding | Embedding / memory retrieval |
| Other | other | Requests that don't fit a category |
Enabling the router
Drop a caliban.toml file in your project root. Caliban discovers it by walking up from the current directory to the nearest git root or $HOME, then falls back to ~/.config/caliban/caliban.toml. You can also point directly to a file:
caliban --config /path/to/caliban.toml "my prompt"
# or via env var:
CALIBAN_ROUTER_CONFIG=/path/to/caliban.toml caliban "my prompt"
Discovery order (highest priority first): --config flag → CALIBAN_ROUTER_CONFIG → walk-up from current directory → ~/.config/caliban/caliban.toml.
Basic configuration
A minimal caliban.toml with two purpose-keyed routes:
[router]
default_purpose = "main_loop"
[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-opus-4-7"
[[router.route]]
purpose = "fast_classifier"
provider = "ollama"
model = "llama3.2:3b"
Valid provider values: anthropic, openai, google, ollama.
Provider blocks
Override the API key env var or base URL for a provider in caliban.toml:
[provider.openai]
api_key_env = "OPENAI_API_KEY_STAGING"
base_url = "https://oai-staging.example.com/v1"
[provider.ollama]
base_url = "http://gpu-server.local:11434"
Fallback chains
When a route fails with a retriable error (rate-limit, model unavailable, network timeout, server error), the router tries the next route for the same purpose. Define an explicit ordered fallback list, or let declaration order in the file serve as the implicit chain:
[[router.route]]
id = "main-primary"
purpose = "main_loop"
provider = "anthropic"
model = "claude-opus-4-7"
fallback = ["main-fallback"] # explicit: only try this specific route next
[[router.route]]
id = "main-fallback"
purpose = "main_loop"
provider = "openai"
model = "gpt-5.5"
Set fallback = [] to disable fallback entirely for a route.
Errors that are not retriable (auth failure, content policy, invalid request, cancellation) propagate immediately without trying another route.
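The retry rule can be sketched as a loop over route ids. The error-kind strings, RouteError, and dispatch below are hypothetical names chosen for illustration. The sketch follows only explicit fallback lists and does not model the implicit declaration-order chain.

```python
# Error kinds the guide describes as retriable (string labels are assumptions).
RETRIABLE = {"rate_limit", "model_unavailable", "network_timeout", "server_error"}


class RouteError(Exception):
    """A provider failure tagged with a coarse error kind."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


def dispatch(routes: dict, start_id: str, call):
    """Try start_id, then its fallbacks in order, until one call succeeds.

    routes maps a route id to {"fallback": [route ids]};
    call(route_id) returns a response or raises RouteError.
    """
    queue, tried = [start_id], set()
    last_error = RouteError("no_route")
    while queue:
        route_id = queue.pop(0)
        if route_id in tried:
            continue  # guard against fallback cycles
        tried.add(route_id)
        try:
            return call(route_id)
        except RouteError as err:
            if err.kind not in RETRIABLE:
                raise  # auth failure, content policy, etc. propagate at once
            last_error = err
            queue.extend(routes[route_id].get("fallback", []))
    raise last_error
```

A route with fallback = [] adds nothing to the queue, so its retriable error is raised to the caller once the queue is empty.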
Hedging
Hedging races a second route against the primary after a configurable delay. The first to respond wins; the other is cancelled. This is a spend-for-latency trade-off and must be opted in explicitly:
[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
hedge = { hedge_after_ms = 1000, max = 1 }
A global default applies to all routes in the file:
[router.hedge]
hedge_after_ms = 1500
max_hedges = 1
Set hedge = false on a route to disable the global default for that route.
Hedging costs extra: whenever a hedge fires, you pay the full charge on the winning route plus a partial charge on the losing route for tokens sent before cancellation. Enable hedging only on routes where the latency benefit justifies the extra spend.
Circuit breakers
A circuit breaker tracks failures per route and temporarily stops routing to a route that is consistently failing. Once the cooldown window passes, the breaker enters a half-open state and probes the route before resuming normal traffic.
[router.breaker] # global defaults
failure_threshold = 5 # trip after 5 failures within the window
window_secs = 60
cooldown_secs = 30
half_open_probes = 1
[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
breaker = false # disable the global breaker for this route
Per-route breaker overrides can supply any subset of the fields; the rest inherit the global defaults. Cancellation outcomes do not count as failures.
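The breaker's life cycle is easiest to see as a three-state machine. The sketch below is a simplified model, not the router's code: the Breaker class and its method names are hypothetical, half_open_probes is assumed to be 1 and is not enforced, and a success while closed is assumed to leave the failure history untouched.

```python
import time


class Breaker:
    """closed -> open after too many failures; open -> half_open after cooldown."""

    def __init__(self, failure_threshold=5, window_secs=60, cooldown_secs=30,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_secs = window_secs
        self.cooldown_secs = cooldown_secs
        self.clock = clock
        self.failures = []     # timestamps of recent failures
        self.opened_at = None  # set while the breaker is tripped

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_secs:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        """Requests pass when closed, and as probes when half-open."""
        return self.state() != "open"

    def record(self, outcome: str) -> None:
        """outcome is 'success', 'failure', or 'cancelled'."""
        if outcome == "cancelled":
            return  # cancellations do not count as failures
        now = self.clock()
        current = self.state()
        if outcome == "success":
            if current == "half_open":
                self.opened_at = None  # probe succeeded: close the breaker
                self.failures.clear()
            return
        if current == "half_open":
            self.opened_at = now       # probe failed: start a new cooldown
            return
        # Keep only failures inside the sliding window, then add this one.
        self.failures = [t for t in self.failures if now - t < self.window_secs]
        self.failures.append(now)
        if len(self.failures) >= self.failure_threshold:
            self.opened_at = now
```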
Capability filters
Routes can declare capability requirements. The router sends a request to a route only if the route's declared capabilities cover everything the request needs:
[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
requires = { vision = true, tool_use = true }
The router also derives needs automatically from the request content (image blocks → vision need, tool declarations → tool-use need, thinking budget → thinking need), so you do not need to annotate every route manually.
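As a rough illustration of this derivation, the sketch below computes a set of needs from a request and keeps only the routes that cover them. The request shape (content blocks with a type field, a tools list, a thinking_budget value) and the capabilities set on each route are assumptions made for the example, not the router's actual data model.

```python
def derive_needs(request: dict) -> set:
    """Infer capability needs from what the request contains."""
    needs = set()
    if any(block.get("type") == "image" for block in request.get("content", [])):
        needs.add("vision")    # image blocks imply a vision need
    if request.get("tools"):
        needs.add("tool_use")  # tool declarations imply a tool-use need
    if request.get("thinking_budget"):
        needs.add("thinking")  # a thinking budget implies a thinking need
    return needs


def candidates(routes: list, purpose: str, needs: set) -> list:
    """Keep routes for this purpose whose capabilities cover every need."""
    return [
        route for route in routes
        if route["purpose"] == purpose and needs <= route["capabilities"]
    ]
```

An empty result from candidates corresponds to the situation in which the router has no route to offer for the request.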
Effort levels
Set a default effort level on a route and optionally map each level to a provider-specific knob string:
[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
effort = "medium"
[router.route.effort_map]
low = "budget=1024"
medium = "budget=8192"
high = "budget=32768"
Valid effort levels: low, medium (default), high. Callers that don't specify an effort level inherit the route's default; the route default falls back to medium.
Diagnosing the router
Use caliban router debug to print the candidate list the router would resolve for a synthetic request, including breaker state and effort knobs:
# Default: main_loop purpose, no special needs
caliban router debug
# Simulate a vision + tool request
caliban router debug --purpose main_loop --has-vision --has-tools
# Show the effort table for a high-effort request
caliban router debug --effort high
# Point at a specific config file
caliban --config ./caliban.toml router debug --purpose summarization
The output shows each route with a + (kept) or - (dropped) marker, the reason it was kept or dropped, and the current circuit-breaker state.
flowchart LR
R["Request\n(purpose + needs)"] --> Res["Resolve candidates\n(purpose filter →\ncapability filter →\nbreaker filter)"]
Res --> D{"Dispatch"}
D -- "success" --> Resp["Response"]
D -- "retriable error" --> F["Next candidate\n(fallback chain)"]
F --> D
D -- "hedge delay" --> H["Hedge race\n(first wins)"]
H --> Resp
Settings Layering
Caliban merges configuration from up to five sources before starting. Knowing the merge order lets you predict which value wins when the same key appears in multiple places.
The five scopes
| Priority | Scope | Description |
|---|---|---|
| 1 (highest) | CLI | --settings <FILE\|JSON> overlay injected above local |
| 2 | Local | .caliban/settings.local.toml in the workspace |
| 3 | Project | .caliban/settings.toml in the workspace |
| 4 | User | OS user-config directory (see File Locations) |
| 5 (lowest) | Managed | System-wide directory set by an operator |
Higher priority always wins for scalar values. The CLI scope is a virtual overlay — it has no on-disk file.
flowchart LR
M["Managed\n(lowest)"] --> U["User"]
U --> P["Project"]
P --> L["Local"]
L --> C["CLI --settings\n(highest)"]
C --> EFF["Effective\nSettings"]
Deep-merge semantics
Scalars use highest-wins: the value from the highest-priority scope that defines the key is used; lower scopes are ignored for that key.
Arrays and maps have richer rules:
| Key(s) | Merge behaviour |
|---|---|
| permissions.allow, .ask, .deny | Concatenated in priority order (lower scopes first, higher appended); duplicates dropped |
| permissions.rules | Concatenated in priority order; source order within each scope is preserved |
| hooks.<Event> | Concatenated |
| mcp_servers.<name> | Deep-merged per server; a project scope can add an env key to a user-scope server without redefining the whole entry |
| env | Deep-merged (highest-priority value wins per key) |
| additional_directories, claude_md_excludes | Concatenated |
| Everything else | Highest-wins scalar |
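The rules in the table can be approximated by a recursive merge over nested dictionaries. This is a sketch of the semantics as described, not caliban's merge code: merge is a hypothetical function, settings are modeled as plain Python dicts, and the key sets are transcribed from the table.

```python
# Keys whose lists are concatenated (lower scope first, higher appended).
CONCAT = {
    "permissions.allow", "permissions.ask", "permissions.deny",
    "permissions.rules", "additional_directories", "claude_md_excludes",
}
# Of those, the keys whose duplicates are dropped.
DEDUPE = {"permissions.allow", "permissions.ask", "permissions.deny"}
# Top-level maps that are deep-merged key by key.
DEEP = {"env", "mcp_servers"}


def merge(low: dict, high: dict, path: str = "") -> dict:
    """Merge a higher-priority scope onto a lower-priority one."""
    out = dict(low)
    for key, value in high.items():
        full = f"{path}.{key}" if path else key
        if key not in out:
            out[key] = value
        elif full in CONCAT or full.startswith("hooks."):
            combined = out[key] + value
            if full in DEDUPE:
                combined = list(dict.fromkeys(combined))  # keeps first occurrence
            out[key] = combined
        elif (isinstance(value, dict) and isinstance(out[key], dict)
              and (full in ("permissions", "hooks") or full.split(".")[0] in DEEP)):
            out[key] = merge(out[key], value, full)
        else:
            out[key] = value  # everything else: highest-wins scalar
    return out
```

Folding the scopes from lowest to highest priority, for example with functools.reduce(merge, [managed, user, project, local, cli], {}), yields the effective settings in the default layering.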
The --settings CLI overlay
--settings injects a virtual scope that sits above Local but below any active managed-block. It accepts either an inline JSON object or a path to a .json or .toml file:
# inline JSON
caliban --settings '{"model": "claude-opus-4-7"}'
# file path
caliban --settings /tmp/ci-overrides.toml
This is the recommended way to supply CI-specific settings without touching scope files.
parent_settings_behavior — managed lockdown
When an operator sets parent_settings_behavior = "block" in the managed scope, the merge order flips: the managed scope moves to the top of the chain and overrides every other scope, including the CLI overlay.
# /Library/Application Support/Caliban/managed-settings.toml
parent_settings_behavior = "block"
model = "claude-haiku-4-7"
With "block" active, users cannot override model from their own settings or from --settings. The value "augment" is the default behaviour (managed sits at the bottom).
When parent_settings_behavior = "block" is set in the managed scope, all user, project, local, and CLI settings for locked keys are ignored. The effective values come exclusively from the managed scope for those keys.
--setting-sources — scope filtering
--setting-sources restricts which on-disk scopes are loaded. It accepts a comma-separated list of scope names: managed, user, project, local. The CLI overlay is always applied regardless of this flag.
# Load only user + project scopes (skip local overrides)
caliban --setting-sources user,project
# Pin to project scope only — useful for reproducible CI runs
caliban --setting-sources project
An unknown scope name is a fatal error (exit 78) rather than a silent no-op.
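The validation rule is simple enough to sketch directly. parse_setting_sources is a hypothetical name; exit code 78 is the conventional EX_CONFIG value from sysexits.h, which matches the code quoted above. How caliban treats stray whitespace or empty entries is not documented, so skipping them here is an assumption.

```python
VALID_SCOPES = ("managed", "user", "project", "local")
EX_CONFIG = 78  # sysexits.h: configuration error


def parse_setting_sources(raw: str) -> list:
    """Split a comma-separated scope list; exit with 78 on an unknown name."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    for name in names:
        if name not in VALID_SCOPES:
            # Fatal rather than silently ignored, as the guide describes.
            raise SystemExit(EX_CONFIG)
    return names
```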
Live reload
A file watcher monitors each scope's path with a 250 ms debounce. When a file changes, caliban re-loads and re-merges all scopes atomically and fires a ConfigChange hook event. Most keys take effect immediately:
- Live-reloadable: permissions.*, hooks.*, api_key_helper.*, output_style, editor_mode, view_mode, statusLine, env, memory, additional_directories, claude_md_excludes
- Restart-required: model, fallback_model, mcp_servers.*, auto_compact_threshold, micro_compact_enabled
Restart-required keys log a WARN on change and take effect on the next caliban invocation. The /config TUI overlay shows a "restart required" badge next to changed restart-required keys.
Run caliban config print to see the fully-merged settings with per-key scope annotations, without starting a session.
File Locations
Caliban resolves settings files from four on-disk scopes. This page lists the canonical path for each scope on each supported OS.
Scope paths
Managed scope
Set by a system administrator. Caliban reads but never writes this directory.
| OS | Path |
|---|---|
| macOS | /Library/Application Support/Caliban/managed-settings.toml |
| Linux | /etc/caliban/managed-settings.toml |
| Windows | C:\ProgramData\Caliban\managed-settings.toml |
The JSON equivalent (managed-settings.json) is accepted on read as a legacy path but triggers a WARN on startup.
User scope
Per-user settings that apply across all projects. Caliban uses the standard OS user-configuration directory (the parent of caliban/) resolved via the dirs crate.
| OS | Path |
|---|---|
| macOS | ~/Library/Application Support/caliban/settings.toml |
| Linux | ~/.config/caliban/settings.toml (or $XDG_CONFIG_HOME/caliban/settings.toml) |
| Windows | %APPDATA%\caliban\settings.toml |
Project scope
Committed alongside your code. This file should be checked into version control and shared with your team.
| OS | Path |
|---|---|
| All | <workspace>/.caliban/settings.toml |
Local scope
Machine-local overrides that should not be committed. Add .caliban/settings.local.toml to your .gitignore.
| OS | Path |
|---|---|
| All | <workspace>/.caliban/settings.local.toml |
Per-feature files (legacy)
Caliban still loads standalone per-feature TOML files during the current compatibility window. They are consulted only when the corresponding key is absent from the unified settings file in the same scope directory.
| File | Key governed | Notes |
|---|---|---|
| .caliban/permissions.toml | permissions | Can also coexist alongside settings.toml; its permissions block overrides the permissions key in settings.toml for that scope |
| .caliban/mcp.toml | mcp_servers | Legacy transport key is transport; canonical key is type |
| .caliban/hooks.toml | hooks, disable_all_hooks, allow_managed_hooks_only, allowed_http_hook_urls, http_hook_allowed_env_vars | |
Per-feature TOML files are deprecated. Caliban logs a WARN when it falls back to them. After two minor releases the warning becomes an error. Run caliban config migrate to consolidate them into a single settings.toml.
TOML vs JSON
TOML is the canonical write format. JSON is accepted on read as a legacy/import path:
- When both settings.toml and settings.json exist in the same scope directory, .toml wins and caliban logs a WARN about the ignored .json file.
- When only settings.json exists, caliban loads it with a WARN recommending migration.
- Caliban's own write paths (modal, caliban perms add, /permissions editor) always emit TOML.
Atomic writes
All caliban-owned writes use an atomic flock + temp-file rename pattern:
- A sibling .settings.toml.lock file is exclusively flocked.
- Content is written to a uniquely-named .toml.tmp.<pid>.<tid> file.
- The temp file is synced and renamed onto the target.
- The lock is released.
This ensures concurrent writers (e.g. two terminal sessions) never produce a corrupted file.
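The same lock, write, sync, rename sequence can be reproduced with the Python standard library on Unix-like systems. The sketch below illustrates the pattern and is not caliban's code: atomic_write is a hypothetical function, the temp-file name only approximates the documented one, and the fcntl module is unavailable on Windows.

```python
import fcntl
import os
import threading


def atomic_write(target: str, content: str) -> None:
    """Replace target with content without ever exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(target))
    lock_path = os.path.join(directory, "." + os.path.basename(target) + ".lock")
    tmp_path = f"{target}.tmp.{os.getpid()}.{threading.get_ident()}"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # 1. exclusive lock on the sibling lock file
        try:
            with open(tmp_path, "w") as tmp:  # 2. write to a uniquely named temp file
                tmp.write(content)
                tmp.flush()
                os.fsync(tmp.fileno())        # 3. sync the data to disk ...
            os.replace(tmp_path, target)      #    ... then rename onto the target
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)  # 4. release the lock
```

Because the rename is atomic on POSIX filesystems, a reader sees either the old file or the new one, never a half-written mix.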
See Files & Directories for the full list of all caliban-managed paths including sessions, cache, logs, and debug output.
Settings Reference
Every key recognized by settings.toml (and its JSON equivalent) is listed below, grouped by topic. For merge semantics see Settings Layering; for file paths see File Locations.
All fields are optional. Unknown top-level keys are tolerated for forward-compatibility (they are collected and ignored rather than causing a parse error).
Model / Agent
| Key | Type | Default | Description |
|---|---|---|---|
| agent | string | — | Agent profile name used as a sub-agent dispatch hint |
| model | string or { provider, name } | provider default | Primary model. Bare string (e.g. "claude-sonnet-4-7") or qualified object (e.g. { provider = "anthropic", name = "claude-sonnet-4-7" }). CLI --model / --provider override this |
| fallback_model | string or { provider, name } | — | Model used when the primary returns an error. Wired through caliban-model-router. CLI --fallback-model overrides this |
| model_overrides | { route → model } | {} | Per-named-route model overrides passed to the router (e.g. { "fast-classifier" = "claude-haiku-4-7" }) |
For provider and model selection details see Model Selection and The Model Router.
Permissions
| Key | Type | Default | Description |
|---|---|---|---|
| permissions.allow | string[] | [] | Patterns that auto-allow without prompting. Concatenated across scopes |
| permissions.ask | string[] | [] | Patterns that prompt the user. Concatenated across scopes |
| permissions.deny | string[] | [] | Patterns that hard-deny. Concatenated across scopes |
| permissions.rules | RuleSpec[] | [] | v2 ordered rule array. When non-empty, takes precedence over the three-bucket form above. Source order is preserved; first match wins |
| permissions.enforce | bool | false | When true, refuse --no-permissions / bypass mode at startup |
| permissions.default_mode | string | "default" | Initial permission mode at session start. Valid values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions |
| permissions.audit_log | bool | true | Enable the append-only decision log |
Each entry in permissions.rules supports:
| Field | Type | Required | Description |
|---|---|---|---|
| pattern | string | yes | Glob pattern matching Tool or Tool:first-arg-glob |
| action | "allow" \| "ask" \| "deny" | yes | Decision when this rule matches |
| comment | string | no | Human-readable note shown in /permissions |
| reason | string | no | Deny reason surfaced to the operator and logged |
| expires_at | ISO-8601 datetime | no | Rule is skipped after this timestamp |
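To make "first match wins" and expires_at concrete, here is a small evaluator. It is a sketch under assumptions: decide is a hypothetical function, Python's fnmatch stands in for caliban's pattern grammar (which is richer), timestamps are assumed to carry an explicit UTC offset, and the result when no rule matches is left as None because the default depends on the permission mode.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase


def decide(rules, tool, first_arg="", now=None):
    """Return the action of the first live rule that matches, else None."""
    now = now or datetime.now(timezone.utc)
    # A rule may target the bare tool name or the Tool:first-arg form.
    subjects = (tool, f"{tool}:{first_arg}")
    for rule in rules:  # source order is preserved
        expires = rule.get("expires_at")
        if expires and datetime.fromisoformat(expires) <= now:
            continue  # expired rules are skipped
        if any(fnmatchcase(subject, rule["pattern"]) for subject in subjects):
            return rule["action"]
    return None
```

Ordering matters: placing a narrow deny rule ahead of a broad allow rule is what makes the deny effective.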
See Permissions Concepts and Pattern Grammar for full detail.
Hooks
| Key | Type | Default | Description |
|---|---|---|---|
| hooks | { event → handler[] } | {} | Hook event map. Keys are event names (e.g. "PreToolUse", "SessionEnd"); values are handler lists |
| disable_all_hooks | bool | false | Kill-switch that disables every external hook handler. In-process hooks (permissions, audit) still run |
| allow_managed_hooks_only | bool | false | When true, only hooks defined in the managed scope fire |
| allowed_http_hook_urls | string[] | [] | Glob allowlist for HTTP hook endpoint URLs |
| http_hook_allowed_env_vars | string[] | [] | Env-var names that HTTP hook handlers are allowed to read |
See Hooks for the full event list and handler shapes.
MCP Servers
mcp_servers is a map of server name to server configuration. Each entry deep-merges across scopes so a project scope can add environment variables to a user-scope server without redefining the whole entry.
[mcp_servers.linear]
command = "npx"
args = ["-y", "@linear/mcp-server"]
[mcp_servers.silverbullet]
type = "http"
url = "https://mcp.example.com/mcp"
headers = { Authorization = "Bearer ${SB_TOKEN}" }
| Field | Type | Default | Description |
|---|---|---|---|
| type | "stdio" \| "http" \| "sse" | "stdio" | Transport. Also accepted as transport (legacy alias) |
| command | string | "" | Executable (stdio only) |
| args | string[] | [] | Argv after command (stdio only) |
| env | { key → value } | {} | Environment variables (stdio only) |
| cwd | string | — | Working directory override (stdio only) |
| url | string | — | Absolute HTTP/HTTPS URL (http/sse only) |
| headers | { key → value } | {} | Static request headers (http/sse only) |
| oauth | "off" \| "auto" \| "manual" | "off" | OAuth mode (http/sse only) |
| permissions.allow | string[] | [] | Per-server allow list (composed with global rules) |
| permissions.deny | string[] | [] | Per-server deny list |
| disabled | bool | false | Skip this server on startup |
See MCP Servers for configuration examples and the OAuth flow.
Router
| Key | Type | Default | Description |
|---|---|---|---|
router | object | — | Opaque config blob passed to caliban-model-router. The router crate owns the schema; see The Model Router |
Memory
| Key | Type | Default | Description |
|---|---|---|---|
memory | object | — | Memory tier knobs passed to caliban_memory::MemoryConfig. Sub-keys include auto_memory_enabled (bool), auto_memory_directory (string), cap_tokens_auto, cap_tokens_claude_md, cap_tokens_combined (integers) |
See Memory Tiers and CLAUDE.md & Imports.
Plugins
| Key | Type | Default | Description |
|---|---|---|---|
plugins | object | — | Plugin manager knobs. Schema is owned by the plugin subsystem; see Plugins |
UI / Output
| Key | Type | Default | Description |
|---|---|---|---|
output_style | string | "default" | Active output-style name. See Output Styles. Restart-required |
editor_mode | "vim" or "emacs" | — | Input-line editing mode |
view_mode | string | — | Compact vs. expanded TUI layout |
statusLine | object | — | Custom statusline command. Also accepted as status_line (TOML-friendly alias) |
tui | object | — | TUI theme and layout knobs (e.g. showCostInStatusline) |
statusLine sub-keys:
| Field | Type | Default | Description |
|---|---|---|---|
command | string | — | Shell command whose stdout is used as the statusline text. Required |
timeout_ms | integer | — | Per-invocation timeout in ms (50–5000) |
padding | integer | — | Horizontal padding cells (0–8) |
Authentication
| Key | Type | Default | Description |
|---|---|---|---|
api_key_helper | string, object, or object[] | — | Provider API-key supplier. Three shapes: bare command string; single { command, provider, refreshIntervalMs, slowHelperWarningMs } object; or array of provider-keyed objects. Executed without a shell; cached for refreshIntervalMs (default 5 min) or until a 401 is received |
Auth precedence per provider: per-provider helper → wildcard helper → environment variable → keyring → anonymous.
See Configuring Providers & API Keys.
Observability
| Key | Type | Default | Description |
|---|---|---|---|
enable_telemetry | bool | false | Enable OpenTelemetry / cost emitter |
See Telemetry & Cost.
Context-Window Management
| Key | Type | Default | Description |
|---|---|---|---|
auto_compact_threshold | float or null | 0.75 | Pre-turn auto-compaction threshold as a utilization fraction in [0, 1]. null disables auto-compact |
micro_compact_enabled | bool | true | Enable per-turn microcompact (LLM-free supersession pass) |
tool_result_cap_chars | integer | 50000 | Global per-tool-result character cap. 0 disables |
min_cache_block_tokens | integer | 1024 | Minimum estimated tokens on the last user message to place a conversation-level prompt-cache marker |
See Context & Compaction.
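As a rough illustration of how a utilization-fraction threshold behaves, the following Python sketch decides whether a pre-turn compaction would trigger. The function name, its parameters, and the choice of a greater-or-equal comparison are assumptions for illustration; the guide does not specify how caliban estimates token usage or which comparison it uses.

```python
def should_auto_compact(used_tokens: int, context_window: int, threshold=0.75) -> bool:
    """Return True when context utilization reaches the threshold (None disables)."""
    if threshold is None:
        return False  # auto_compact_threshold = null turns the feature off
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a fraction in [0, 1]")
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return used_tokens / context_window >= threshold
```

With the default of 0.75 and a hypothetical 200,000-token window, this sketch would trigger at 150,000 used tokens.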
Managed Scope Control
| Key | Type | Default | Description |
|---|---|---|---|
parent_settings_behavior | "block" or "augment" | "augment" | When "block" is set in the managed scope, the managed scope moves to the top of the merge chain, overriding all user, project, local, and CLI settings. Has no effect when set in other scopes |
Miscellaneous
| Key | Type | Default | Description |
|---|---|---|---|
additional_directories | string[] | [] | Extra workspace roots for file and shell tools to consider |
claude_md_excludes | string[] | [] | Glob patterns for CLAUDE.md paths to skip during discovery |
env | { key → value } | {} | Environment-variable overrides applied to every child process launched by caliban (tools, hooks, MCP servers). Deep-merged across scopes; highest-priority scope wins per key |
Config Commands
Caliban ships two subcommand families for inspecting and managing settings: caliban config for the unified settings layer, and caliban settings for import/export of individual scope files. Both work without a running session.
caliban config print
Prints the fully-merged effective settings as JSON, annotated with the scope each value came from. Honors --settings and --setting-sources so you can preview what a CI run or a different scope combination would see.
caliban config print
# Show only project + user scopes (skip local)
caliban --setting-sources user,project config print
# Preview with a CLI overlay applied
caliban --settings '{"model": "claude-opus-4-7"}' config print
The output shows the merged Settings object. Each top-level key lists the scope that contributed the winning value. This is the headless equivalent of the read-only Effective tab in the /config TUI overlay.
caliban config migrate
Consolidates legacy per-feature TOML files (permissions.toml, mcp.toml, hooks.toml) in the current workspace into a single .caliban/settings.toml. Existing keys in the target file are preserved; the migrated keys are merged on top.
# Preview what would be written (nothing is changed)
caliban config migrate --dry-run
# Run the migration
caliban config migrate
After migration the per-feature files are no longer read (caliban checks for the unified key first). You can safely delete them, or leave them in place — caliban will ignore them once the corresponding key exists in settings.toml.
Run caliban config migrate once after upgrading to a version that shipped ADR 0026. It is safe to run multiple times — the command is idempotent.
caliban settings import
Imports a settings file from a foreign format (Claude Code JSON, Codex JSON, or legacy caliban JSON) into canonical caliban TOML at the target scope.
# Import ~/.claude.json into the user scope (dry-run first)
caliban settings import --from ~/.claude.json --scope user --dry-run
caliban settings import --from ~/.claude.json --scope user
# Import a project settings file into the project scope
caliban settings import --from /path/to/settings.json
Options:
| Flag | Description |
|---|---|
--from <PATH> | Path to the source file (required) |
--scope <SCOPE> | Destination scope: managed, user, project, or local. Default: project |
--dry-run | Print what would be written without making changes |
caliban settings import is the recommended migration path when you have an existing Claude Code settings.json you want to adopt. The source file is read-only; only the target scope's TOML is written.
caliban settings print
Prints the raw settings for a single scope (before merging). When --scope is omitted, the project scope is printed.
# Print the project-scope settings
caliban settings print
# Print the user-scope settings
caliban settings print --scope user
Options:
| Flag | Description |
|---|---|
--scope <SCOPE> | Scope to print. Default: project |
This differs from caliban config print in that it shows the unmerged raw contents of one scope rather than the merged result across all scopes.
TOML-primary write / JSON import-only
Caliban always writes TOML. JSON files at any scope path are accepted on read as a legacy or import path, but caliban logs a WARN and recommends running caliban settings import to migrate.
When both settings.toml and settings.json exist in the same scope directory, TOML wins and the JSON file is ignored (with a WARN).
Keep one format per scope directory. If both settings.toml and settings.json are present, only the TOML file is loaded; the .json file is ignored and a WARN is logged.
Live reload and restart-required keys
Most settings changes take effect immediately via the file watcher (250 ms debounce). A subset of keys require a full restart:
- Restart-required: model, fallback_model, mcp_servers.*, output_style, auto_compact_threshold, micro_compact_enabled
When a restart-required key changes on disk while caliban is running, caliban logs a WARN and shows a "restart required" badge in the /config TUI overlay. The new value will be used the next time you launch caliban.
All other settings — permissions, hooks, api_key_helper, UI keys, env, memory knobs — are live-reloadable and take effect within one debounce cycle without restarting.
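The split between restart-required and live-reloadable keys can be expressed as a small classifier. The Python sketch below is illustrative only: the function names are hypothetical, and the dotted-key notation for changed settings (for example mcp_servers.linear.env) is an assumption about how a changed key might be named.

```python
# Keys listed as restart-required in this section.
RESTART_REQUIRED = (
    "model",
    "fallback_model",
    "output_style",
    "auto_compact_threshold",
    "micro_compact_enabled",
)
# mcp_servers.* is a prefix match: any key under mcp_servers needs a restart.
RESTART_REQUIRED_PREFIXES = ("mcp_servers.",)


def needs_restart(key: str) -> bool:
    """Return True when a changed key only takes effect at the next launch."""
    return key in RESTART_REQUIRED or key.startswith(RESTART_REQUIRED_PREFIXES)


def split_changes(changed_keys):
    """Partition changed keys into (restart_required, live_reloadable), keeping order."""
    restart = [k for k in changed_keys if needs_restart(k)]
    live = [k for k in changed_keys if not needs_restart(k)]
    return restart, live
```

For example, a change set containing model and permissions.default_mode would yield one restart-required key and one live-reloadable key in this sketch.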
Concepts
Every tool call the model makes — Bash, Write, Edit, a fetched URL, an MCP action — passes through the permission system before it executes. The system is a flat list of rules evaluated from top to bottom; the first rule that matches determines the outcome.
Actions
Each rule maps a pattern to one of three actions:
| Action | Meaning |
|---|---|
allow | Execute the tool call immediately, no prompt. |
deny | Reject the tool call. An optional reason string is returned to the model so it can retry differently. |
ask | Pause and ask the operator interactively. In headless mode without --auto-allow, ask degrades to a hard deny. |
Rule structure
A rule is a TOML table with a pattern and an action, plus optional metadata:
[[permissions.rules]]
pattern = "Bash:git *" # required — see Pattern Grammar
action = "allow" # required — allow | deny | ask
comment = "git ops are fine" # optional — shown in the Ask modal, not to the model
reason = "…" # optional, deny-only — returned to the model
expires_at = "2027-01-01T00:00:00Z" # reserved, parsed but not yet enforced
Evaluation order
Rules are evaluated top-to-bottom; first match wins.
Sources are merged in priority order before evaluation, so high-priority sources simply appear earlier in the flat list:
- CLI flags (--allow, --deny, --ask): highest priority; prepended at startup.
- Project file: <workspace>/.caliban/permissions.toml.
- User file: $XDG_CONFIG_HOME/caliban/permissions.toml (default: ~/.config/caliban/permissions.toml).
- Built-in defaults: lowest priority; appended automatically.
Within a single file, rules are ordered exactly as written. The [[permissions.rules]] array preserves authoring order, so narrow rules belong above broader ones.
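A minimal Python sketch of first-match-wins evaluation over the merged list follows. It is a simplified model, not caliban's evaluator: it handles only the bare Tool, Tool:<glob>, and * forms, uses Python's fnmatch in place of globset, and the sample rules are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, tool: str, first_arg: str) -> bool:
    """Match the bare Tool, Tool:<glob>, and * forms only (simplified)."""
    if pattern == "*":
        return True
    name, sep, glob = pattern.partition(":")
    if name != tool:
        return False
    # A bare tool name matches any invocation; otherwise glob the first argument.
    return not sep or fnmatchcase(first_arg, glob)


def evaluate(rules, tool: str, first_arg: str):
    """Walk the flat list top to bottom and return the first (pattern, action) that matches."""
    for pattern, action in rules:
        if matches(pattern, tool, first_arg):
            return pattern, action
    return None


# Sources are concatenated in priority order before evaluation.
cli_rules = [("Bash:git push *", "deny")]
project_rules = [("Bash:git *", "allow")]
user_rules = [("Bash", "ask")]
builtin_defaults = [("Read", "allow"), ("*", "ask")]
rules = cli_rules + project_rules + user_rules + builtin_defaults
```

Because the narrow CLI rule sits above the broader project rule, git push is denied while other git commands are allowed, which is why narrow rules belong above broad ones.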
Older configs used a permissions.{allow,ask,deny} key per action rather than an ordered array. Caliban still loads that format on read and normalizes it into the ordered array, but all caliban-owned writes (the Ask modal, /permissions, caliban perms add) emit the canonical [[permissions.rules]] form. Convert your config with caliban perms export --format toml.
Built-in defaults
When no rule matches a tool call before the end of the list, the built-in defaults serve as a safety net:
| Pattern | Default action |
|---|---|
Read | allow |
Grep | allow |
Glob | allow |
TodoWrite | allow |
EnterPlanMode | allow |
ExitPlanMode | allow |
WebFetch | ask |
Bash | ask |
Write | ask |
Edit | ask |
* (catch-all) | ask |
Unknown tools (MCP tools, future built-ins) fall through to the * catch-all and are ask by default.
Decision flow
flowchart TD
A([Tool call arrives]) --> B{Runtime rules\nany match?}
B -- yes --> Z
B -- no --> C{CLI rules\n--allow/--deny/--ask}
C -- match --> Z
C -- no match --> D{Project rules\npermissions.toml}
D -- match --> Z
D -- no match --> E{User rules\n~/.config/caliban/}
E -- match --> Z
E -- no match --> F{Built-in defaults}
F -- match --> Z
Z[Matched rule action] --> G{Action?}
G -- allow --> H([Execute tool])
G -- deny --> I([Reject — return reason to model])
G -- ask --> J{Interactive session?}
J -- yes --> K([Show Ask modal])
J -- no --> L{--auto-allow?}
L -- yes --> H
L -- no --> I
The Permission Mode wraps this pipeline and can override the ask verdict (but never a static allow or deny, except in bypassPermissions mode). See Permission Modes for details.
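The lower half of the flowchart, from matched action to final outcome, can be written as a small function. This Python sketch models only what the diagram shows, before any permission mode is applied; the function name and the outcome labels (execute, reject, ask_modal) are hypothetical.

```python
def final_outcome(action: str, interactive: bool, auto_allow: bool) -> str:
    """Map a matched rule action to what happens to the tool call."""
    if action == "allow":
        return "execute"
    if action == "deny":
        return "reject"
    if action != "ask":
        raise ValueError(f"unknown action: {action!r}")
    if interactive:
        return "ask_modal"  # the operator decides in the TUI
    # Headless: ask becomes allow only with --auto-allow, otherwise a hard deny.
    return "execute" if auto_allow else "reject"
```

Note that auto_allow affects only the ask branch in this sketch; an allow or deny verdict passes through unchanged.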
Pattern Grammar
A pattern is the pattern field in a [[permissions.rules]] entry (or the argument to --allow/--deny/--ask on the CLI). It encodes the tool name and an optional argument specifier separated by a colon.
Forms at a glance
| Form | Description |
|---|---|
Tool | Match any invocation of Tool, regardless of arguments. |
Tool:<glob> | Match Tool when its first argument matches <glob>. |
Bash:~<glob> | Match Bash when <glob> appears anywhere in the command string. |
Tool:key=<glob> | Match Tool when the named input field matches <glob> (dotted keys supported). |
Tool:k1=<g1>,k2=<g2> | Multiple key=glob pairs, AND-combined. |
* | Catch-all — matches every tool. |
Glob characters
The argument-side glob uses globset semantics:
| Character | Meaning |
|---|---|
* | Zero or more characters (does not cross / in path patterns). |
** | Zero or more path segments (crosses /; use in file-edit patterns). |
? | Exactly one character. |
Non-path patterns (Bash command strings, URLs, MCP string fields) use literal_separator = false, so * matches slashes too.
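To make the difference between path and non-path matching concrete, here is a simplified Python translation of the three glob characters into a regular expression. It is a sketch, not globset itself: the helper names are hypothetical, and character classes, brace alternation, and globset's rule that a leading **/ can match zero directories are deliberately left out.

```python
import re


def glob_to_regex(glob: str, literal_separator: bool) -> "re.Pattern":
    """Translate *, ** and ? into a regex; other characters match literally."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")  # ** always crosses '/'
            i += 2
        elif glob[i] == "*":
            # In path patterns a single * stops at '/'; elsewhere it does not.
            parts.append("[^/]*" if literal_separator else ".*")
            i += 1
        elif glob[i] == "?":
            parts.append("[^/]" if literal_separator else ".")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def glob_matches(glob: str, text: str, literal_separator: bool) -> bool:
    """Return True when the whole text matches the glob."""
    return glob_to_regex(glob, literal_separator).fullmatch(text) is not None
```

With literal_separator set to True, /tmp/* matches /tmp/out.txt but not /tmp/a/b.txt. With it set to False, https://docs.* matches a URL that contains further slashes.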
Tool:<glob> — first-argument matching
The "first arg" is a per-tool field extracted from the JSON input:
| Tool | First-arg field |
|---|---|
Bash | command |
Read, Write, Edit, MultiEdit, NotebookEdit | path |
WebFetch | url |
| MCP tools with no known accessor | (no first arg; pattern can't match) |
If the tool has no known accessor, only the bare Tool form can match; Tool:<glob> never fires for that tool.
Bash:~<glob> — anywhere-in-command match
Prefix the argument glob with ~ to perform a sliding-window search over the full command string rather than matching from the start. This catches commands invoked via wrappers or subshells:
# Deny any use of rm, even via sudo or bash -c "rm …"
[[permissions.rules]]
pattern = "Bash:~rm *"
action = "deny"
reason = "no rm — use git revert or Write"
The ~ prefix is only meaningful for Bash. On other tools it does not match.
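The contrast between start-anchored and anywhere-in-command matching can be sketched in Python. This is one plausible reading of a sliding-window search, assumed for illustration: the glob is tried at every starting offset of the command. The function names are hypothetical and fnmatch stands in for globset.

```python
from fnmatch import fnmatchcase


def prefix_match(glob: str, command: str) -> bool:
    """Bash:<glob> style: the glob must match from the start of the command."""
    return fnmatchcase(command, glob)


def anywhere_match(glob: str, command: str) -> bool:
    """Bash:~<glob> style (assumed semantics): try the glob at every offset."""
    return any(fnmatchcase(command[start:], glob) for start in range(len(command)))
```

Under this sketch, rm * matches sudo rm -rf /tmp only in the anywhere form. The same window also matches a command such as confirm yes, because the text rm followed by a space occurs inside it, so anywhere-patterns are best kept specific.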
Tool:key=<glob> — structured (dotted-key) matching
For MCP tools or built-ins whose input has named fields, use key=glob to match a specific field. Dots traverse nested objects:
# Allow creating GitHub issues only in the anthropic org
[[permissions.rules]]
pattern = "mcp__github__create_issue:repo=anthropic/*"
action = "allow"
# AND-combined: repo must match AND title must start with "feat"
[[permissions.rules]]
pattern = "mcp__github__create_issue:repo=anthropic/*,title=feat*"
action = "allow"
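The key=glob form, including dotted traversal and AND-combination, can be modeled in a few lines of Python. This sketch makes several assumptions that the guide does not confirm: globs contain no commas, only string values can match, a missing key fails the match, and fnmatch stands in for globset. The function names and the nested sample input are hypothetical.

```python
from fnmatch import fnmatchcase


def lookup(data, dotted_key: str):
    """Follow a dotted key through nested dicts; return None when a step is missing."""
    current = data
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def structured_match(spec: str, tool_input: dict) -> bool:
    """Return True only when every key=glob pair in spec matches (AND-combined)."""
    for pair in spec.split(","):
        key, _, glob = pair.partition("=")
        value = lookup(tool_input, key.strip())
        if not isinstance(value, str) or not fnmatchcase(value, glob):
            return False
    return True
```

A call such as structured_match("repo=anthropic/*,title=feat*", tool_input) succeeds only when both fields match, mirroring the second rule above.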
File-edit path normalization
For Read, Write, Edit, MultiEdit, and NotebookEdit, the file path in the tool call is workspace-normalized before pattern matching:
- Absolute paths are used as-is.
- Relative paths are resolved against the workspace root (the git rev-parse --show-toplevel result, or the current working directory when outside a repo).
- A relative pattern like src/**/*.rs is automatically anchored with **/ so it matches at any depth under the repo.
# Allow editing any Markdown file anywhere in the repo
[[permissions.rules]]
pattern = "Edit:**/*.md"
action = "allow"
# Allow editing files only in a specific directory (absolute path)
[[permissions.rules]]
pattern = "Write:/tmp/*"
action = "allow"
Examples table
| Pattern | Matches | Does not match |
|---|---|---|
Bash | Any Bash call | — |
Bash:git * | git push, git commit -m "…" | gitk, sudo git push |
Bash:~git * | sudo git push, bash -c "git fetch" | commands with no git substring |
Bash:rm * | rm -rf /tmp | sudo rm -rf /tmp (use ~rm * for that) |
Edit:**/*.rs | /repo/src/main.rs, /repo/crates/x/lib.rs | /tmp/scratch.py |
Write:/tmp/* | /tmp/out.txt | /home/user/file.txt |
WebFetch:https://docs.* | https://docs.rs/…, https://docs.anthropic.com/… | https://api.example.com/… |
mcp__gh__create_issue:repo=acme/* | {"repo":"acme/frontend"} | {"repo":"other/repo"} |
* | Every tool | — |
MCP tools that declare no known first-arg accessor can only be matched by their full name (mcp__server__tool_name) or the * catch-all. A pattern like mcp__server__tool_name:<glob> will never fire for such tools because there is no field to extract.
Permission Modes
Permission modes control what happens when the rule evaluator produces an ask verdict. With the exception of bypassPermissions (see the bypass latch below), they do not override a static allow or deny; those always win. A mode is a post-pass filter on top of the rule pipeline.
The six modes
| Mode | camelCase | What changes |
|---|---|---|
| Default | default | Rules apply unchanged; ask routes to the interactive Ask modal. |
| Accept Edits | acceptEdits | Write, Edit, MultiEdit, and NotebookEdit are auto-allowed; all other tools honor rules normally. |
| Plan | plan | Read-only tools are allowed; write and execute tools are blocked from the loop (legacy plan-mode allowlist). |
| Auto | auto | A fast classifier model labels each ask-rule tool call as allow / soft-deny / hard-deny. Soft-deny routes to the Ask modal with the classifier's reason. |
| Don't Ask | dontAsk | Every ask verdict becomes allow. Static deny rules still apply. |
| Bypass Permissions | bypassPermissions | All rules ignored — every tool call is allowed. Requires an explicit confirmation flag (see below). |
The status bar shows a chip when the active mode is not default:
| Mode | Chip |
|---|---|
acceptEdits | ✎ accept edits |
plan | 📋 plan |
auto | 🤖 auto |
dontAsk | ⏭ don't ask |
bypassPermissions | ⚠ bypass |
Cycling modes with Shift+Tab
In the interactive TUI, press Shift+Tab to cycle forward through the modes:
default → acceptEdits → plan → auto → dontAsk → bypassPermissions → default
Cycling into bypassPermissions without the confirmation flag (see below) fires a warning toast and snaps back to default.
Setting the mode at startup
Use --permission-mode on the command line:
caliban --permission-mode acceptEdits "add docstrings to all public functions"
Valid values are the camelCase mode names: default, acceptEdits, plan, auto, dontAsk, bypassPermissions.
The mode is also resolved from the environment variable CALIBAN_DEFAULT_PERMISSION_MODE and the permissions.default_mode setting (see below), with this precedence:
- --permission-mode CLI flag
- CALIBAN_DEFAULT_PERMISSION_MODE env var
- permissions.default_mode in settings
- Built-in default (default)
The default_mode setting
Set a persistent default mode in your project or user settings file:
[permissions]
default_mode = "acceptEdits"
This is overridden by the CLI flag and env var as shown above.
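The precedence chain for the startup mode can be sketched as a simple fallthrough. The Python below is illustrative: the function name and argument shapes are hypothetical, and rejecting an unknown mode name with an error is an assumption, since this section does not say how caliban handles invalid values.

```python
VALID_MODES = {"default", "acceptEdits", "plan", "auto", "dontAsk", "bypassPermissions"}


def resolve_mode(cli_flag, env: dict, settings: dict) -> str:
    """Return the first mode found: CLI flag, then env var, then settings, then default."""
    candidates = (
        cli_flag,
        env.get("CALIBAN_DEFAULT_PERMISSION_MODE"),
        settings.get("permissions", {}).get("default_mode"),
    )
    for candidate in candidates:
        if candidate:
            if candidate not in VALID_MODES:
                raise ValueError(f"unknown permission mode: {candidate!r}")  # assumption
            return candidate
    return "default"
```

For example, with default_mode = "acceptEdits" in settings and --permission-mode plan on the command line, this sketch resolves to plan.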
Auto-mode and --disable-auto-mode
When the mode is auto, the classifier is consulted for each tool call whose rule verdict is ask. The classifier dispatches via the router's FastClassifier purpose — configure it to use a small, fast model (e.g., Haiku, GPT-4o-mini, a local Ollama model). Results are cached for the session by (tool_name, sha256(input)).
To disable the classifier (all ask verdicts stay as-is, routing to the modal), pass:
caliban --disable-auto-mode
or set CALIBAN_DISABLE_AUTO_MODE=1. When disabled, auto mode behaves identically to default.
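A session cache keyed by (tool_name, sha256(input)) might look like the following Python sketch. The class name and the classify callback are hypothetical, and hashing a canonical JSON serialization with sorted keys is an assumption; the guide does not say how the input is serialized before hashing.

```python
import hashlib
import json


class ClassifierCache:
    """Memoize classifier verdicts for the lifetime of one session."""

    def __init__(self, classify):
        self._classify = classify  # callable(tool_name, tool_input) -> verdict
        self._entries = {}

    @staticmethod
    def key(tool_name: str, tool_input: dict):
        # Canonical form so that key order in the input does not change the hash (assumption).
        canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
        return tool_name, hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def verdict(self, tool_name: str, tool_input: dict):
        cache_key = self.key(tool_name, tool_input)
        if cache_key not in self._entries:
            self._entries[cache_key] = self._classify(tool_name, tool_input)
        return self._entries[cache_key]
```

The effect is that a repeated, identical tool call within a session does not trigger a second classifier request.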
Bypass permissions latch
bypassPermissions overrides all rules, including static deny. Because this is a footgun, caliban refuses to enter the mode without an explicit confirmation flag:
caliban --allow-dangerously-skip-permissions --permission-mode bypassPermissions
Without --allow-dangerously-skip-permissions:
- Starting with --permission-mode bypassPermissions aborts at startup with an error.
- Configuring permissions.default_mode = "bypassPermissions" also aborts at startup.
- Cycling to bypassPermissions via Shift+Tab fires a warning toast and reverts to default.
In bypass mode the model can execute any tool call without restriction. Use it only in fully sandboxed, disposable environments where you control the entire execution context. Prefer dontAsk or acceptEdits for typical automation.
--no-permissions
--no-permissions disables the permission system entirely — no rules are evaluated and every tool call is allowed. It conflicts with --allow, --deny, --ask, and --auto-allow. The resolved mode surfaces as "disabled" in the system/init stream-json frame.
Managing Rules
Rules can be created and edited through three surfaces: the interactive Ask modal that appears when a tool call reaches an ask verdict, the /permissions overlay inside the TUI, and the caliban perms CLI for scripted or headless management.
The Ask modal
When a tool call hits an ask verdict during an interactive session, the TUI pauses and presents a modal with four choices (navigate with arrow keys, confirm with Enter):
| Choice | Effect |
|---|---|
| Allow once | Permit this specific tool call and continue; no rule is written. |
| Always allow | Permit this call and append a new allow rule to the chosen scope file. |
| Reject once | Deny this specific call; no rule is written. |
| Always reject | Deny this call and append a new deny rule to the chosen scope file. |
Press Esc to dismiss the modal and deny the current call without writing any rule.
When you choose "Always allow" or "Always reject", caliban opens a sub-prompt with a suggested narrow pattern (e.g., Bash:git push rather than Bash), a scope picker (project / user), and an optional comment field. The rule is atomically appended to the appropriate TOML file and takes effect immediately for the rest of the session.
The /permissions overlay
Type /permissions in the TUI input bar to open the interactive permissions overlay. It shows:
- The full effective rule list (runtime rules, then config rules by scope, then built-in defaults), each tagged with its origin.
- Runtime-only rules added by "Always allow/reject" during this session.
- Keybind d deletes the selected rule: a session rule is dropped from the live store immediately; a file-scoped rule is removed from its TOML file; built-in defaults are read-only.
Use the overlay to inspect the live rule list and verify which rule would match a given tool call before running it.
Adding a rule through the Ask modal applies to the running session immediately — the next matching tool call won't re-prompt. Removing a file-scoped rule (via the overlay's d key or caliban perms remove), or editing a permissions.toml outside caliban, does not retroactively change the current session's decisions; those changes take effect at the next session start. Deleting a session rule with d is the exception — it takes effect live.
caliban perms CLI
The caliban perms subcommand provides a complete headless management surface. All verbs accept an optional --scope flag (managed | user | project | local | cli; defaults vary by verb).
list — show rules
# Show the effective merged rule list across all scopes
caliban perms list --effective
# Show only project-scope rules in JSON
caliban perms list --scope project --json
Output (human-readable): 1 allow Bash:git *
test — check a tool call
Returns exit code 0 (allow), 1 (deny), or 2 (ask) so it's scriptable.
# Would `git push` be allowed?
caliban perms test Bash '{"command":"git push"}'
# MATCH: pattern=Bash:git * action=allow
# Would rm be allowed?
caliban perms test Bash '{"command":"rm -rf /tmp"}'
# MATCH: pattern=Bash:rm * action=deny
explain — show the full match walk
Prints every rule in evaluation order with a MATCH marker next to the first rule that fires. Useful for diagnosing unexpected allow/deny outcomes.
caliban perms explain Bash '{"command":"sudo rm -rf /"}'
# Rule list (source order; first match wins):
# 1 allow Bash:git *
# 2 MATCH deny Bash:~rm *
# 3 ask Bash
# ...
add — append a rule
Appends a rule to the target scope file (default: project).
# Allow all cargo commands at project scope
caliban perms add "Bash:cargo *" allow --comment "cargo is safe"
# Deny curl at user scope with a reason for the model
caliban perms add "Bash:curl *" deny --scope user --reason "use WebFetch instead"
remove — delete a rule
Remove by exact pattern match. Index-based removal is reserved for a future release.
caliban perms remove --pattern "Bash:cargo *" --scope project
import — import rules from another config
Import rules from a Claude Code settings.json, a legacy caliban JSON file, or a foreign TOML. Defaults to user scope.
# Dry-run first
caliban perms import --from ~/.claude/settings.json --dry-run
# Actually import into user scope
caliban perms import --from ~/.claude/settings.json --scope user
export — export rules to stdout
Outputs the current scope's rules in TOML (default) or JSON format, suitable for redirecting into a new file or piping to another tool.
# Export project rules as TOML
caliban perms export --scope project
# Export as JSON (three-bucket format for interop)
caliban perms export --scope project --format json
audit — inspect the decision log
Reads the JSONL audit log and prints matching entries. See Headless & Audit for log location and rotation details.
# Show all deny decisions since the given timestamp
caliban perms audit --action deny --since 2026-06-01T00:00:00Z
# Show the 20 most recent decisions for the Write tool
caliban perms audit --tool Write --head 20
lint — check for duplicate rules
Scans a scope's rule list for duplicate (pattern, action) pairs and prints them. Exits 0 if clean, 1 if duplicates are found.
caliban perms lint --scope project
# OK (no duplicate patterns)
caliban perms lint --scope user
# duplicate: pattern="Bash:git *" action=allow
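The duplicate check amounts to counting (pattern, action) pairs. A minimal Python sketch follows, with hypothetical function names and rules represented as plain tuples; it mirrors the exit codes described above rather than caliban's actual code.

```python
from collections import Counter


def find_duplicates(rules):
    """Return each (pattern, action) pair that occurs more than once, in first-seen order."""
    counts = Counter(rules)
    seen = set()
    duplicates = []
    for rule in rules:
        if counts[rule] > 1 and rule not in seen:
            seen.add(rule)
            duplicates.append(rule)
    return duplicates


def lint_exit_code(rules) -> int:
    """0 when the list is clean, 1 when at least one duplicate pair exists."""
    return 1 if find_duplicates(rules) else 0
```

Two rules with the same pattern but different actions are not duplicates under this definition, because the pair as a whole must repeat.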
Rules are read from managed → user → project → local (earlier scopes shadow later). The caliban perms add default scope is project; caliban perms import defaults to user. Use --scope to override.
Headless & Audit
Headless mode and the ask verdict
When caliban runs without a TTY — in CI, in a script, or via caliban -p — there is no interactive modal to present. Any tool call that reaches an ask verdict is handled by NonInteractiveAskHandler:
- Default behavior (no flags): ask becomes a hard deny. The tool call fails with a permission error message that names a concrete remediation.
- With --auto-allow: every ask verdict becomes allow. This is equivalent to dontAsk mode for the duration of the run.
The deny message is tailored to the tool class:
| Tool class | Suggested remediation |
|---|---|
File-edit (Write, Edit, MultiEdit, NotebookEdit) | --permission-mode acceptEdits or a narrow --allow rule |
Bash | --allow 'Bash:<glob>' for a targeted rule, or --auto-allow (flagged dangerous) |
| Other tools | --allow '<Tool>' or --auto-allow |
Opt-in strategies
Choose the least-permissive option that satisfies the task:
# Allow only file edits (most common CI use case)
caliban -p "update version in Cargo.toml" --permission-mode acceptEdits
# Allow specific git commands
caliban -p "commit and push" --allow "Bash:git *"
# Allow all ask-rule tools (use with care)
caliban -p "run the full refactor" --auto-allow
You can also set rules in the project's permissions.toml so they apply without CLI flags:
[[permissions.rules]]
pattern = "Bash:git *"
action = "allow"
comment = "safe for CI"
The JSONL audit log
Every tool-call decision (allow, deny, or ask) is appended to an append-only JSONL file.
Log location
| Platform | Path |
|---|---|
| Linux | $XDG_STATE_HOME/caliban/permission-decisions.jsonl (default: ~/.local/state/caliban/) |
| macOS | $XDG_DATA_HOME/caliban/permission-decisions.jsonl (default: ~/Library/Application Support/caliban/) |
The audit_log setting controls whether logging is active:
[permissions]
audit_log = true # default; set false to disable
Log format
Each line is a JSON object:
{
"ts": "2026-06-01T14:23:01.123456Z",
"session_id": "s_abc123",
"turn_index": 4,
"tool_use_id": "tu_xyz",
"tool_name": "Bash",
"input_excerpt": "{\"command\":\"git push origin main\"}",
"action": "allow",
"matched_rule": {
"pattern": "Bash:git *",
"action": "allow"
}
}
input_excerpt is truncated to 256 characters and newlines are replaced with spaces.
Log rotation
When the log file exceeds 100 MiB, caliban automatically:
- Renames the current file to permission-decisions-YYYY-MM-DD.jsonl.
- Gzip-compresses the renamed file to permission-decisions-YYYY-MM-DD.jsonl.gz.
- Removes the uncompressed renamed file.
- Opens a fresh permission-decisions.jsonl for subsequent writes.
Rotated archives accumulate in the same directory. Remove old .gz files manually when disk space is a concern.
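The four rotation steps can be reproduced with the Python standard library, which may help when reasoning about what ends up on disk. This is a sketch of the described sequence, not caliban's code: the function name and parameters are hypothetical, and it does not handle the case where an archive for the same date already exists.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # 100 MiB threshold from this section


def rotate(log: Path, today: date, max_bytes: int = MAX_BYTES):
    """Rotate log if it exceeds max_bytes; return the archive path, or None if untouched."""
    if not log.exists() or log.stat().st_size <= max_bytes:
        return None
    # 1. Rename the current file with a date suffix.
    dated = log.with_name(f"permission-decisions-{today.isoformat()}.jsonl")
    log.rename(dated)
    # 2. Gzip-compress the renamed file.
    archive = dated.with_name(dated.name + ".gz")
    with dated.open("rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # 3. Remove the uncompressed copy.
    dated.unlink()
    # 4. Start a fresh, empty log for subsequent writes.
    log.touch()
    return archive
```

After a rotation the directory holds the new empty permission-decisions.jsonl plus one .jsonl.gz archive per rotation date.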
Querying the log
Use caliban perms audit to filter and display log entries:
# All decisions since midnight UTC
caliban perms audit --since 2026-06-01T00:00:00Z
# Only denials for the Write tool
caliban perms audit --tool Write --action deny
# Most recent 50 entries
caliban perms audit --head 50
# Combine filters
caliban perms audit --tool Bash --action allow --since 2026-06-01T00:00:00Z --head 100
Exit code is always 0; an empty result prints (empty).
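Because the log is plain JSONL, it can also be filtered outside caliban. The Python sketch below applies the same three filters (tool, action, since) to an iterable of log lines, using the field names from the log format shown earlier. The function names are hypothetical, and --head is not modeled.

```python
import json
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing Z is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_log(lines, tool=None, action=None, since=None):
    """Return decoded entries that satisfy every filter that was supplied."""
    floor = parse_ts(since) if since else None
    results = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        if tool and entry["tool_name"] != tool:
            continue
        if action and entry["action"] != action:
            continue
        if floor and parse_ts(entry["ts"]) < floor:
            continue
        results.append(entry)
    return results
```

A typical use is to open permission-decisions.jsonl and pass the file object as lines, for example filter_log(fh, tool="Write", action="deny").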
Hardening with permissions.enforce
The permissions.enforce flag prevents the bypass latch from being used, even when --allow-dangerously-skip-permissions is passed:
[permissions]
enforce = true
With enforce = true, caliban refuses to start if --allow-dangerously-skip-permissions is on the command line or if permissions.default_mode is set to bypassPermissions. This is useful for team or managed deployments where operators want to guarantee that static deny rules can never be overridden.
Set permissions.enforce = true in the managed or user scope, not project scope, so it cannot be overridden by project-level config. A project can always set a lower-priority rule, but only higher-priority scopes can lock out bypass.
Built-in Tools
Caliban ships a fixed set of built-in tools that cover the most common agentic tasks: reading and writing files, executing shell commands, searching code, fetching web content, and coordinating work. Every tool is permission-gated (see Permissions) and subject to the execution policies described in Tool Execution.
Pass --no-tools to disable all tools and run caliban in chat-only mode.
Tool reference
| Tool | Category | Purpose |
|---|---|---|
Read | Filesystem | Read a UTF-8 file, with optional offset + limit for pagination. Files larger than 5 MB must be read in chunks. |
Write | Filesystem | Write content to a file, creating missing parent directories. Overwrites existing content. |
Edit | Filesystem | Replace occurrences of old_string with new_string in a file. Expects exactly one match by default; set replace_all=true to replace all. |
MultiEdit | Filesystem | Apply a sequence of {old_string, new_string} replacements to a single file atomically. If any replacement fails to match, the whole operation is rolled back. |
NotebookEdit | Filesystem | Add, edit, or delete cells in a Jupyter .ipynb notebook (nbformat v4). Preserves cell metadata and outputs; writes atomically via tmpfile + rename. |
Bash | Shell | Run a shell command and capture stdout + stderr. Supports timeout_seconds, an optional cwd, and a background flag for long-running processes. |
BashBg | Shell | Companion tools for background Bash jobs: read buffered output (BashOutput) or terminate a job (KillShell). Background jobs use a 5 GiB ring buffer. |
Glob | Search | Find files by name pattern relative to the workspace root. |
Grep | Search | Search file contents with a regex, powered by the ripgrep library. Returns up to 100 matches by default (max 500). |
WebFetch | Web | GET a URL and return the body as markdown or plain text. HTML is converted via htmd. 10 MB body cap, 60 s default timeout (configurable up to 300 s). |
WebSearch | Web | Query a web search API and return ranked results. See backend details below. |
TodoWrite | Agent | Replace the session's shared task list with a new list of {id, content, status} items. The list is re-injected into the system prompt each turn. Max 100 items. |
AgentTool | Agent | Spawn an in-process sub-agent with a task prompt and an optional tool allowlist. Output is capped at 5,000 characters. See Sub-agents. |
EnterPlanMode | Plan | Switch the session into plan mode. While active, only read-only tools may run; destructive tools are blocked until the operator confirms the plan. |
ExitPlanMode | Plan | Confirm or abandon the current plan and return to normal execution. |
ReadMemoryTopic | Memory | Read one auto-memory topic file by slug. See Memory Tiers. |
WriteMemoryTopic | Memory | Write or update an auto-memory topic file and update the MEMORY.md index entry atomically. Topic type must be one of user, feedback, project, or reference. See Memory Tiers. |
WebSearch backends
WebSearch delegates to one of three search APIs, selected by the CALIBAN_WEBSEARCH_PROVIDER environment variable:
| Value | API key env var | Default? |
|---|---|---|
brave | BRAVE_API_KEY | Yes |
tavily | TAVILY_API_KEY | No |
exa | EXA_API_KEY | No |
If the selected provider's API key is missing, the tool returns a structured error naming the missing variable so the agent can try a different approach rather than failing silently.
--no-tools disables all built-in tools (and MCP tools) for the session.
This is useful when you want a pure conversation without any side effects —
for example, drafting a message or brainstorming before running anything.
Filesystem tool conflict resolution
Edit, Write, MultiEdit, NotebookEdit, and WriteMemoryTopic all declare a conflict key based on their target path (or memory slug). When the model emits two write operations targeting the same file in a single turn, caliban serializes those calls in submission order rather than letting them interleave. Calls targeting different files still execute in parallel. See Tool Execution for details.
Tool Execution
This page covers how caliban resolves file paths for tools, dispatches multiple tools concurrently within a turn, caps oversized tool output, and controls verbosity.
Path resolution and workspace
Every filesystem and shell tool resolves paths relative to the workspace root — the directory caliban was started in, unless overridden.
--workspace <DIR> — Set the workspace root explicitly. Relative tool paths are joined against this directory. If not supplied, caliban uses the current working directory.
--restrict-paths — Reject any tool call whose path resolves outside the workspace root. With this flag, absolute paths that escape the workspace return an error rather than silently accessing arbitrary filesystem locations. Use it when you want a hard containment boundary at the path-resolution layer.
additional_directories — A list of extra root paths declared in settings.toml. Tools may read and write these paths even when --restrict-paths is active, as long as the path falls under one of the declared roots.
# .caliban/settings.toml
additional_directories = [
"/data/shared",
"/home/user/docs",
]
--restrict-paths enforces containment at the Rust level before a process is spawned.
The OS Sandbox enforces containment at the OS level inside the subprocess.
They are independent and complementary: use both for defense-in-depth.
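As a rough model of the path rules in this section, the sketch below joins relative paths against the workspace and, when restriction is on, accepts only paths under the workspace or a declared extra root. The function name and parameters are hypothetical, and the real check runs inside caliban, so treat this only as a way to reason about which paths would pass.

```python
from pathlib import Path


def resolve_tool_path(raw, workspace, restrict=False, additional_directories=()):
    """Join relative paths against the workspace; optionally enforce containment."""
    root = Path(workspace).resolve()
    candidate = Path(raw)
    if not candidate.is_absolute():
        candidate = root / candidate
    resolved = candidate.resolve()  # collapses ".." and follows symlinks
    if restrict:
        allowed_roots = [root] + [Path(d).resolve() for d in additional_directories]
        inside = any(resolved == r or r in resolved.parents for r in allowed_roots)
        if not inside:
            raise PermissionError(f"{resolved} is outside the allowed roots")
    return resolved
```

Resolving before checking matters: a path such as `../escape.txt` looks relative but lands outside the root once `..` is collapsed.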
Parallel tool dispatch
When the model emits multiple tool_use blocks in a single assistant turn, caliban runs them concurrently by default (ADR 0016).
flowchart LR
A[Model turn\nmulti-tool call] --> B[Permission gate\nserial]
B -->|Denied| C[Deny result\nto model]
B -->|Allowed| D[FuturesUnordered\nbounded by semaphore]
D --> E[Results in\ncompletion order]
E --> F[Re-ordered to\nsubmission order\nin history]
Permission hooks run serially first. Each before_tool hook fires in submission order, producing an Allowed or Denied decision before any concurrent execution begins. Denied results are returned to the model immediately.
Allowed calls fan out into a FuturesUnordered pool bounded by a semaphore.
Default concurrency limit — available_parallelism() − 1 (minimum 1). This leaves one CPU core for the agent loop, streaming, and the TUI render thread. Most tools are I/O-bound, so the limit is a soft ceiling against runaway fan-out rather than a strict CPU cap.
Override flags:
| Flag / env var | Effect |
|---|---|
| --no-parallel-tools / CALIBAN_NO_PARALLEL_TOOLS=1 | Run all tools serially (equivalent to a limit of 1). |
| --parallel-tool-limit N / CALIBAN_PARALLEL_TOOL_LIMIT=N | Set the concurrency limit explicitly. |
Write conflict serialization. Tools that write to the same target (Edit, Write, MultiEdit, NotebookEdit, WriteMemoryTopic) declare a conflict key. Two calls with the same key are serialized in submission order even when the concurrency limit would permit them to run together. Calls with different keys (or no key) still parallelize freely.
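The dispatch behavior described in this section can be approximated with asyncio. This is an analogue written for illustration, not the Rust implementation: `dispatch`, `default_limit`, and the `(conflict_key, callable)` pair format are assumptions, and permission gating is presumed to have already happened.

```python
import asyncio
import os


def default_limit():
    """CPU count minus one, never below one, mirroring the documented default."""
    return max(1, (os.cpu_count() or 1) - 1)


async def dispatch(calls, limit=None):
    """Run allowed tool calls concurrently; serialize calls sharing a conflict key.

    `calls` is a list of (conflict_key_or_None, async_callable) in submission order.
    Results come back in submission order, whatever order the calls finished in.
    """
    semaphore = asyncio.Semaphore(limit or default_limit())
    chains = {}  # conflict key -> task of the previous call with that key

    async def run(call, previous):
        if previous is not None:
            await asyncio.wait([previous])  # wait for the earlier writer to finish
        async with semaphore:  # acquired only after the wait, so no permit is idle
            return await call()

    tasks = []
    for key, call in calls:
        previous = chains.get(key) if key is not None else None
        task = asyncio.create_task(run(call, previous))
        if key is not None:
            chains[key] = task
        tasks.append(task)
    return await asyncio.gather(*tasks)
```

Chaining each keyed call to its predecessor keeps same-file writes in submission order without reducing parallelism for unrelated calls.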
Tool-result capping
Large tool results — for example, reading a multi-thousand-line file — can fill the context window quickly. Caliban caps each tool result before it is appended to the conversation.
tool_result_cap_chars (settings key, default 50000) — Maximum character count for a single tool result delivered inline to the model. Set to 0 to disable capping.
When a result exceeds the cap, the overflow text is written to a spill file under the caliban cache directory (~/.cache/caliban/tool-overflows/ on Linux, ~/Library/Caches/caliban/tool-overflows/ on macOS) and the model receives a truncated result with a note pointing at the spill path. The model can then decide whether to read the spill file directly.
# .caliban/settings.toml — raise the cap for large codebases
tool_result_cap_chars = 100_000
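To make the capping rule concrete, here is a small sketch of the truncate-and-spill step. The function name, the wording of the truncation note, and the use of the system temp directory as a stand-in for the cache directory are all assumptions; only the cap semantics (`0` disables, overflow goes to a file) come from the text above.

```python
import tempfile
from pathlib import Path


def cap_tool_result(text, cap_chars=50_000, spill_dir=None):
    """Return the text to deliver inline; spill the overflow to disk when over the cap."""
    if cap_chars == 0 or len(text) <= cap_chars:  # 0 disables capping
        return text
    spill_dir = Path(spill_dir or tempfile.gettempdir())  # stand-in for the cache dir
    spill_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        "w", dir=spill_dir, suffix=".txt", delete=False, encoding="utf-8"
    ) as spill:
        spill.write(text[cap_chars:])  # only the overflow is written out
    # Hypothetical note format pointing the model at the spill file.
    note = f"\n[truncated: {len(text) - cap_chars} more characters in {spill.name}]"
    return text[:cap_chars] + note
```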
Suppressing tool announcements
By default, caliban prints a line to the terminal each time it invokes a tool, showing the tool name and its primary argument. Pass --quiet to suppress these announcements. Error output from tools is never suppressed.
# Silent execution — no "Running Bash: cargo test" lines
caliban --quiet -p "run the test suite and summarise failures"
The OS Sandbox
Caliban can wrap every subprocess spawned by the Bash tool in an OS-level sandbox that restricts what the child process may do — independent of permission rules. Where permission rules decide whether a command runs, the sandbox controls what it can access once it does.
The sandbox is implemented by the caliban-sandbox crate (ADR 0032). It is disabled by default and must be explicitly enabled in settings.
Platform support
| Platform | Backend | Status |
|---|---|---|
| macOS | Apple Seatbelt (sandbox-exec) | Supported |
| Linux / WSL | bubblewrap (bwrap >= 0.5) | Supported |
| Windows native | — | Not supported in v1; use WSL for the bubblewrap backend |
Apple has deprecated the sandbox-exec / Seatbelt API. It still ships in all
current macOS releases, but caliban's macOS backend will need to move to the
Endpoint Security Framework if Apple removes sandbox-exec in a future OS
version. There is no announced removal date.
Enabling the sandbox
Add a [sandbox] block to your project or user settings.toml:
[sandbox]
enabled = true
fail_if_unavailable = true # refuse to start if bwrap/sandbox-exec is missing
With fail_if_unavailable = false (the default), caliban falls back to running unsandboxed if the backend binary is absent or too old, and logs a warning.
What the sandbox restricts
The sandbox limits three classes of access for spawned subprocesses:
Filesystem
| Key | Effect |
|---|---|
| filesystem.allow_read | Paths the subprocess may read |
| filesystem.deny_read | Paths hidden from reads (shadows an allow_read entry) |
| filesystem.allow_write | Paths the subprocess may write |
| filesystem.deny_write | Paths write-denied within an allow_write root |
On Linux, denied paths are masked with --tmpfs (an empty in-memory directory shadows the real one). On macOS, Seatbelt uses (deny file-write* (subpath …)) rules. Glob patterns are not supported in filesystem ACLs — add explicit path roots.
The variables ${WORKSPACE} and ${HOME}, along with the XDG variables, are expanded in these paths when the sandbox is initialized.
Network
Per-hostname egress is not reliably enforceable by either backend alone. The supported patterns are:
- Block all egress: leave network.allowed_domains empty. Uses --unshare-net on Linux and omits all network-outbound allow rules on macOS.
- Proxy-filtered egress: set network.http_proxy_port to route subprocess HTTP through an operator-run proxy at 127.0.0.1:<port>. The proxy enforces domain rules; the sandbox only allows the loopback port.
If you set allowed_domains to a non-empty list on Linux without also
configuring http_proxy_port, caliban logs a warning: the Linux bubblewrap
backend cannot enforce per-hostname rules without a proxy layer.
macOS Seatbelt supports literal (remote tcp "host:port") rules and is
correspondingly stricter.
Other network settings
[sandbox.network]
allow_unix_sockets = false # Docker daemon socket, etc.
allow_local_binding = false # bind() on local ports
allow_mach_lookup = [] # macOS-only: Mach service names
Full configuration example
[sandbox]
enabled = true
fail_if_unavailable = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = ["git", "gh"]
enable_weaker_nested_sandbox = false
[sandbox.filesystem]
allow_read = ["${WORKSPACE}", "/etc", "/usr"]
deny_read = ["${HOME}/.ssh"]
allow_write = ["${WORKSPACE}"]
deny_write = ["${WORKSPACE}/.git/hooks"]
[sandbox.network]
http_proxy_port = 8888
allow_unix_sockets = false
allow_local_binding = false
Key settings
auto_allow_bash_if_sandboxed — When both enabled and this flag are true, the permission classifier auto-allows all Bash(*) calls without showing a prompt. The sandbox is the protection; the Ask modal becomes redundant. Defaults to false. Note: commands listed in allow_unsandboxed_commands are not auto-allowed — they run outside the sandbox and still go through normal permission rules.
allow_unsandboxed_commands — A glob list matched against the first token of each command (or the full command string when the pattern contains a space). Matching commands bypass the sandbox entirely. Use this for tools that genuinely need unrestricted access — for example, git or gh.
enable_weaker_nested_sandbox — For dev containers or VMs that are already inside a user namespace: drops the --unshare-user flag on Linux (which would otherwise fail). This is a no-op on macOS.
bwrap_path / sandbox_exec_path — Override the path to the sandbox binary if it is not at the default location ($PATH for bwrap; /usr/bin/sandbox-exec for macOS).
How it works
SandboxedShim::wrap_command intercepts the tokio::process::Command built by BashTool before it is spawned. If the sandbox is active and the command is not on the bypass list, it rewrites the command so that:
- On macOS: sandbox-exec -f <profile.sb> <original command>
- On Linux: bwrap [bind/ro-bind/tmpfs flags] <original command>
The rest of the Bash tool — stdout/stderr capture, PID-group cleanup, timeouts, cancellation — is unchanged. The sandbox is a shim layer, not a fork.
Detection runs at startup. bwrap version >= 0.5 is required on Linux (the --die-with-parent flag arrived in 0.5).
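For intuition about the Linux rewrite, the sketch below builds a bwrap argument vector from a settings-like dictionary. It is a simplified model under stated assumptions: the function names are hypothetical, the original command is assumed to run via `sh -c`, deny_write handling is omitted, and the exact flags caliban passes may differ. The bwrap options used (`--die-with-parent`, `--unshare-user`, `--unshare-net`, `--ro-bind`, `--bind`, `--tmpfs`) are real bubblewrap options.

```python
import fnmatch


def is_bypassed(command, patterns):
    """Match the first token, or the full command when the pattern contains a space."""
    tokens = command.split()
    if not tokens:
        return False
    for pattern in patterns:
        subject = command if " " in pattern else tokens[0]
        if fnmatch.fnmatchcase(subject, pattern):
            return True
    return False


def wrap_for_bwrap(command, fs, bypass=(), block_network=True,
                   weaker_nested=False, bwrap="bwrap"):
    """Return the argv to spawn: the shell command, wrapped unless bypassed."""
    original = ["sh", "-c", command]  # assumption about how the command is run
    if is_bypassed(command, bypass):
        return original
    argv = [bwrap, "--die-with-parent"]
    if not weaker_nested:
        argv.append("--unshare-user")  # dropped for already-nested namespaces
    if block_network:
        argv.append("--unshare-net")
    for path in fs.get("allow_read", []):
        argv += ["--ro-bind", path, path]
    for path in fs.get("allow_write", []):
        argv += ["--bind", path, path]
    # Masks come last so the empty tmpfs shadows the binds above.
    for path in fs.get("deny_read", []):
        argv += ["--tmpfs", path]
    return argv + original
```

Because bwrap applies mounts in argument order, placing the `--tmpfs` masks after the binds is what lets a denied path hide part of an allowed root.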
Skills
Skills are reusable instruction packages that the model loads on demand. Each skill is a markdown file with YAML frontmatter — the same format as the Anthropic "superpowers" plugin ecosystem, so existing skills port without changes.
Skills are not executed; they inject text into the model's context. A skill can describe a workflow, a style guide, a debugging procedure, or any other multi-step process. Only the description line is always visible to the model; the full body is fetched lazily when the model calls the Skill tool.
How the Skill tool works
Caliban registers a single built-in tool named Skill. Its description lists every loaded skill by name and one-line description. When the model wants to follow a skill's instructions, it calls Skill with the skill's exact name; the harness returns the body as text and the model proceeds accordingly.
This design keeps the token cost bounded: descriptions are always present, bodies are pay-per-use.
Discovery roots
Caliban scans three roots in priority order. The first match for a given name wins; later roots are shadowed.
| Priority | Location | Scope |
|---|---|---|
| 1 (highest) | <workspace>/.caliban/skills/ | Project |
| 2 | ~/.config/caliban/skills/ (XDG-aware) | User |
| 3 | ~/.local/share/caliban/plugins/*/skills/ | Plugin-managed |
A project-level skill with the same name as a user-level skill silently replaces it. Malformed SKILL.md files are logged at warn and skipped — loading is best-effort.
Skill file format
Each skill lives in its own subdirectory. The directory name must match the name: frontmatter field exactly.
.caliban/skills/
my-workflow/
SKILL.md
SKILL.md structure:
---
name: my-workflow
description: "One-line summary shown to the model in the Skill tool description."
metadata:
trigger: pre-implementation # free-form; passed through unchanged
---
# My Workflow
Full markdown instruction set. Only loaded when the model calls Skill({"name": "my-workflow"}).
Required frontmatter fields: name and description. The metadata map is optional.
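The discovery and validation rules above can be sketched in a few lines. This is an illustration only: `parse_frontmatter` is a deliberately minimal stand-in for a real YAML parser, `discover_skills` is a hypothetical name, and the warning caliban logs for malformed files is reduced to a comment.

```python
from pathlib import Path


def parse_frontmatter(text):
    """Minimal parser for top-level `key: value` lines; not a full YAML parser."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1] in (" ", "\t") or ":" not in line:
            continue  # skip nested entries such as the metadata map
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return None  # closing delimiter missing


def discover_skills(roots):
    """Scan roots in priority order; the first skill found under a name wins."""
    skills = {}
    for root in map(Path, roots):
        if not root.is_dir():
            continue
        for skill_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            skill_file = skill_dir / "SKILL.md"
            if not skill_file.is_file():
                continue
            meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
            valid = meta and meta.get("name") == skill_dir.name and meta.get("description")
            if not valid:
                continue  # malformed skills are skipped (caliban also logs a warning)
            skills.setdefault(meta["name"], meta["description"])
    return skills
```

Passing the project root before the user root reproduces the shadowing rule: `setdefault` keeps the first description seen for each name.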
Built-in skills
Caliban ships one built-in skill compiled into the binary:
| Name | Purpose |
|---|---|
| auto-memory | Protocol for reading and writing the auto-memory tiers |
Built-ins register before the directory scan, so a user or project skill with the same name will shadow them.
Disabling skills
| Method | Effect |
|---|---|
| --no-skills flag | Disables the Skill tool entirely; no skills are loaded |
| CALIBAN_NO_SKILLS=1 | Same, via environment variable |
To override the built-in auto-memory skill, place your own auto-memory/SKILL.md in .caliban/skills/. It will take priority over the embedded version without any additional configuration.
Related pages
- Plugins: bundle skills alongside hooks, MCP servers, and output styles
- Slash Command Index: the /skills overlay shows loaded skills
Custom Slash Commands
Caliban's slash commands are managed through a central SlashCommandRegistry. Every command — whether built-in or plugin-supplied — registers in the same registry, which drives typeahead completion, the /help listing, and dispatch.
The built-in registry (ADR 0040)
At startup, caliban registers approximately 30 built-in slash commands covering session management, context control, configuration, and diagnostics. The registry is the canonical source of truth for what commands exist; /help enumerates the live set.
flowchart LR
Input["/ input"] --> Typeahead["Typeahead suggester"]
Input --> Dispatch["Registry dispatch"]
Dispatch --> Command["SlashCommand impl"]
Command --> SlashCtx["SlashCtx (session + registries)"]
Each command receives a SlashCtx containing the running session, provider, MCP manager, skills registry, hooks, and settings — everything it might need without requiring each command to thread individual dependencies through its call signature.
Full built-in command list
See the Slash Command Index for the authoritative list with descriptions and arguments.
Key commands for the extension features covered in this part of the guide:
| Command | Purpose |
|---|---|
| /skills | Show loaded skills and their descriptions |
| /mcp | Show MCP server status (connected / failed / disabled) |
| /hooks | Show active hook handlers |
| /plugins | List installed plugins with enable/disable status |
| /config | Interactive settings editor |
| /output-style | Pick an output style |
Plugin-supplied commands
Plugins (ADR 0030) may register additional slash commands by placing command markdown files in their commands/ subdirectory. The plugin system feeds these into the registry at startup using the same SlashCommand trait. Plugin-supplied commands are namespaced <plugin>:<command> so they cannot shadow built-ins by accident.
The ability for end-users to drop custom slash command files into .caliban/commands/ or ~/.config/caliban/commands/ (outside of a plugin) is planned but not yet wired. The ComponentSpec.commands field is reserved in the plugin manifest schema and the registry has the extension point, but standalone user-defined command files are not yet discovered at startup. Track progress against ADR 0040 and the parity matrix row M.
Until this lands, the recommended path for reusable operator-defined procedures is a Skill, which supports the same markdown body format and is already fully discoverable.
Hook on slash submission
UserPromptSubmit fires before the slash parser runs. The hook payload includes is_slash: true, command, and args. A hook can reject or rewrite a slash command — useful for audit logging or per-operator policy enforcement.
Related pages
Hooks
Hooks let you attach external logic to caliban's event stream — shell scripts, HTTP callbacks, or MCP tools — without modifying the agent or recompiling. Hooks run in-process (for the built-in PermissionsHook and audit hooks) or via an external HookRouter (for operator-configured handlers).
Event taxonomy
Caliban fires events at the following lifecycle points (ADR 0024):
| Event | When it fires |
|---|---|
| SessionStart | Once at startup, before the first turn |
| SessionEnd | On clean exit |
| UserPromptSubmit | Before each user message is sent (including slash commands; payload includes is_slash) |
| PreCompact | Before context compaction begins |
| PostCompact | After compaction completes |
| PreToolUse | Before each tool call; can gate or rewrite the call |
| PostToolUse | After each tool call completes |
| PostToolUseFailure | When a tool call errors |
| ConfigChange | When a settings file changes on disk (live reload) |
| CwdChanged | When the working directory changes |
| FileChanged | When a file the agent edited is detected to have changed |
| SubagentStart / SubagentStop | When a sub-agent is spawned or exits |
| TaskCreated / TaskCompleted | When a sub-agent task is enqueued or finishes |
| PermissionRequest | When the agent requests permission for a tool call |
| PermissionDenied | When a tool call is denied |
| Notification | General notification events |
| Stop / StopFailure | When the agent loop stops (cleanly or with error) |
Additional events (Setup, UserPromptExpansion, PostToolBatch, InstructionsLoaded, WorktreeCreate, WorktreeRemove, Elicitation, ElicitationResult, TeammateIdle) are reserved but not yet fired.
Handler types
Each hook entry declares one or more handlers. Two handler types are fully wired; three are stubs (see below).
| Type | Status | Description |
|---|---|---|
| command | Fully wired | Spawn a child process; stdin is event JSON; decision via stdout or exit code |
| http | Fully wired | POST event JSON to a URL; decision via response JSON |
| mcp | Experimental stub | Invoke an MCP server tool with the event JSON |
| prompt | Experimental stub | Call the model router with a classifier prompt |
| agent | Experimental stub | Delegate to a sub-agent (async only) |
The mcp, prompt, and agent handler types are defined in the config schema and appear in /hooks output, but their dispatch logic is not yet wired. They will be activated as their upstream dependencies (ADR 0023 MCP wiring, ADR 0037 sub-agent fleet) land. Until then, any handler of these types is silently skipped at dispatch time.
Decision protocol
For PreToolUse and UserPromptSubmit, command and http handlers report their decision as:
Stdout JSON (preferred):
{
"hookSpecificOutput": {
"permissionDecision": "allow",
"permissionDecisionReason": "matched allowlist",
"updatedInput": {}
}
}
permissionDecision values: allow, deny, ask. updatedInput lets the hook rewrite the tool input before dispatch (the rewritten input is validated against the tool's schema; validation failure is a hard deny).
Exit codes (shell-script shorthand):
- 0: Allow
- 2: Deny (stderr becomes the reason)
- anything else: Allow with a logged warning
PostToolUse and observer-only hooks ignore the decision even when a handler provides one. Handlers marked async = true are fire-and-forget; their decisions are always ignored.
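As an example of the handler side of this protocol, the sketch below turns one UserPromptSubmit event into a decision document. The payload fields `is_slash` and `command` are the ones this guide names for slash submissions; everything else is an assumption: the `BLOCKED_COMMANDS` policy is made up, it is not specified whether `command` carries a leading slash (both forms are handled), and `updatedInput` is assumed to be optional and is left out.

```python
import json

# Hypothetical policy: slash commands this operator does not permit.
BLOCKED_COMMANDS = {"config", "plugins"}


def handle(stdin_text):
    """Turn one UserPromptSubmit event (JSON text) into a decision document."""
    event = json.loads(stdin_text)
    decision, reason = "allow", "no policy matched"
    if event.get("is_slash"):
        command = str(event.get("command", "")).lstrip("/")
        if command in BLOCKED_COMMANDS:
            decision, reason = "deny", f"/{command} is disabled by operator policy"
    return json.dumps(
        {
            "hookSpecificOutput": {
                "permissionDecision": decision,
                "permissionDecisionReason": reason,
            }
        }
    )
```

A script registered as a command handler would write `handle(sys.stdin.read())` to stdout and exit with status 0, leaving the decision to the JSON rather than the exit code.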
Config: settings hooks table (preferred)
Hooks live in the unified settings file under the hooks key. The table maps event names to arrays of handler groups. See Settings Layering for how scopes merge — hook arrays concatenate across scopes (project entries append to user entries).
# .caliban/settings.toml — project scope
disable_all_hooks = false
allow_managed_hooks_only = false
allowed_http_hook_urls = [
"https://hooks.example.com/*",
]
http_hook_allowed_env_vars = ["AUDIT_TOKEN"]
[[hooks.SessionStart]]
matcher = "*"
[[hooks.SessionStart.handlers]]
type = "command"
command = "/usr/local/bin/caliban-audit"
args = ["session-start"]
timeout = "5s"
[[hooks.PreToolUse]]
matcher = "Bash"
if = "Bash:rm *"
[[hooks.PreToolUse.handlers]]
type = "command"
command = "${CALIBAN_PROJECT_DIR}/.caliban/hooks/guard-rm.sh"
async = false
[[hooks.PreToolUse]]
matcher = "WebFetch"
[[hooks.PreToolUse.handlers]]
type = "http"
url = "https://hooks.example.com/preflight"
headers = { Authorization = "Bearer ${AUDIT_TOKEN}" }
timeout = "3s"
[[hooks.PostToolUse]]
matcher = "*"
[[hooks.PostToolUse.handlers]]
type = "mcp" # experimental stub: currently skipped at dispatch
mcp = "audit-server"
tool = "log_tool_call"
async = true
Config: legacy hooks.toml (compat)
If no hooks key appears in any settings file, caliban falls back to loading:
- <workspace>/.caliban/hooks.toml (project scope)
- ~/.config/caliban/hooks.toml (user scope)
The legacy file uses the same TOML shape shown above (top-level keys plus [[hooks.<Event>]] arrays). The two scopes merge with project entries taking priority. This path is deprecated — prefer the unified settings file for new configurations.
Safety controls
| Setting / flag | Effect |
|---|---|
| disable_all_hooks = true | Bypasses all external handlers; in-process hooks (permissions, audit) still run |
| allow_managed_hooks_only = true | Only handlers from the managed settings scope fire |
| allowed_http_hook_urls | URL glob allowlist; HTTP handlers fail closed if the URL isn't listed |
| http_hook_allowed_env_vars | Env vars that may be expanded in HTTP handler headers |
| --no-hooks | One-off CLI override; mirrors disable_all_hooks for a single run |
| CALIBAN_NO_HOOKS=1 | Same, via environment variable |
Mark your audit hooks async = true. Async handlers observe the event but their decision is discarded, so they can never accidentally block a tool call. They run on a bounded task pool (default 16 concurrent) so they don't pile up under heavy load.
Related pages
- Permissions concepts
- Plugins: plugins can bundle hook configurations
- Slash Command Index: /hooks shows the active handler set
MCP Servers
Caliban implements the client side of the Model Context Protocol (MCP), letting you connect any MCP-compatible server as a source of additional tools. Connected servers' tools appear in the same ToolRegistry as built-ins, with the naming convention mcp__<server>__<tool>.
Configuring servers
Servers are declared in the mcp_servers table of your unified settings file, or in the legacy mcp.toml when no unified settings are present.
Minimal stdio server
# .caliban/settings.toml
[mcp_servers.linear]
command = "npx"
args = ["-y", "@linear/mcp-server"]
env = { LINEAR_API_KEY = "${LINEAR_API_KEY}" }
HTTP server
[mcp_servers.notion]
type = "http"
url = "https://mcp.notion.com/v1"
headers = { Authorization = "Bearer ${NOTION_TOKEN}" }
SSE server
[mcp_servers.legacy-api]
type = "sse"
url = "https://api.example.com/mcp/sse"
Server configuration reference
| Field | Applies to | Description |
|---|---|---|
| type / transport | all | "stdio" (default), "http", "sse" |
| command | stdio | Executable to spawn |
| args | stdio | CLI arguments |
| env | stdio | Environment variables; ${VAR} and ${VAR:-default} expanded |
| cwd | stdio | Working directory; relative paths resolve from caliban's cwd |
| url | http, sse | Absolute http:// or https:// URL |
| headers | http, sse | Static request headers; values support ${VAR} expansion |
| oauth | http, sse | OAuth mode: "off" (default), "auto", "manual" |
| disabled | all | true to skip this server entirely |
| permissions | all | Per-server permission scoping (see below) |
${CLAUDE_PROJECT_DIR} expands to the current workspace root in all string fields, so plugin-bundled servers can reference binaries relative to the workspace without hardcoding paths.
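The `${VAR}` and `${VAR:-default}` forms used in these fields follow shell-style parameter expansion. The sketch below shows one plausible reading of that behavior; the function name is hypothetical, and two points are assumptions because this guide does not specify them: an unset variable without a default expands to an empty string, and an empty value triggers the default, as `:-` does in a POSIX shell.

```python
import re

# ${NAME} or ${NAME:-default}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value, env):
    """Expand ${VAR} and ${VAR:-default} in a config string using `env`."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if env.get(name):  # set and non-empty
            return env[name]
        return default if default is not None else ""  # assumption: unset -> ""
    return _VAR.sub(replace, value)
```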
OAuth (oauth = "auto" and "manual")
For HTTP/SSE servers behind OAuth, caliban performs the authorization-code flow with PKCE and a loopback callback server.
Auto discovery (oauth = "auto"): caliban discovers endpoints from the server's /.well-known/oauth-protected-resource and /.well-known/oauth-authorization-server documents.
Manual configuration (oauth = "manual"): provide a [mcp_servers.<name>.oauth_config] block:
[mcp_servers.my-server]
type = "http"
url = "https://api.example.com/mcp"
oauth = "manual"
[mcp_servers.my-server.oauth_config]
client_id = "${MY_CLIENT_ID}"
auth_url = "https://auth.example.com/authorize"
token_url = "https://auth.example.com/token"
scopes = ["read", "write"]
Tokens are stored in the OS keyring; on systems without keyring support, caliban falls back to $XDG_DATA_HOME/caliban/mcp-tokens.json (mode 0600).
Use --mcp-oauth-port <PORT> (or CALIBAN_MCP_OAUTH_PORT) to fix the loopback callback port on firewalled machines instead of letting caliban pick an ephemeral one.
Per-server permissions
Each server can declare scoped permission rules that compose with the global rule grammar. Patterns match the unprefixed tool name; caliban expands them to mcp__<server>__<tool> when evaluating against the global engine.
[mcp_servers.linear.permissions]
allow = ["read_*", "list_*"]
deny = ["delete_*"]
ask = ["create_*", "update_*"]
Merge order when multiple rules match a call:
global deny → server deny → server ask → server allow → global ask → global allow → default (Ask)
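To see how the merge order plays out, the following sketch evaluates a call against both rule sets in the documented sequence. It is a model rather than caliban's engine: `decide` is a hypothetical name, rule sets are plain dictionaries, and `fnmatch` globbing stands in for the full pattern grammar described in the Permissions part.

```python
from fnmatch import fnmatchcase


def decide(server, tool, global_rules, server_rules):
    """Walk the documented merge order and return the first matching tier."""
    qualified = f"mcp__{server}__{tool}"  # name seen by the global engine

    def hit(rules, tier, name):
        return any(fnmatchcase(name, pattern) for pattern in rules.get(tier, []))

    order = [
        (global_rules, "deny", qualified),
        (server_rules, "deny", tool),  # server patterns match the unprefixed name
        (server_rules, "ask", tool),
        (server_rules, "allow", tool),
        (global_rules, "ask", qualified),
        (global_rules, "allow", qualified),
    ]
    for rules, tier, name in order:
        if hit(rules, tier, name):
            return tier
    return "ask"  # default when nothing matches
```

With the linear rules shown above, `read_issue` resolves to allow, `delete_issue` to deny even under a broad global allow, and an unlisted tool such as `search` falls through to the default ask.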
Discovery and the /mcp overlay
At startup, caliban connects to every non-disabled server, sends initialize, and registers one McpTool per advertised tool. Failures (spawn error, handshake timeout) are logged at warn and skipped — they do not abort startup.
The /mcp slash command shows per-server status:
| Glyph | Meaning |
|---|---|
| ● | Connected |
| ◐ | Connecting / partial |
| ○ | Disabled or failed |
@server:resource references
Type @<server>: in the input bar to trigger resource autocomplete for that server. Caliban calls resources/list lazily on first use and caches the result; resources/list_changed notifications invalidate the cache.
Elicitation
When an MCP server needs additional input from the user (for example, before a destructive operation), it sends an elicitation request. In interactive mode, caliban shows a TUI modal. In --print / CI mode, elicitation requests are automatically declined.
Controls
| Flag / env | Effect |
|---|---|
| --no-mcp | Skip all MCP server discovery and registration |
| CALIBAN_NO_MCP=1 | Same, via environment variable |
| --mcp-oauth-port <PORT> | Fix the loopback OAuth callback port |
| CALIBAN_MCP_OAUTH_PORT=<PORT> | Same, via environment variable |
The preferred location for MCP server config is the mcp_servers table in .caliban/settings.toml (project scope) or ~/.config/caliban/settings.toml (user scope). The legacy mcp.toml is still supported as a fallback when no unified settings file is present — project overrides user at the same server name, wholesale.
Related pages
- Plugins: plugins can bundle MCP server configs
- Permissions concepts
- Slash Command Index: /mcp overlay
Plugins
A plugin bundles related customizations — skills, hooks, sub-agent definitions, MCP server configs, and output styles — into a single installable directory. The plugin system (ADR 0030) is a thin orchestrator: it parses a plugin.json manifest, namespaces items, expands ${CALIBAN_PLUGIN_ROOT}, and feeds everything into the same per-surface loaders that project and user files use.
What a plugin contains
my-plugin/
plugin.json # required manifest
skills/
my-workflow/
SKILL.md
hooks/
hooks.json
agents/
reviewer.md
output-styles/
concise.md
mcp/
.mcp.json
commands/
recap.md
All subdirectories are optional. When a components entry is omitted from the manifest, the loader scans the conventional subdirectory automatically.
The manifest (plugin.json)
{
"name": "my-plugin",
"version": "1.0.0",
"description": "Short description shown in /plugins and trust prompts",
"author": "Alice <alice@example.com>",
"license": "MIT",
"homepage": "https://example.com/my-plugin",
"components": {
"skills": ["skills/my-workflow"],
"hooks": "hooks/hooks.json",
"agents": ["agents/reviewer.md"],
"output_styles": "output-styles/concise.md",
"mcp_servers": "mcp/.mcp.json",
"commands": ["commands/recap.md"]
},
"caliban": {
"min_version": "0.1.0",
"platforms": ["macos", "linux"]
}
}
| Field | Required | Description |
|---|---|---|
| name | Yes | Matches the directory name. Must be [a-z0-9_-]{1,32}. |
| version | Yes | Semver string. |
| description | No | One-line description. |
| author | No | Free-form author string. |
| license | No | SPDX identifier. |
| homepage | No | URL. |
| components | No | Paths to bundled files (string or array). |
| caliban.min_version | No | Skip when the running caliban is older. |
| caliban.platforms | No | Limit to macos, linux, or windows. |
For MCP servers bundled as inline config (matching Claude Code's .mcp.json shape), use the top-level mcpServers key instead of components.mcp_servers.
Discovery roots
Caliban scans three roots at startup. A plugin with the same name in an earlier root replaces later ones — no manifest merging.
| Priority | Root | Scope |
|---|---|---|
| 1 (highest) | <workspace>/.caliban/plugins/<name>/ | Project |
| 2 | $XDG_DATA_HOME/caliban/plugins/<name>/ (user install dir) | User |
| 3 | /etc/caliban/plugins/<name>/ (platform analogues) | Managed (org policy) |
Managed plugins ignore the plugins.enabled list — they run regardless of per-user configuration.
Namespacing
Items loaded from a plugin carry a <plugin>:<item> prefix:
- Skills: my-plugin:my-workflow
- Output styles: my-plugin:concise
- MCP servers: my-plugin:my-server
This prevents collisions with bare-named items at the project or user level. Hooks merge additively across plugins.
The caliban plugin command
# List all installed plugins and their status
caliban plugin list
# Show the manifest of an installed plugin as JSON
caliban plugin info <name>
# Install a plugin from a marketplace
caliban plugin install <name>@<marketplace-url>
# Install a plugin from a local directory
caliban plugin install --dir /path/to/my-plugin
# Update a plugin to the latest marketplace version
caliban plugin update <name>
# Remove a plugin
caliban plugin remove <name>
# Enable / disable a plugin (affects whether it loads at startup)
caliban plugin enable <name>
caliban plugin disable <name>
caliban plugin help prints the full reference.
Marketplace trust
First-time marketplace installs display the manifest, its SHA-256 hash, and the install URL, then prompt for acknowledgement. Acknowledged installs are recorded in $XDG_DATA_HOME/caliban/trust/plugins.json. Re-installs of the same manifest hash skip the prompt; version bumps prompt again. Sideloads (local --dir installs) skip trust gating because the operator already has filesystem access.
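The hash-based trust check can be pictured with the sketch below. The layout of the trust file is not documented here, so the `{"acknowledged": [...]}` shape and all three function names are assumptions made for illustration; the only grounded parts are that the manifest is hashed with SHA-256 and that a previously acknowledged hash skips the prompt.

```python
import hashlib
import json
from pathlib import Path


def manifest_hash(manifest_path):
    """SHA-256 of the manifest bytes, hex-encoded."""
    return hashlib.sha256(Path(manifest_path).read_bytes()).hexdigest()


def load_acknowledged(trust_file):
    """Read acknowledged hashes; hypothetical layout {"acknowledged": [...]}."""
    trust_file = Path(trust_file)
    if not trust_file.exists():
        return set()
    return set(json.loads(trust_file.read_text(encoding="utf-8")).get("acknowledged", []))


def needs_prompt(manifest_path, trust_file):
    """True when this exact manifest has not been acknowledged before."""
    return manifest_hash(manifest_path) not in load_acknowledged(trust_file)


def acknowledge(manifest_path, trust_file):
    """Record the manifest hash so the same manifest installs without a prompt."""
    trust_file = Path(trust_file)
    hashes = load_acknowledged(trust_file)
    hashes.add(manifest_hash(manifest_path))
    trust_file.parent.mkdir(parents=True, exist_ok=True)
    trust_file.write_text(json.dumps({"acknowledged": sorted(hashes)}), encoding="utf-8")
```

Keying trust on the manifest hash rather than the plugin name is what makes a version bump, which changes the manifest bytes, trigger a fresh prompt.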
${CALIBAN_PLUGIN_ROOT} expansion
Inside plugin-bundled hook commands and MCP server configs, ${CALIBAN_PLUGIN_ROOT} expands to the plugin's absolute root directory. ${CLAUDE_PLUGIN_ROOT} is a supported alias so existing Claude Code plugins port verbatim.
The --no-plugins flag (or CALIBAN_NO_PLUGINS=1) disables plugin discovery entirely for a single run, treating all plugin roots as empty. This is useful for debugging or for CI environments that should not pick up locally installed plugins.
Related pages
- Skills
- Hooks
- MCP Servers
- Output Styles
- Slash Command Index: /plugins overlay
Output Styles
Output styles nudge the model toward a particular response shape — more explanatory prose, learning-paced prompts with TODO(human) markers, or a proactive fill-in approach — by splicing a block into the system prompt. They are orthogonal to tools, hooks, and permissions: switching styles changes only the system prompt.
Built-in styles
Caliban ships four built-in styles compiled into the binary:
| Name | Description |
|---|---|
| default | No-op: identical to having no style configured (zero prompt-cache impact) |
| proactive | Encourages the model to fill in gaps and make decisions rather than pausing to ask |
| explanatory | Requests detailed commentary explaining each decision and code change |
| learning | Instructs the model to emit TODO(human): <prompt> markers on non-trivial decisions; the TUI highlights them |
The default style emits no block at all, so switching to it produces the exact same system prompt as having no style — prompt-cache hits are preserved.
Selecting the active style
Via settings (preferred): set output_style in your settings file.
# ~/.config/caliban/settings.toml
output_style = "explanatory"
Via environment variable (until the settings hierarchy is fully wired):
CALIBAN_OUTPUT_STYLE=learning caliban
Via the TUI: use /output-style to open the picker. The new selection is remembered for the session but takes effect only after /clear or a restart, because providers cache the system prompt and a mid-session change would silently invalidate that cache.
System prompts are cached by every major provider. Selecting a new style mid-session does not change what the provider sees until after /clear or a restart. The /config output-style overlay surfaces an "applies after /clear or restart" hint.
How styles splice into the system prompt
OutputStylePrefix::splice_into wraps the active style's body in an <output-style> XML element and prepends it to the base system prompt. Memory tier content goes first, then the style block, then the base body:
[memory tiers]
<output-style name="explanatory">
... style body ...
</output-style>
[base system prompt]
If the active style has an empty body (the default style), splice_into returns the base prompt unchanged — no extra tokens, no cache miss.
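The splice order described above can be illustrated with a minimal Python sketch. It is not Caliban's implementation (that lives in Rust behind OutputStylePrefix::splice_into); the function name, parameters, and the newline used as a separator are assumptions made for illustration.

```python
def splice_output_style(base_prompt: str, style_name: str, style_body: str,
                        memory_prefix: str = "") -> str:
    """Assemble memory tiers, then the style block, then the base prompt."""
    parts = []
    if memory_prefix:
        parts.append(memory_prefix)
    # An empty body (the default style) contributes no block at all,
    # so the result is identical to having no style configured.
    if style_body.strip():
        parts.append(
            f'<output-style name="{style_name}">\n{style_body.strip()}\n</output-style>'
        )
    parts.append(base_prompt)
    # Hypothetical separator: the real joiner is not documented here.
    return "\n".join(parts)
```

Because the default style adds nothing, switching to it yields the same string as an unstyled prompt, which is what keeps the provider's prompt cache valid.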
The frontmatter field keep_coding_instructions: false (default true) lets a style suppress the default coding-assistant guidance block. Use this for documentation-only or writing-only modes where coding instructions are irrelevant.
Custom styles
Drop a .md file with YAML frontmatter into the appropriate directory. The file stem must match the name: field.
.caliban/output-styles/
brief.md
Example brief.md:
---
name: brief
description: "Terse responses — one sentence per point, no preamble."
keep_coding_instructions: true
---
Keep all responses as brief as possible. One sentence per point.
No greetings, no summaries, no padding. Respond with the minimum necessary.
Required fields: name and description. Both snake_case (keep_coding_instructions) and kebab-case (keep-coding-instructions) are accepted.
Discovery roots
| Priority | Location | Scope |
|---|---|---|
| 1 (highest) | <workspace>/.caliban/output-styles/<name>.md | Project |
| 2 | $XDG_CONFIG_HOME/caliban/output-styles/<name>.md | User |
| 3 | $XDG_DATA_HOME/caliban/plugins/<plugin>/output-styles/<name>.md | Plugin (namespaced <plugin>:<name>) |
| 4 (lowest) | Built-ins (compiled in) | Built-in |
A project style with the same name shadows user, plugin, and built-in styles.
Plugin-supplied styles and force_for_plugin
A plugin-supplied style with force_for_plugin: true in its frontmatter overrides the operator's output_style setting while the plugin is enabled. The /config picker shows a "locked by plugin: X" badge. Disabling the plugin releases the lock.
force_for_plugin: true is silently ignored on non-plugin styles (project, user, built-in).
Related pages
- Plugins
- Settings Reference (output_style key)
- Slash Command Index (/output-style picker)
Sub-agents
Caliban can spawn a nested agent — a sub-agent — to handle a focused subtask without polluting the parent's transcript. The parent's turn loop pauses while the sub-agent runs, then resumes with the sub-agent's condensed result as a single tool-result block.
The AgentTool
Sub-agents are exposed to the model as a built-in tool named AgentTool.
When the model invokes it, caliban spins up a fresh Agent instance in the
same process and drives it to completion.
Key properties of an AgentTool invocation:
| Property | Value |
|---|---|
| Process boundary | None — in-process, same tokio runtime |
| Max turns | 20 (hard limit) |
| Output returned to parent | Final assistant text, truncated to 5 000 chars |
| Intermediate turns | Not recorded in the parent session; visible in debug logs |
| Cancellation | Inherits the parent's cancellation token |
| Provider / model | Inherits parent's provider; model input overrides the model |
| Hooks | Inherited by default; opt out with inherit_hooks: false |
Tool allowlist
The tool_allowlist input controls which tools the sub-agent may call:
- Omitted or null: the sub-agent inherits every tool the parent has, except AgentTool itself.
- Explicit list: the sub-agent gets exactly those tools. Unknown names are silently dropped.
AgentTool is always stripped from the sub-agent's registry. Sub-agents cannot spawn further sub-agents. Nested fan-out is planned for a future release.
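As a rough illustration of these rules, the following Python sketch resolves a sub-agent's tool set. The function name is hypothetical, and it assumes that "unknown" means "not present in the parent's registry"; Caliban's actual resolution logic may differ.

```python
from typing import Iterable, Optional


def resolve_sub_agent_tools(parent_tools: Iterable[str],
                            tool_allowlist: Optional[Iterable[str]] = None) -> list:
    """Return the tool names a sub-agent may call, in the parent's order."""
    parent = list(parent_tools)
    if tool_allowlist is None:
        # Omitted or null: inherit everything the parent has.
        selected = parent
    else:
        wanted = set(tool_allowlist)
        # Names the parent does not know are dropped without an error.
        selected = [tool for tool in parent if tool in wanted]
    # AgentTool is always stripped, so sub-agents cannot nest.
    return [tool for tool in selected if tool != "AgentTool"]
```

For example, an allowlist of ["Read", "Nope", "AgentTool"] against a parent that has Read, Grep, and AgentTool would leave the sub-agent with Read only.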
Isolation mode
Each AgentTool invocation carries an isolation field (none or
worktree):
- none (default): the sub-agent shares the parent's working directory. Suitable for read-only work (investigation, summarization).
- worktree: the sub-agent runs in a dedicated git worktree materialized at .caliban/worktrees/<name>. Suitable for tasks that write files. See Worktree Isolation for details.
Background mode
Setting background: true in the AgentTool input detaches the sub-agent
from the parent and hands it off to the caliband supervisor daemon. The
parent's call returns immediately with the new agent's id. See
The Background Fleet.
Closure-based hooks cannot cross the process boundary. When
background: true is set and the parent has closure hooks installed,
caliban drops those hooks with a warning and continues. Only
config-expressible hooks survive the handoff. Pass inherit_hooks: false
to suppress the warning if you know the sub-agent does not need the parent's
hooks.
The --no-sub-agent flag
Pass --no-sub-agent (or set CALIBAN_NO_SUB_AGENT=1) to remove
AgentTool from the tool registry entirely. The model will never see the
tool and cannot spawn sub-agents.
caliban --no-sub-agent "review this codebase"
This is useful when you want a strict single-agent session, or when operating in an environment where spawning child work is undesirable (CI cost budgets, audit requirements).
When to use sub-agents
| Use case | Recommended approach |
|---|---|
| Read-only research (grep, read, glob) without context bloat | AgentTool with tool_allowlist: ["Read","Grep","Glob"] |
| File-writing subtask that must not mix diffs | AgentTool with isolation: worktree |
| Long-running task that should survive the parent session | AgentTool with background: true, or --bg <task> |
| Strict single-agent run | --no-sub-agent |
For the full set of built-in tools the sub-agent can draw on, see Built-in Tools.
The Background Fleet
Caliban can run sub-agents in the background — detached from your current
session — and let you monitor, attach to, or stop them at will. A per-repo
supervisor daemon (caliband) owns the fleet and keeps agents alive even
after the parent caliban process exits.
Spawning a background agent
From the command line
The quickest way to fire off a background task is the --bg flag:
caliban --bg "refactor the auth module to use the new token type"
This is shorthand for caliban agents spawn --prompt <task>. Caliban
auto-starts caliband if it is not already running, then returns immediately
with the new agent's id.
From inside a session
The model can request a background sub-agent by setting background: true
in an AgentTool call. The parent session receives the id and a note to
check back via caliban attach <id>.
The caliband daemon
caliband is a separate binary shipped alongside caliban. It runs as a
per-repo daemon, meaning each git repository gets its own daemon instance.
Socket path (resolution order):
- $CALIBAN_DAEMON_RUNTIME_DIR/<hash>.sock if CALIBAN_DAEMON_RUNTIME_DIR is set.
- $XDG_RUNTIME_DIR/caliban/<hash>.sock if $XDG_RUNTIME_DIR is set.
- $TMPDIR/caliban-daemon/<hash>.sock (fallback; typical on macOS).
The <hash> is a 16-hex-char SHA-256 prefix of the absolute repo root path,
so each repo gets a stable, unique socket without naming collisions.
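If you need to locate a socket by hand (for example when debugging a stale daemon), the resolution order can be approximated with the Python sketch below. It assumes the hash is taken over the UTF-8 bytes of the absolute path with no further normalization, and that the system temp directory is used when TMPDIR is unset; neither detail is confirmed here, so treat caliban daemon status as the authoritative source for the path.

```python
import hashlib
import os
import tempfile


def repo_hash(repo_root: str) -> str:
    """16-hex-char SHA-256 prefix of the absolute repo root (encoding assumed)."""
    absolute = os.path.abspath(repo_root)
    return hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:16]


def daemon_socket_path(repo_root: str, env=None) -> str:
    """Follow the documented resolution order for the daemon socket."""
    env = os.environ if env is None else env
    name = repo_hash(repo_root) + ".sock"
    if env.get("CALIBAN_DAEMON_RUNTIME_DIR"):
        return os.path.join(env["CALIBAN_DAEMON_RUNTIME_DIR"], name)
    if env.get("XDG_RUNTIME_DIR"):
        return os.path.join(env["XDG_RUNTIME_DIR"], "caliban", name)
    # Assumption: fall back to the platform temp dir when TMPDIR is unset.
    tmp = env.get("TMPDIR") or tempfile.gettempdir()
    return os.path.join(tmp, "caliban-daemon", name)
```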
caliband auto-starts when any caliban agents command or --bg flag
needs it. You should rarely need to launch it directly.
cargo install caliban installs only the caliban binary.
To also install the daemon run:
cargo install caliban-supervisor --bin caliband
Both binaries must be on your $PATH for background fleet features to work.
Agent lifecycle states
| State | Meaning |
|---|---|
| spawning | Registered, not yet executing |
| running | Actively processing turns |
| idle | Waiting for input; no compute pending |
| killed | Stopped via kill |
| done | Finished successfully |
| failed | Finished with an error |
| crashed | Daemon restarted while agent was active; needs recovery |
caliban agents subcommands
caliban agents list
Print all registered agents and their status.
caliban agents list
caliban agents spawn
Spawn a new background agent with an explicit prompt.
caliban agents spawn --prompt "audit all SQL queries for injection risks"
caliban agents spawn --prompt "write tests for crates/caliban-tools-builtin" --label my-test-agent
Options:
| Flag | Description |
|---|---|
| --prompt <TEXT> | Initial prompt for the agent (required) |
| --label <NAME> | Human-readable label shown in list and logs |
caliban agents attach <id>
Stream a running agent's transcript live. Press Ctrl+D to detach without
stopping the agent.
caliban agents attach a3f8b2c1
caliban agents logs <id>
Print the agent's session log (session.json).
caliban agents logs a3f8b2c1
caliban agents kill <id>
Terminate an agent (SIGTERM, escalating to SIGKILL after a grace period).
caliban agents kill a3f8b2c1
caliban agents respawn <id>
Kill the agent and restart it with the same original spawn spec (same prompt, model, isolation settings).
caliban agents respawn a3f8b2c1
Note that respawn assigns a new id; the old id is removed from the
registry.
caliban agents rm <id>
Remove an agent from the registry. The agent must be stopped first, unless
--force is passed.
caliban agents rm a3f8b2c1
caliban agents rm a3f8b2c1 --force # remove even if still running
Top-level shorthands
Six top-level shorthands cover the most common operations (stop and kill are aliases):
| Shorthand | Equivalent |
|---|---|
| caliban attach <id> | caliban agents attach <id> |
| caliban logs <id> | caliban agents logs <id> |
| caliban stop <id> | caliban agents kill <id> |
| caliban kill <id> | caliban agents kill <id> |
| caliban respawn <id> | caliban agents respawn <id> |
| caliban rm <id> | caliban agents rm <id> |
caliban daemon subcommands
caliban daemon status
Print daemon health, PID, uptime, agent count, and the socket path.
caliban daemon status
caliban daemon stop
Ask the daemon to shut down gracefully after finishing in-flight requests. Running agents are not automatically killed; stop them first if you want a clean shutdown.
caliban daemon stop
Session storage
Each background agent's transcript is stored as a regular caliban session
file at <base>/agents/<id>/session.json. This means all session tooling
(compaction, replay, audit) works on background agents out of the box.
Attaching to an agent is conceptually the same as resuming its session over
the agent's per-agent socket.
Diagram: agent lifecycle
flowchart LR
A([caliban --bg task]) -->|spawn request| D[caliband daemon]
D -->|registers| R[(Registry)]
D -->|starts| W[Agent worker]
W -->|streams turns| S[(session.json)]
W -->|per-agent socket| T([caliban attach id])
W -->|done/failed| R
T2([caliban agents kill id]) -->|kill request| D
D -->|SIGTERM→SIGKILL| W
For how background agents use git worktree isolation, see Worktree Isolation.
Worktree Isolation
When a sub-agent writes files, those writes land in the parent's working tree by default. That is fine for read-only investigation, but it mixes the sub-agent's diff into yours and gives you no clean way to discard it. Worktree isolation solves this: caliban materializes a dedicated git worktree for the sub-agent so its file operations are completely separate from the parent's tree.
How it works
When isolation: worktree is requested, caliban uses the
caliban-worktrees crate to:
- Create a new git branch named caliban/<name> off the chosen base ref.
- Materialize a worktree at .caliban/worktrees/<name>/ in the repo root.
- Optionally apply sparse-checkout patterns to limit which paths are checked out.
- Optionally symlink heavy directories (e.g. target/, node_modules/) from the parent repo into the worktree so they are shared rather than duplicated.
- Run the sub-agent with its working directory set to the worktree root.
The sub-agent's git history (commits, diffs) lives on the caliban/<name>
branch. You can inspect, cherry-pick, or discard it with standard git
commands after the run.
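For orientation, the branch and worktree naming can be mapped onto plain git commands. The Python sketch below only builds the argument lists; it is an approximation of what the caliban-worktrees crate does, not a description of its internals, and the helper names are hypothetical. Sparse checkout and symlinking are left out.

```python
def worktree_names(name: str):
    """Branch and path used for a sub-agent worktree, per the naming scheme above."""
    return f"caliban/{name}", f".caliban/worktrees/{name}"


def create_command(name: str, base_ref: str = "HEAD") -> list:
    """git invocation that creates the branch and materializes the worktree."""
    branch, path = worktree_names(name)
    return ["git", "worktree", "add", "-b", branch, path, base_ref]


def cleanup_commands(name: str) -> list:
    """Manual cleanup: remove the worktree, then delete its branch."""
    branch, path = worktree_names(name)
    return [
        ["git", "worktree", "remove", path],
        # -d refuses to delete a branch with unmerged commits.
        ["git", "branch", "-d", branch],
    ]
```

Each list can be passed to subprocess.run from the repo root if you want to reproduce or clean up a worktree by hand.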
Base ref options
The worktree.base_ref field controls what the new branch is rooted on:
| Value | Effect |
|---|---|
| head (default) | Branch off the current HEAD commit |
| fresh | Branch off HEAD, but start with a near-empty sparse checkout (only a sentinel pattern is checked out) |
| Any rev-parse-able string | Branch off that specific commit, tag, or branch name |
Sparse checkout
Set worktree.sparse_paths to a list of path patterns to limit which files
are materialized in the worktree. Patterns follow git's sparse-checkout cone
format. An empty list (the default) checks out all files.
{
"prompt": "refactor crates/caliban-tools-builtin",
"isolation": "worktree",
"worktree": {
"base_ref": "head",
"sparse_paths": ["crates/caliban-tools-builtin/", "Cargo.toml"]
}
}
Symlinked directories
Large directories that should be shared — not copied — go in
worktree.symlink_directories. Each path is relative to the parent repo
root. The directory must exist in the parent at creation time.
{
"prompt": "run the test suite and summarize failures",
"isolation": "worktree",
"worktree": {
"symlink_directories": ["target", "node_modules"]
}
}
Worktree symlink support on Windows requires Developer Mode or elevated
privileges. On Windows, symlink_directories is best-effort and may fall
back to copying on systems where symlinks are restricted.
Cleanup behavior
| Context | When the worktree is removed |
|---|---|
| Foreground sub-agent | When the sub-agent's task completes (the handle drops) |
| Background sub-agent | When caliban agents rm <id> is run |
| Daemon restart with orphans | On next daemon startup (configurable) |
Set CALIBAN_KEEP_WORKTREES=1 to disable automatic removal for debugging.
The worktree (and its caliban/<name> branch) will then persist until you
remove it manually with git worktree remove and git branch -d.
Operator notes
- Disk usage. Each worktree is a full checkout of the matched paths. Use sparse_paths and symlink_directories to keep sizes manageable. The default head base ref shares git objects with the parent repo, so only working-tree files consume extra disk.
- One worktree per sub-agent. Two concurrent sub-agents with the same name will conflict. Background fleet agents receive auto-generated names based on their id, so fleet-level collisions are not a concern. For foreground parallel agents (a future feature), use distinct names.
- Branch visibility. git branch --list 'caliban/*' shows all active sub-agent branches. You can merge, rebase, or delete them like any other branch.
For how worktree isolation relates to background agents and the caliband
daemon, see The Background Fleet.
Memory Tiers
Caliban carries three on-disk memory tiers that are spliced into every system prompt before the session starts. All three are plain Markdown files you can read and edit with any text editor. A fourth tier — MCP-mediated long-form memory — is planned for a future release.
flowchart LR
G["Global CLAUDE.md\n~/.config/caliban/CLAUDE.md"]
P["Project tier\n<workspace>/CLAUDE.md\n(+ ancestor walk + @-imports + rules)"]
A["Auto-memory index\n~/.local/share/caliban/projects/<slug>/memory/MEMORY.md"]
SP["System prompt"]
G --> SP
P --> SP
A --> SP
The splice order is always global → project → auto-memory, each tier wrapped in an XML-tagged block so the model can distinguish them:
<global-claude-md path="…/CLAUDE.md">
…
</global-claude-md>
<project-claude-md path="…/CLAUDE.md">
…
</project-claude-md>
<auto-memory-index path="…/MEMORY.md">
…
</auto-memory-index>
<default system prompt…>
Missing tiers are silently omitted — no empty tag block is emitted.
Tier 1 — Global
Path: ~/.config/caliban/CLAUDE.md (XDG $XDG_CONFIG_HOME honored)
Owned by the operator. Caliban never writes here. Use it for cross-project preferences: tool choices, tone, coding style, personas. Read once at startup; missing file is fine.
Tier 2 — Project
Path: <workspace_root>/CLAUDE.md — plus the ancestor walk described in
CLAUDE.md & Imports.
Owned by the project / repository — commit it like any other file. Contains repo-specific conventions, build commands, and taboos. Caliban never writes here.
Tier 3 — Auto-memory
Directory: ~/.local/share/caliban/projects/<sanitized-cwd>/memory/
(XDG $XDG_DATA_HOME honored; override with CALIBAN_AUTO_MEMORY_DIRECTORY
or CALIBAN_MEMORY_DIR).
Owned by the agent. The agent uses ReadMemoryTopic and WriteMemoryTopic
— two built-in tools — to maintain a per-project knowledge base across
sessions. See Auto-Memory for the full format and write
protocol.
Only MEMORY.md (the index, capped at 200 lines / 25 KB) is loaded eagerly
each session. Topic files are read on demand.
Token budget
The combined memory prefix defaults to 32 000 tokens (estimated as
bytes / 4, provider-agnostic). If the combined size exceeds the cap,
the auto-memory tier is truncated first (a [truncated: N bytes] notice is
appended to its block), then the project tier, then the global tier.
Per-tier caps can be set in the [memory] block of settings.toml:
[memory]
cap_tokens_auto = 8000 # cap the auto tier independently
cap_tokens_claude_md = 16000 # cap the combined CLAUDE.md tier
cap_tokens_combined = 28000 # override the combined ceiling
The same values can be set via environment variables:
CALIBAN_MEMORY_BUDGET_TOKENS, CALIBAN_MEMORY_CAP_TOKENS_AUTO, and
CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD.
When the sum of both per-tier caps would exceed the combined ceiling, each is scaled down proportionally so the sum fits.
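The arithmetic behind the estimate and the proportional scaling can be sketched in Python as follows. The function names are hypothetical and the rounding direction (integer division, rounding down) is an assumption; only the bytes / 4 estimate and the proportional rule come from the description above.

```python
def estimate_tokens(text: str) -> int:
    """Provider-agnostic estimate: UTF-8 byte length divided by four."""
    return len(text.encode("utf-8")) // 4


def scale_tier_caps(cap_auto: int, cap_claude_md: int, cap_combined: int):
    """Shrink both per-tier caps proportionally when their sum exceeds the ceiling."""
    total = cap_auto + cap_claude_md
    if total <= cap_combined:
        return cap_auto, cap_claude_md
    # Integer math rounds down, so the scaled sum never exceeds the ceiling.
    return (cap_auto * cap_combined // total,
            cap_claude_md * cap_combined // total)
```

With the example settings above (8000 + 16000 against a ceiling of 28000) nothing is scaled. Two caps of 20000 against a ceiling of 30000 would each shrink to 15000.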
The Memory tool and /memory
The built-in Memory tool is the agent-facing interface for reading and writing
the auto-memory tier. See Built-in Tools for the full
tool reference.
The /memory slash command shows the active tiers, their paths, and their
estimated token counts:
/memory
global ~/.config/caliban/CLAUDE.md (412 tokens)
project /Users/me/dev/myproject/CLAUDE.md (880 tokens)
auto ~/.local/share/caliban/projects/…/memory/MEMORY.md (256 tokens)
walk /Users/me/dev/myproject/CLAUDE.md (880 tokens)
Set CALIBAN_DISABLE_AUTO_MEMORY=1 to drop the auto-memory tier entirely and
prevent the auto-memory skill from loading. This guarantees identical system
prompts across headless and CI runs regardless of on-disk memory state.
--bare sets the same flag automatically.
CLAUDE.md & Imports
The project memory tier is richer than a single file. At session start, caliban
walks up the directory tree from the current working directory, concatenating
every CLAUDE.md, AGENTS.md, and .caliban.md it finds, then resolves any
@-imports inside them and activates path-scoped rules from .caliban/rules/.
Ancestor walk
Starting at cwd, caliban walks toward the filesystem root. The walk stops at
the first git root it finds, or the filesystem root, whichever comes first
(WalkStop::Both, the default).
Within each directory, files are loaded in most-specific → most-general
order: .caliban.md → CLAUDE.md → AGENTS.md. All three are concatenated;
they don't override each other.
The resulting files are spliced in broad → narrow order (root-first) so that narrower, more-specific instructions appear later and take precedence in the model's reading.
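A minimal Python sketch of the walk, under two assumptions: a git root is recognized by the presence of a .git entry, and the per-directory file order is kept as-is when the directory list is reversed to root-first. Both are illustrative guesses, and ancestor_walk is a hypothetical name.

```python
from pathlib import Path

# Per-directory load order, most specific first.
MEMORY_FILENAMES = (".caliban.md", "CLAUDE.md", "AGENTS.md")


def ancestor_walk(cwd: Path) -> list:
    """Collect memory files from cwd up to the git root, returned root-first."""
    per_directory = []
    current = cwd.resolve()
    while True:
        found = [current / name for name in MEMORY_FILENAMES
                 if (current / name).is_file()]
        per_directory.append(found)
        at_git_root = (current / ".git").exists()
        at_fs_root = current.parent == current
        if at_git_root or at_fs_root:
            break
        current = current.parent
    # Reverse the directories so broader files come first in the prompt.
    return [path for files in reversed(per_directory) for path in files]
```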
If the ancestor walk misbehaves in a repo you don't have CI coverage for, set
CALIBAN_DISABLE_CLAUDE_MD_WALK=1 to revert to the legacy single-file project
tier (<workspace_root>/CLAUDE.md only).
@-imports
Any of the discovered files may contain @-import directives on their own
line:
@./shared/conventions.md
@~/notes/api-style.md
@/abs/path/to/team-guide.md
Import resolution is:
- Depth-bounded to 5 levels of recursion.
- Cycle-detected by canonical path — circular imports are ignored.
- Local paths only. HTTP/HTTPS URLs (@https://…) are rejected outright to keep the prompt-assembly path auditable.
- External imports (paths outside the workspace root and outside ~/.config/caliban/) require one-time approval. The approval decision is persisted to ~/.caliban/imports-allowlist.json. In non-interactive mode (--print, --bare, CI), external imports are denied unless CALIBAN_APPROVE_IMPORTS=1 is set.
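The depth bound, cycle detection, and URL rejection can be sketched in Python as below. This is a simplified illustration, not Caliban's resolver: it assumes relative imports resolve against the importing file's directory, counts the top-level file as depth 0, leaves unresolvable @-lines in place, and skips the external-import approval step entirely. Each inlined import is preceded by a provenance comment.

```python
from pathlib import Path

MAX_IMPORT_DEPTH = 5  # documented recursion bound


def expand_imports(path, depth: int = 0, chain: frozenset = frozenset()) -> str:
    """Inline @-imports recursively, ignoring cycles, URLs, and over-deep chains."""
    canonical = Path(path).expanduser().resolve()
    chain = chain | {canonical}
    out = []
    for line in canonical.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if not stripped.startswith("@") or len(stripped) < 2:
            out.append(line)
            continue
        spec = stripped[1:]
        if spec.startswith(("http://", "https://")):
            continue  # remote imports are rejected outright
        target = Path(spec).expanduser()
        if not target.is_absolute():
            # Assumption: relative to the importing file's directory.
            target = canonical.parent / target
        target = target.resolve()
        if not target.is_file():
            out.append(line)  # not resolvable; keep the original text
            continue
        if target in chain or depth + 1 > MAX_IMPORT_DEPTH:
            continue  # circular or too deep: ignored
        out.append(f"<!-- imported from {target} -->")
        out.append(expand_imports(target, depth + 1, chain))
    return "\n".join(out)
```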
Imported content is inlined at the import site with an
<!-- imported from … --> marker so the model can trace provenance.
Nested on-demand
When the model reads or edits a file in a subdirectory that has its own
CLAUDE.md, that file is appended to the system prompt for the rest of the
session. This happens once per (path, session) pair — caliban does not
reload on file changes or unload when the model leaves the subtree.
The system prompt grows monotonically during a session. This is intentional: operators reason about it as "everything the model has been told", not as a sliding window.
Path-scoped rules
Files under .caliban/rules/<topic>.md are loaded with optional
paths: glob frontmatter:
---
paths:
- "src/**/*.ts"
- "tests/**/*.ts"
---
Always use `strict` TypeScript. Prefer `unknown` over `any`.
Rules without a paths: frontmatter are always-active and loaded at startup.
Rules with paths: frontmatter are activated lazily on the first file touch
matching any pattern in the set. Once activated, a rule stays in the prompt for
the rest of the session.
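Lazy activation can be pictured with the Python sketch below. The glob translation is a simplified stand-in (it treats **/ as "zero or more directories" and * as "anything except a slash") and may not match Caliban's matcher exactly; RuleSet and touch are hypothetical names.

```python
import re


def glob_to_regex(pattern: str):
    """Translate a simple path glob into a compiled regex (simplified semantics)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


class RuleSet:
    """Tracks which rules are active; rules with no paths start active."""

    def __init__(self, rules: dict):
        self.patterns = {name: [glob_to_regex(p) for p in globs]
                         for name, globs in rules.items()}
        self.active = {name for name, globs in rules.items() if not globs}

    def touch(self, path: str) -> list:
        """Record a file touch and return the rules it newly activated."""
        newly = [name for name, regexes in self.patterns.items()
                 if name not in self.active
                 and any(rx.fullmatch(path) for rx in regexes)]
        self.active.update(newly)  # activation is sticky for the session
        return newly
```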
claude_md_excludes for monorepos
Large monorepos often have directories whose CLAUDE.md should not be spliced
into every session. Add gitignore-style patterns to settings.toml to skip
them during the ancestor walk:
claude_md_excludes = [
"node_modules/**",
"vendor/**",
"third_party/**/CLAUDE.md",
]
Patterns are evaluated relative to the workspace root (the cwd at startup),
not the absolute filesystem path. Last-match wins for a given path; !
negation is supported.
The same patterns can be supplied at runtime via the colon- or
newline-separated CALIBAN_CLAUDE_MD_EXCLUDES environment variable.
Additional directories
--additional-directories (or additional_directories in settings.toml)
extends the set of paths the file tools can access. These directories do not
contribute CLAUDE.md content by default. Set
CALIBAN_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 to opt in — each added path then
performs its own ancestor walk, concatenated after the cwd walk in declaration
order.
Tier content is spliced via the <project-claude-md> and <project-rule> XML
tags in the system prompt. Use /memory to inspect which files were loaded and
their token counts.
Auto-Memory
Auto-memory is the agent-writable third tier of caliban's memory model. At the
start of each session, caliban splices a per-project index file (MEMORY.md)
into the system prompt. During the session, the agent uses two built-in tools —
ReadMemoryTopic and WriteMemoryTopic — to read and write Markdown topic
files that persist knowledge across sessions.
Directory layout
~/.local/share/caliban/projects/<sanitized-cwd>/memory/
MEMORY.md ← index, spliced into every session (≤ 200 lines / 25 KB)
build-commands.md ← topic file
api-conventions.md ← topic file
deploy-checklist.md ← topic file
…
The <sanitized-cwd> slug is derived from the canonical workspace path (e.g.
/Users/jf/dev/caliban → Users-jf-dev-caliban). Override the directory with
CALIBAN_AUTO_MEMORY_DIRECTORY or CALIBAN_MEMORY_DIR.
The index file (MEMORY.md)
MEMORY.md is the only file loaded eagerly each session. It must stay under
200 lines / 25 KB so it fits comfortably inside the splice budget. Caliban
bootstraps an empty MEMORY.md with a conventions block the first time the
memory directory is accessed.
A typical index looks like:
# Memory index
- [build-commands](build-commands.md) — project: `cargo build --release`; binary lands in `target/release/`
- [api-conventions](api-conventions.md) — feedback: prefer the built-in HTTP helper over shelling out to curl
- [deploy-checklist](deploy-checklist.md) — project: run migrations before flipping the feature flag
HTML comments (<!-- … -->) in MEMORY.md are stripped from the spliced
prompt but kept on disk. The auto-injected conventions block is wrapped in HTML
comments for this reason — it stays on disk for authoring guidance but does not
consume token budget.
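The effect of comment stripping is easy to reproduce. The Python sketch below shows one way to do it with a non-greedy regular expression; it illustrates the behavior and is not Caliban's code.

```python
import re

# Non-greedy match so each comment is removed on its own, even across lines.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(markdown: str) -> str:
    """Return the text as it would be spliced: comments removed, rest untouched."""
    return _HTML_COMMENT.sub("", markdown)
```

Anything you wrap in an HTML comment inside MEMORY.md is therefore free in token terms: it stays on disk for human readers but never reaches the model.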
Topic file format
Each topic file is a Markdown file with YAML frontmatter:
---
name: sprint-mode
description: "user prefers consolidated design proposals + spec + plan + implementation in one pass"
metadata:
node_type: memory
type: feedback
---
User prefers a single-pass workflow: design proposal, spec, plan, and
implementation delivered together without a human review checkpoint in
between.
| Frontmatter field | Required | Description |
|---|---|---|
| name | yes | Kebab-case slug matching the filename stem |
| description | yes | One-line summary (≤ 120 chars); appears in the index |
| metadata.type | yes | One of user, feedback, project, reference |
| metadata.node_type | no | Always memory when written by the agent |
Slug rules: non-empty, no path separators, no .. sequence, and no leading dot.
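A small Python check that mirrors these rules is shown below. It interprets the .. rule as "the two-dot sequence may not appear anywhere" and treats both / and \ as path separators; those readings are assumptions, and is_valid_slug is a hypothetical helper.

```python
def is_valid_slug(slug: str) -> bool:
    """Apply the documented slug rules to a candidate topic name."""
    if not slug:
        return False  # must be non-empty
    if "/" in slug or "\\" in slug:
        return False  # no path separators
    if ".." in slug:
        return False  # no parent-directory sequence
    if slug.startswith("."):
        return False  # no leading dot (hidden files)
    return True
```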
Memory types
| Type | Use for |
|---|---|
| user | Durable facts about the user (role, timezone, preferences) |
| feedback | Corrections or workflow preferences issued by the user |
| project | Durable project facts not already captured in the repo |
| reference | Stable external context (account IDs, API endpoints, quotas) |
The agent classifies each topic at write time. There is no automated classifier — the model is best positioned to judge what to save.
Built-in tools
| Tool | Permission category | Description |
|---|---|---|
| ReadMemoryTopic | memory.* (allow) | Read a topic file by slug |
| WriteMemoryTopic | memory.* (allow) | Write/update a topic file and update the index atomically |
Both tools are sandboxed to the memory directory — path traversal attempts are rejected at the tool level.
WriteMemoryTopic performs an atomic write:
1. Write topic body + frontmatter to <slug>.md.tmp.
2. Rename to <slug>.md (atomic on the same filesystem).
3. Rewrite MEMORY.md with an updated index line for the slug (same tmp-then-rename approach).
A crash between steps 2 and 3 leaves an orphan topic file. Run
/memory rebuild-index to repair it.
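The tmp-then-rename pattern is a general technique, sketched here in Python. The helper names and the way the index line is passed in are hypothetical, and the fsync call is an extra durability step that the description above does not mention.

```python
import os
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write to <name>.tmp, then rename over the target in one step."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())  # assumption: flush to disk before renaming
    os.replace(tmp, path)  # atomic when tmp and target share a filesystem


def write_topic(memory_dir: Path, slug: str, topic_text: str, index_line: str) -> None:
    """Write the topic file first, then refresh its line in MEMORY.md."""
    atomic_write(memory_dir / f"{slug}.md", topic_text)
    index = memory_dir / "MEMORY.md"
    if index.exists():
        lines = index.read_text(encoding="utf-8").splitlines()
    else:
        lines = ["# Memory index"]
    prefix = f"- [{slug}]("
    # Drop any previous entry for this slug, then append the new one.
    lines = [line for line in lines if not line.startswith(prefix)] + [index_line]
    atomic_write(index, "\n".join(lines) + "\n")
```

Readers never observe a half-written file with this approach: they see either the old content or the new content. The remaining gap is the one noted above, where the topic file exists but the index has not yet been rewritten.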
Managing memory
| Command | Effect |
|---|---|
| /memory | Show active tiers, paths, and token counts |
| /memory rm <slug> | Delete a topic file and remove its index line |
| /memory rebuild-index | Rebuild MEMORY.md from the topic files on disk |
There is no automatic pruning. Memories persist until manually removed.
The index grows without bound on long-running projects. Periodically review
it with /memory and remove stale topics with /memory rm <slug> to keep it
under the 200-line / 25 KB splice limit.
Cross-references between topics
Topic bodies may contain [[slug]] cross-references, for example
[[parity-gap-matrix]]. These are informational breadcrumbs — caliban does not
auto-follow them. The agent can follow a reference by calling ReadMemoryTopic
with the referenced slug.
Disable for CI
Set CALIBAN_DISABLE_AUTO_MEMORY=1 to drop the auto-memory tier from the
splice and suppress the auto-memory skill. This guarantees identical system
prompts regardless of on-disk memory state. --bare sets the same flag
automatically.
Checkpoints & Rewind
Caliban takes a per-prompt snapshot of every file that a file-writing tool
touched during that prompt's turns. If you don't like the result, /rewind
lets you pick any prior prompt and restore the files, the conversation, or
both — without losing the history of what happened in between.
What gets snapshotted
The checkpoint recorder fires on Write, Edit, MultiEdit, and
NotebookEdit. Before any of these tools mutates a file for the first time
within a prompt, caliban reads the pre-image and stores it content-addressed
under the per-prompt blob directory.
Commands run via Bash (including rm, mv, cp, and arbitrary subprocess
writes) are not captured in the checkpoint. The /rewind overlay surfaces this
in its footer. Bash-created files that a Write/Edit later touches are
recorded from that point forward.
Plan-mode prompts (which reject mutating tools) emit an empty manifest so they are still selectable as conversation-rewind targets.
Disk layout
~/.caliban/projects/<cwd-hash>/checkpoints/<session>/
prompt-001/
manifest.json
blobs/<sha256>.bin
prompt-002/
manifest.json
blobs/<sha256>.bin
…
<cwd-hash> is the first 16 hex characters of sha256(canonical_cwd).
Override the root with CALIBAN_CHECKPOINT_ROOT. Disable recording entirely
with CALIBAN_CHECKPOINT_DISABLED=1.
Each manifest.json records:
| Field | Description |
|---|---|
| prompt_index | Monotonic prompt counter within the session (1-based) |
| kind | files (normal), plan (plan-mode, no blobs), cleared (pruned) |
| title | First ~80 chars of the user message |
| created_at | UTC timestamp |
| entries | Array of file entries (path, sha256, mode, size, exists_pre, tool) |
| partial | true if some blob writes failed |
For each entry, exists_pre: false means the file was created by the prompt
(restore will delete it). Blobs are content-addressed — the same pre-image
across two prompts is stored once.
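Content addressing is what makes the deduplication work: the blob's filename is the hash of its bytes, so an identical pre-image maps to the same file. A minimal Python sketch follows, using a single blob directory and a reduced entry record. The function name and the choice to record no hash for a file that did not exist are assumptions.

```python
import hashlib
from pathlib import Path


def store_pre_image(blob_dir: Path, file_path: Path) -> dict:
    """Save a file's current bytes under their SHA-256 and return a manifest entry."""
    if not file_path.exists():
        # The prompt is about to create this file; restore would delete it.
        return {"path": str(file_path), "sha256": None, "size": 0, "exists_pre": False}
    data = file_path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    blob_dir.mkdir(parents=True, exist_ok=True)
    blob = blob_dir / f"{digest}.bin"
    if not blob.exists():
        # Identical content hashes to the same name, so it is written only once.
        blob.write_bytes(data)
    return {"path": str(file_path), "sha256": digest, "size": len(data), "exists_pre": True}
```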
Triggering /rewind
Open the rewind overlay from the TUI in two ways:
- Type /rewind at the prompt.
- Press Esc Esc (two Esc presses within 400 ms) when the input buffer is empty.
The overlay lists prompts newest-first. Navigate with arrow keys, confirm with Enter.
Restore options
| Option | Default | Effect |
|---|---|---|
| Restore both | Enter | Overwrite tracked files and truncate conversation |
| Restore code only | | Overwrite tracked files; leave conversation intact |
| Restore conversation only | | Truncate messages; leave files intact |
| Summarize from here | | Run the compactor on the messages after the checkpoint |
| Summarize up to here | | Run the compactor on the messages up to the checkpoint |
"Truncate conversation" removes all messages after the selected prompt's last assistant message, so the conversation ends at that point in time.
The two summarize options feed the same SummarizingCompactor used by
/compact. They're useful when you want to keep the context clean after
rolling back — for example, summarize everything before the rewind point so the
model retains the overall arc without the failed detour.
Storage limits and pruning
CALIBAN_CHECKPOINT_MAX_BYTES caps total blob storage per project (default
5 GiB). When the cap is exceeded, oldest prompt blobs are dropped first; the
manifest is kept as a cleared marker so the prompt remains selectable for
conversation rewind (but file restore is no longer possible).
A checkpoint directory is removed only when cleanupPeriodDays (default 30)
has elapsed since its last update and the corresponding session is being
pruned by the session store. Checkpoints are never orphaned while a session is
still resumable.
Context & Compaction
Every provider has a finite context window. Caliban tracks utilization in real time and provides several tools — automatic and manual — to keep long sessions healthy without losing important history.
Context tracking
Caliban maintains a ContextWindow counter that accumulates token usage from
every provider response. This is independent of the telemetry subsystem: the
/context command and the TUI status-bar percentage work for all users
regardless of whether CALIBAN_ENABLE_TELEMETRY is set.
/context
input tokens used : 62 430 / 200 000 (31%)
output tokens used: 4 812
⚠ approaching limit (warn threshold: 80%)
/context shows a per-message-kind breakdown and warns when utilization
reaches 80%. See Telemetry & Cost for OTLP
export of context metrics.
Auto-compaction
When the context-window utilization reaches auto_compact_threshold, caliban
automatically runs the configured compactor before the next turn. The default
threshold is 0.75 (75% utilization).
Configure in settings.toml:
auto_compact_threshold = 0.75 # 0.0–1.0; unset or null disables autocompact
Set auto_compact_threshold to null (or omit it) to disable autocompact
entirely and rely on manual /compact invocations.
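The trigger condition amounts to a single comparison, sketched below in Python. It assumes utilization is input tokens divided by the context window (as the /context output suggests) and that "reaches" means greater than or equal; should_auto_compact is a hypothetical name.

```python
from typing import Optional


def should_auto_compact(used_tokens: int, window_tokens: int,
                        threshold: Optional[float]) -> bool:
    """True when utilization has reached the threshold; None disables the check."""
    if threshold is None:
        return False
    return used_tokens / window_tokens >= threshold
```

With a threshold of 0.75 and a 200 000-token window, compaction would first trigger at 150 000 tokens.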
Micro-compaction
Micro-compaction is an LLM-free per-turn pass that supersedes stale
ToolResult blocks in the conversation history without making any API calls.
The logic is per-tool:
| Tool | Supersession key |
|---|---|
| Read | File path |
| Grep, Glob | Exact argument string |
| WebFetch | URL |
| Bash | Never superseded |
When a newer result for the same key exists, the older result block is replaced
with a [superseded: <tool>(<key>)] placeholder, keeping message structure
intact but recovering tokens.
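The supersession pass can be sketched in Python as follows. The record shape (tool, args, content), the argument names path and url, and the use of a sorted JSON dump as the "exact argument string" are all assumptions for illustration; tools not listed in the table are treated as never superseded.

```python
import json


def supersession_key(tool: str, args: dict):
    """Return the key under which a result can be superseded, or None."""
    if tool == "Read":
        return (tool, args.get("path"))
    if tool in ("Grep", "Glob"):
        return (tool, json.dumps(args, sort_keys=True))
    if tool == "WebFetch":
        return (tool, args.get("url"))
    return None  # Bash (and anything unlisted) is never superseded


def micro_compact(results: list) -> list:
    """Replace every result that has a newer result with the same key."""
    latest = {}
    for index, result in enumerate(results):
        key = supersession_key(result["tool"], result["args"])
        if key is not None:
            latest[key] = index  # later entries overwrite earlier ones
    compacted = []
    for index, result in enumerate(results):
        key = supersession_key(result["tool"], result["args"])
        if key is not None and latest[key] != index:
            placeholder = f"[superseded: {result['tool']}({key[1]})]"
            compacted.append({**result, "content": placeholder})
        else:
            compacted.append(result)
    return compacted
```

The list keeps its length and order, which is what preserves the message structure while the stale bodies shrink to a short placeholder.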
Enable or disable in settings.toml:
micro_compact_enabled = true # default: true
Manual /compact
/compact triggers an immediate compaction of the current conversation through
the configured Compactor (the same path used by autocompact). A
compact.event log entry is emitted and a compact.event metric is recorded if
telemetry is enabled.
/compact
/compact takes no flags. The compactor strategy (summarizing vs. micro) is
determined by the active configuration. See Hooks for the
PreCompact / PostCompact hook events that fire around each compaction.
/clear
/clear resets the conversation to an empty state and zeroes the
ContextWindow counter. The session file is updated. Use it to start a
fresh sub-task without opening a new session.
/clear
PreCompact and PostCompact hooks
Caliban fires PreCompact before compaction begins and PostCompact after it
completes. These hook events are available to external scripts and MCP handlers.
# In settings.toml [hooks]
[hooks]
PreCompact = [{ type = "command", command = "echo compacting…" }]
PostCompact = [{ type = "command", command = "notify-send 'compact done'" }]
See Hooks for the full hook configuration reference.
Prompt caching
Caliban uses Anthropic-style prompt caching by default to reduce cost on repeated turns. A cache marker is placed on the last user message when its estimated token count meets the minimum threshold.
| Setting / flag | Default | Description |
|---|---|---|
| --no-prompt-cache | off | Disable prompt caching for this run |
| CALIBAN_NO_PROMPT_CACHE | unset | Same as --no-prompt-cache via environment variable |
| min_cache_block_tokens | — | Minimum tokens on the last user message to merit a cache marker |
Configure min_cache_block_tokens in settings.toml:
min_cache_block_tokens = 1024 # omit to use the upstream default
Use --no-prompt-cache during development when you want to measure raw latency
without cache effects, or when debugging unexpected responses that might be
served from a stale cache hit.
Tool result size cap
Caliban can cap the character length of individual tool results before they are
appended to the conversation. This prevents a single large Read or Bash
output from consuming a disproportionate share of the context window.
tool_result_cap_chars = 65536 # 0 disables the cap (default)
Summary of relevant settings
| Setting key | Type | Default | Description |
|---|---|---|---|
| auto_compact_threshold | float | 0.75 | Utilization (0–1) that triggers autocompact; null disables |
| micro_compact_enabled | bool | true | Enable the LLM-free per-turn supersession pass |
| min_cache_block_tokens | integer | — | Minimum tokens to place the prompt cache marker |
| tool_result_cap_chars | integer | 0 | Per-result character cap; 0 disables |
Print Mode
Print mode is caliban's non-interactive entry point. Instead of launching the TUI, it drives the agent to completion and writes results to stdout — making caliban scriptable from a shell, a CI job, or any program that can invoke a subprocess.
Activating print mode
| Method | Example |
|---|---|
| -p / --print flag | caliban -p "summarize this repo" |
| --output-format flag | caliban --output-format json "fix the bug" |
| Auto-headless | caliban detects a piped stdout or non-TTY stdin and enters print mode automatically |
Auto-headless fires when a prompt is given and stdout is piped or stdin is not a TTY. Pass --no-auto-print to suppress this inference and keep the TUI even in piped contexts.
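The dispatch rule can be written as a small predicate. The Python sketch below is a hypothetical model of the behavior described here (the function and parameter names are invented for illustration); it treats an explicit --print or --output-format as always selecting print mode.

```python
def enters_print_mode(prompt_given: bool, stdout_is_tty: bool, stdin_is_tty: bool,
                      explicit: bool = False, no_auto_print: bool = False) -> bool:
    """Model of the headless dispatch decision (illustrative only).

    explicit      -- True when --print or --output-format was passed
    no_auto_print -- True when --no-auto-print was passed
    """
    if explicit:
        return True  # explicit flags always select print mode
    if no_auto_print:
        return False  # inference suppressed; keep the TUI
    return prompt_given and (not stdout_is_tty or not stdin_is_tty)
```

In a wrapper script, the two TTY inputs would typically come from `sys.stdout.isatty()` and `sys.stdin.isatty()`.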
Choosing an output format
--output-format text|json|stream-json
| Format | Output |
|---|---|
| text | The assistant's final reply, streamed to stdout as plain text. Default. |
| json | A single JSON object identical to the result frame in stream-json. Useful for jq consumers that only need the final answer and cost totals. |
| stream-json | Newline-delimited JSON (NDJSON). One frame per event — system/init first, per-turn tool and message frames, result last. The full automation contract; see The stream-json Protocol. |
Supplying input
By default caliban reads the prompt from the positional argument or --prompt. To pipe multi-line input, pass - as the prompt value and write to stdin:
git diff HEAD | caliban -p - "review these changes"
For multi-turn scripted sessions use --input-format stream-json to send NDJSON user frames on stdin instead. When this flag is active, any inline prompt other than - is rejected at startup (exit 64) to prevent accidentally bypassing the frame parser. See The stream-json Protocol for details.
Budget guard
--max-budget-usd <USD>
Caps the cumulative spend for a run. Cost is tracked against the vendored rate card in caliban-telemetry. When the budget is exceeded after a turn completes, caliban emits a result frame with subtype: "budget_exceeded" and exits 137. Unknown (provider, model) pairs contribute $0.00 and emit a single warning — the run is not blocked.
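Because the check runs after a turn completes, a run can overshoot the cap by up to one turn's cost. The Python sketch below models that timing; the function is hypothetical and the per-turn costs are made-up inputs, not caliban's rate card.

```python
from decimal import Decimal


def run_with_budget(turn_costs, max_budget_usd: Decimal):
    """Return (subtype, exit_code, spent) for a sequence of per-turn costs.

    Illustrative model only: each cost is added after its turn has already
    finished, and the budget is compared afterwards.
    """
    spent = Decimal("0")
    for cost in turn_costs:
        spent += Decimal(cost)  # the turn has completed and been billed
        if spent > max_budget_usd:
            return "budget_exceeded", 137, spent
    return "success", 0, spent
```

For example, three turns costing $0.20 each against a $0.50 cap would stop after the third turn with $0.60 spent.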
Deterministic runs with --bare
caliban --bare -p "count lines of code"
--bare skips hooks, skills, plugins, MCP server discovery, auto-memory, and CLAUDE.md walk-up. The agent runs with only its built-in tools and the flags you supply. Use it when you need a fully reproducible run that ignores user and project settings.
--bare controls what the agent loads; --no-auto-print controls whether headless mode fires automatically. They are independent.
Exit codes
| Code | Condition |
|---|---|
| 0 | Success |
| 1 | Generic runtime error (provider error, hook denial, tool crash) |
| 2 | Schema validation failed (--json-schema) |
| 64 | Bad flags / malformed stream-json input (EX_USAGE) |
| 66 | --resume <name> not found, or empty stream-json stdin (EX_NOINPUT) |
| 75 | --max-turns exceeded (EX_TEMPFAIL) |
| 78 | Config error — settings parse failure, stdin > 10 MB (EX_CONFIG) |
| 124 | Cancelled (Ctrl-C / SIGTERM from the agent loop) |
| 130 | Real SIGINT — second Ctrl-C reaching the harness |
| 137 | --max-budget-usd exceeded |
CI scripts can distinguish budget exhaustion from genuine failures without parsing stdout: $? carries the signal.
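A wrapper written in Python might translate the table above into messages as follows. This is a sketch; the dictionary simply restates the documented codes, and the helper names are hypothetical.

```python
# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "success",
    1: "generic runtime error",
    2: "schema validation failed",
    64: "bad flags or malformed stream-json input",
    66: "session not found or empty stream-json stdin",
    75: "max turns exceeded",
    78: "config error",
    124: "cancelled",
    130: "interrupted (SIGINT)",
    137: "budget exceeded",
}


def classify_exit(code: int) -> str:
    """Map a caliban exit code to a short description."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def is_limit_stop(code: int) -> bool:
    """True when the run stopped on a configured limit rather than a failure."""
    return code in (75, 137)
```

The code would typically come from `subprocess.run([...]).returncode` in the calling script.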
Session persistence
Print-mode runs honour --session <NAME>, --continue (-c), and --resume the same way as interactive sessions. Pass --no-save to skip writing the session back to disk after the run.
Related pages
- The stream-json Protocol — detailed frame reference
- Structured Output — --json-schema for schema-conformant replies
- CI Patterns — complete recipes for GitHub Actions and other pipelines
The stream-json Protocol
--output-format stream-json is caliban's full automation contract. It emits newline-delimited JSON (NDJSON) to stdout, one frame per line, in a well-defined order. Downstream programs parse the stream with any JSON library and route on the type (and subtype) fields.
The protocol mirrors Claude Code's stream-json shape closely enough that most existing consumers work with minimal changes, while remaining provider-agnostic — token field names and cost breakdowns differ by provider and are not byte-identical to Claude Code.
Output frame types
system/init — first frame of every run
Emitted before any agent activity begins.
{
"type": "system",
"subtype": "init",
"session_id": "a3f7c2d1-...",
"model": "anthropic/claude-sonnet-4-6",
"tools": ["Bash", "Edit", "Glob", "Grep", "Read", "Write"],
"plugins": [],
"settingSources": ["managed", "user", "project"],
"mcp_servers": [],
"bare_mode": false,
"cwd": "/home/ci/repo",
"permission_mode": "acceptEdits"
}
settingSources uses camelCase for Claude Code parity. permission_mode values are default, acceptEdits, plan, auto, dontAsk, bypassPermissions, or "disabled" (when --no-permissions is in effect).
system/api_retry
Emitted when the provider triggers a retry (rate-limit, overload, transient network error).
{
"type": "system",
"subtype": "api_retry",
"attempt": 2,
"max_retries": 5,
"retry_delay_ms": 1500,
"error_status": 529,
"error_category": "overloaded"
}
error_category values: overloaded, rate_limit, timeout, network, server_error, other.
user — echo of the user prompt
Only emitted when --replay-user-messages is set.
{
"type": "user",
"content": [{"type": "text", "text": "fix the failing tests"}]
}
text — incremental assistant text delta
Only emitted when --include-partial-messages is set.
{"type": "text", "delta": "Here is the fix: "}
thinking — incremental reasoning delta
Emitted under --include-partial-messages when the model streams reasoning content (extended thinking models).
{"type": "thinking", "delta": "Let me check the test output…"}
tool_use and tool_result — progress frames
Each tool invocation produces a tool_use frame (emitted once the model finishes streaming the tool's input JSON) immediately followed by a tool_result frame (emitted once the tool completes).
{"type": "tool_use", "id": "toolu_01ABC", "name": "Bash", "input": {"command": "cargo test"}}
{"type": "tool_result", "tool_use_id": "toolu_01ABC", "is_error": false, "content": [{"type": "text", "text": "test result: ok. 42 passed"}]}
message — full assistant message (authoritative)
Emitted at the end of each turn when --include-partial-messages is not set. When --include-partial-messages is set, text deltas stream via text frames instead and no message frame is emitted.
{
"type": "message",
"role": "assistant",
"content": [
{"type": "text", "text": "All tests pass now."},
{"type": "tool_use", "id": "toolu_01ABC", "name": "Bash", "input": {"command": "cargo test"}}
]
}
Each tool call appears both in a short tool_use/tool_result pair and inside the subsequent message frame's content array. The short pair is a progress indicator; the message frame is the authoritative record for transcript reconstruction. Do not double-count: count one tool call per tool_use frame, and ignore the tool_use blocks repeated inside message content.
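A consumer that counts tool calls only needs to look at top-level frames. The Python sketch below (a hypothetical helper, assuming the whole stream is available as one string) counts tool_use frames and deliberately skips the copies nested inside message content.

```python
import json


def count_tool_calls(ndjson: str) -> int:
    """Count tool calls in a stream-json transcript.

    Only top-level frames with type "tool_use" are counted; tool_use blocks
    nested inside a message frame's content array are ignored so that each
    call is counted once.
    """
    count = 0
    for line in ndjson.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between frames
        frame = json.loads(line)
        if frame.get("type") == "tool_use":
            count += 1
    return count
```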
hook_event
Only emitted when --include-hook-events is set.
{
"type": "hook_event",
"hookEventName": "PreToolUse",
"hookSpecificOutput": {"matcher": "Bash", "decision": "allow"}
}
hookEventName and hookSpecificOutput are camelCase (ADR 0024 parity).
warning
Non-fatal informational frames that do not terminate the run. Currently emitted for model substitution detected at the provider level.
{
"type": "warning",
"subtype": "model_mismatch",
"message": "model mismatch: requested \"llama3.1\" but provider responded with \"llama3.2\"",
"details": {"requested": "llama3.1", "actual": "llama3.2"}
}
result — always the last frame
{
"type": "result",
"subtype": "success",
"result": "All 42 tests pass.",
"session_id": "a3f7c2d1-...",
"total_cost_usd": 0.0034,
"turns": 3,
"total_input_tokens": 8210,
"total_output_tokens": 621
}
subtype values:
| subtype | Meaning | Key fields |
|---|---|---|
| success | Run completed normally | result (assistant reply) |
| error | Provider error, hook denial, tool crash, or schema validation failure | error, last_assistant_text, tool_calls_seen |
| max_turns | --max-turns was reached (exit 75) | last_assistant_text, tool_calls_seen |
| budget_exceeded | --max-budget-usd was reached (exit 137) | last_assistant_text, tool_calls_seen |
| cancelled | Run was cancelled by Ctrl-C / SIGTERM (exit 124) | last_assistant_text, tool_calls_seen |
| max_tokens | Per-turn output token budget exhausted | last_assistant_text, tool_calls_seen |
For non-success subtypes, result is absent. Read last_assistant_text for the most recent assistant reply and tool_calls_seen to distinguish an actively-looping agent (many tool calls, no clean finish) from one that stalled silently.
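One way to act on this in a consumer is sketched below in Python. The helper is hypothetical, and it makes an assumption the text above does not settle: tool_calls_seen is treated as either an integer count or a list of calls.

```python
def summarize_result(frame: dict) -> str:
    """Produce a one-line summary from a result frame (illustrative only)."""
    if frame.get("type") != "result":
        raise ValueError("not a result frame")
    subtype = frame.get("subtype")
    if subtype == "success":
        return frame["result"]
    # Assumption: tool_calls_seen is a count, or a list whose length is the count.
    seen = frame.get("tool_calls_seen", 0)
    calls = len(seen) if isinstance(seen, list) else int(seen)
    last = frame.get("last_assistant_text") or "(no assistant text)"
    state = "was still calling tools" if calls > 0 else "made no tool calls"
    return f"{subtype}: agent {state} ({calls}); last reply: {last}"
```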
Stream-json input (--input-format stream-json)
Pass --input-format stream-json to make caliban read NDJSON user frames from stdin instead of a single prompt. This lets you drive multi-turn conversations from any language without a pseudo-TTY.
{"type": "user", "content": "fix the lint warnings"}
{"type": "user", "content": [{"type": "text", "text": "now run the tests"}]}
content can be a plain string or an array of {"type":"text","text":"…"} blocks. Unknown fields on user frames, unknown type values, and malformed JSON are hard parse errors. The run aborts with exit 64 and a result frame with subtype: "error". This is intentional: silently ignoring an unknown field would let a wrong envelope shape run the agent with a blank prompt.
A control/interrupt frame is accepted on stdin but the interrupt is not yet honored; caliban emits a stderr warning and continues.
When --input-format stream-json is active, an inline prompt is incompatible and is rejected at startup. Pass - (or omit the prompt entirely) to read from stdin.
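Since the parser is strict, it is safer to generate frames with a JSON library than to assemble them by hand. A minimal Python sketch (the helper names are hypothetical) that emits only the documented type and content fields:

```python
import json


def user_frame(text: str) -> str:
    """Serialize one user frame as a single NDJSON line (no trailing newline)."""
    # Only the documented fields are emitted; extra fields would be rejected.
    return json.dumps({"type": "user", "content": text})


def user_frames(prompts) -> str:
    """Join several prompts into an NDJSON payload suitable for stdin."""
    return "".join(user_frame(p) + "\n" for p in prompts)
```

json.dumps escapes embedded newlines, so a multi-line prompt still occupies exactly one line of the payload.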
Example NDJSON exchange
printf '{"type":"user","content":"how many Rust source files are here?"}\n' \
| caliban --output-format stream-json \
--input-format stream-json \
--replay-user-messages \
--bare
{"type":"system","subtype":"init","session_id":"b1c2...","model":"anthropic/claude-sonnet-4-6","tools":["Bash","Glob","Grep","Read"],"plugins":[],"settingSources":[],"mcp_servers":[],"bare_mode":true,"cwd":"/repo","permission_mode":"default"}
{"type":"user","content":[{"type":"text","text":"how many Rust source files are here?"}]}
{"type":"tool_use","id":"toolu_01","name":"Bash","input":{"command":"find . -name '*.rs' | wc -l"}}
{"type":"tool_result","tool_use_id":"toolu_01","is_error":false,"content":[{"type":"text","text":"142"}]}
{"type":"message","role":"assistant","content":[{"type":"text","text":"There are 142 Rust source files."},{"type":"tool_use","id":"toolu_01","name":"Bash","input":{"command":"find . -name '*.rs' | wc -l"}}]}
{"type":"result","subtype":"success","result":"There are 142 Rust source files.","session_id":"b1c2...","total_cost_usd":0.0012,"turns":1,"total_input_tokens":3100,"total_output_tokens":48}
Optional frame flags
| Flag | Effect |
|---|---|
| --include-partial-messages | Emit text and thinking delta frames as the model streams |
| --include-hook-events | Emit a hook_event frame for each fired hook |
| --replay-user-messages | Echo each user prompt back as a user frame |
Related pages
- Print Mode — activating headless mode and output formats
- CI Patterns — parsing stream-json in scripts and Actions
Structured Output
--json-schema tells caliban to force the assistant's final reply into a JSON shape that matches a given schema. This is useful when a downstream script needs a machine-readable payload rather than freeform prose — a CI gate that needs a structured pass/fail verdict, a code-generation pipeline that expects a specific object shape, or any tool that would otherwise parse the reply with fragile string matching.
Supplying a schema
--json-schema <FILE_OR_JSON>
The argument is either:
- A path to a .json file: --json-schema ./schema.json
- Inline JSON (detected when the value starts with { or [): --json-schema '{"type":"object","required":["ok","message"]}'
What caliban does
- Runs the agent loop normally.
- After the final assistant turn, scans the reply for a balanced {...} JSON object. If the whole reply is valid JSON it is used as-is; otherwise the first balanced {...} block is extracted.
- Validates the extracted object against the schema (required fields present, top-level and per-property types match).
- On success: the validated object appears in the structured_output field of the result frame, and the process exits 0.
- On failure: the result frame has subtype: "error" and the validation message appears in error. The process exits 2.
The built-in validator checks required fields and top-level type / per-property type constraints. It does not implement the full JSON Schema specification (no $ref, oneOf, pattern, etc.). Native provider-level structured output via the model router is planned and will extend coverage when available.
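To make the extract-then-validate behavior concrete, here is a rough Python approximation. It is a sketch under assumptions, not caliban's validator: it uses the standard library's `json.JSONDecoder.raw_decode` to parse the object starting at the first `{`, and the error strings are invented for illustration.

```python
import json

JSON_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool,
              "number": (int, float), "integer": int, "null": type(None)}


def extract_json_object(reply: str):
    """Return the JSON object starting at the first '{', or None."""
    start = reply.find("{")
    if start == -1:
        return None
    try:
        value, _end = json.JSONDecoder().raw_decode(reply, start)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def type_matches(value, type_name: str) -> bool:
    expected = JSON_TYPES.get(type_name)
    if expected is None:
        return True  # unknown type names are not checked in this sketch
    if isinstance(value, bool) and type_name in ("integer", "number"):
        return False  # Python bools are ints; JSON booleans are not numbers
    return isinstance(value, expected)


def validate(obj, schema: dict) -> list:
    """Check top-level type, required fields, and per-property types."""
    if "type" in schema and not type_matches(obj, schema["type"]):
        return [f"top-level value should be {schema['type']}"]
    errors = [f"missing required field `{name}`"
              for name in schema.get("required", []) if name not in obj]
    for name, prop in schema.get("properties", {}).items():
        if name in obj and "type" in prop and not type_matches(obj[name], prop["type"]):
            errors.append(f"field `{name}` should be {prop['type']}")
    return errors
```

A reply such as `Here is the result: {"passed": true, "summary": "ok"}` yields the embedded object, while a reply with no braces yields None, which a caller would treat as a validation failure.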
Worked example
Suppose you want caliban to report whether a repository's tests pass, in a structured format.
schema.json
{
"type": "object",
"required": ["passed", "summary"],
"properties": {
"passed": {"type": "boolean"},
"summary": {"type": "string"},
"failure_count": {"type": "integer"}
}
}
Invocation
caliban \
--output-format json \
--json-schema ./schema.json \
--bare \
-p "Run the test suite and tell me whether it passed. Reply only with JSON."
Successful result frame (stdout)
{
"type": "result",
"subtype": "success",
"result": "{\"passed\": true, \"summary\": \"42 tests passed, 0 failed\", \"failure_count\": 0}",
"session_id": "...",
"total_cost_usd": 0.0021,
"turns": 2,
"total_input_tokens": 5400,
"total_output_tokens": 310,
"structured_output": {
"passed": true,
"summary": "42 tests passed, 0 failed",
"failure_count": 0
}
}
Read structured_output in your script:
result=$(caliban --output-format json --json-schema schema.json --bare \
-p "Run tests and reply with JSON.")
passed=$(echo "$result" | jq '.structured_output.passed')
if [ "$passed" != "true" ]; then
echo "Tests failed"
exit 1
fi
Failed validation (exit 2)
{
"type": "result",
"subtype": "error",
"error": "missing required field `passed`",
"session_id": "...",
"total_cost_usd": 0.0018,
"turns": 1,
"total_input_tokens": 4800,
"total_output_tokens": 95,
"last_assistant_text": "All tests passed."
}
Tips
- Instruct the model to reply only with JSON in your prompt. Models that wrap their answer in prose (e.g. "Here is the result: {...}") are handled, because caliban scans for the first balanced {...}, but pure JSON replies validate more reliably.
- Combine with --bare to skip skills and hooks that might inject extra text into the reply.
- In stream-json mode, the structured_output field appears in the final result frame the same as in json mode.
Related pages
- Print Mode — output formats and exit codes
- CI Patterns — complete pipeline recipes using structured output
CI Patterns
This page puts the headless flags together into complete, copyable recipes for GitHub Actions and other CI environments. Before reading further, familiarise yourself with Print Mode, The stream-json Protocol, and Headless & Audit.
Key flags for CI
| Flag | Purpose |
|---|---|
| --bare | Skip hooks, skills, plugins, MCP, auto-memory, CLAUDE.md. Deterministic — output depends only on what you pass. |
| --max-budget-usd <USD> | Hard spend cap; exit 137 if exceeded. Prevents runaway costs in long jobs. |
| --permission-mode acceptEdits | Allow file edits without prompting; still denies shell commands the rules don't cover. |
| --allow <PAT> | Add an Allow rule at top priority for this invocation only (repeatable). |
| --no-save | Don't write the session to disk — keeps CI agents stateless. |
| --output-format stream-json | Full NDJSON output for structured parsing. |
| --output-format json | Single JSON result object — simpler for scripts that only need the answer and exit code. |
Exit codes are the primary success signal. See Print Mode — Exit codes for the full table. $? == 0 means success; $? == 137 means the budget cap fired.
Recipe 1 — Simple text answer in GitHub Actions
Suitable for jobs that just need a freeform answer and check the exit code.
# .github/workflows/caliban-check.yml
name: caliban check
on: [push]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run caliban review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          caliban \
            --bare \
            --max-budget-usd 0.50 \
            --permission-mode acceptEdits \
            --no-save \
            -p "Review the diff for obvious bugs and print a one-sentence verdict."
The job fails if caliban exits non-zero (runtime error, budget exceeded, etc.).
Recipe 2 — Structured output with jq parsing
Use --output-format json and --json-schema when you need machine-readable output — for example, a gate that checks whether a review verdict is "pass" or "fail".
#!/usr/bin/env bash
# ci/review-gate.sh
set -euo pipefail
RESULT=$(caliban \
--output-format json \
--json-schema '{"type":"object","required":["verdict","reason"]}' \
--bare \
--max-budget-usd 1.00 \
--permission-mode acceptEdits \
--allow "Bash:git diff*" \
--allow "Read" \
--no-save \
-p "Review the staged changes. Reply ONLY with JSON: {\"verdict\": \"pass\"|\"fail\", \"reason\": \"<one sentence>\"}")
echo "Raw result: $RESULT"
VERDICT=$(echo "$RESULT" | jq -r '.structured_output.verdict')
if [ "$VERDICT" = "pass" ]; then
echo "Review passed: $(echo "$RESULT" | jq -r '.structured_output.reason')"
exit 0
else
echo "Review failed: $(echo "$RESULT" | jq -r '.structured_output.reason')"
exit 1
fi
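Check the exit code first: if caliban exits non-zero before emitting a result frame (bad flags, budget blown before any turn, etc.), jq has nothing to parse, and under set -e the script aborts at the RESULT= assignment anyway. A pattern like CODE=0; RESULT=$(caliban …) || CODE=$? captures the exit code without tripping set -e, whereas appending || true inside the substitution would discard it.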
Recipe 3 — Multi-turn stream-json pipeline
For jobs that drive several agent turns or need to observe tool calls in real time, parse the NDJSON stream line by line.
#!/usr/bin/env bash
# ci/stream-pipeline.sh
set -euo pipefail
TASKS=$(cat <<'EOF'
{"type":"user","content":"Run the test suite and report any failures."}
{"type":"user","content":"If any tests failed, suggest a fix."}
EOF
)
LAST_RESULT=""
TOOL_CALLS=0
while IFS= read -r line; do
[ -z "$line" ] && continue
TYPE=$(echo "$line" | jq -r '.type')
case "$TYPE" in
system)
SUBTYPE=$(echo "$line" | jq -r '.subtype')
if [ "$SUBTYPE" = "init" ]; then
echo "[init] model=$(echo "$line" | jq -r '.model')"
fi
;;
tool_use)
TOOL_CALLS=$((TOOL_CALLS + 1))
echo "[tool] $(echo "$line" | jq -r '.name')"
;;
result)
LAST_RESULT="$line"
SUBTYPE=$(echo "$line" | jq -r '.subtype')
COST=$(echo "$line" | jq -r '.total_cost_usd')
echo "[result] subtype=$SUBTYPE cost=\$$COST tool_calls=$TOOL_CALLS"
;;
esac
done < <(echo "$TASKS" | caliban \
--output-format stream-json \
--input-format stream-json \
--bare \
--max-budget-usd 2.00 \
--permission-mode acceptEdits \
--allow "Bash:cargo test*" \
--no-save)
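# Final check. A bare $? here reports the while loop's status, not caliban's,
# so wait on the process substitution instead (needs bash 4.4 or newer).
EXIT_CODE=0
wait $! || EXIT_CODE=$?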
SUBTYPE=$(echo "$LAST_RESULT" | jq -r '.subtype')
if [ "$EXIT_CODE" -ne 0 ] || [ "$SUBTYPE" != "success" ]; then
echo "Run did not succeed (exit=$EXIT_CODE subtype=$SUBTYPE)"
exit 1
fi
Permissions in headless mode
By default, caliban inherits all user and project permission rules in headless mode, just as in interactive sessions. For CI you typically want tighter control:
- --permission-mode acceptEdits: auto-allows file edits; still asks (or denies) for shell commands not covered by a rule.
- --allow "Bash:git *": add a top-priority Allow rule for specific shell patterns.
- --deny "Bash:rm -rf*": add a top-priority Deny rule.
- --bare: skip all settings-derived rules (only built-in defaults apply).
For a full discussion of permission modes and how they interact with headless runs, see Headless & Audit.
bypassPermissions mode disables all permission gating. In a CI context this means an adversarially-crafted prompt or tool output could instruct caliban to delete files, exfiltrate secrets, or make network calls. Use acceptEdits instead and add explicit --allow rules for the shell patterns your job actually needs.
Parsing exit codes in shell
caliban --bare --max-budget-usd 0.20 -p "summarize the diff" || {
CODE=$?
case $CODE in
75) echo "Max turns exceeded";;
137) echo "Budget cap hit";;
124) echo "Cancelled";;
*) echo "Error: exit $CODE";;
esac
exit $CODE
}
Telemetry & Cost
caliban tracks token usage and USD cost for every session using caliban-telemetry (ADR 0033). Cost accounting and context-window tracking work for all users regardless of whether OTLP export is enabled. OTLP emission to an external collector is opt-in.
Cost accounting
After each provider response, caliban-telemetry multiplies token counts by per-model rates from a vendored YAML rate card. The card ships with known rates for Anthropic, OpenAI, Google, Bedrock, Vertex, and Ollama (Ollama rows are $0.00).
Unknown (provider, model) pairs contribute $0.00 and emit a single debounced warning per session. Rates are updated in-tree; operators can override the card with CALIBAN_RATES_YAML=/path/to/rates.yaml.
USD arithmetic uses rust_decimal internally to avoid floating-point drift. Values are converted to f64 only at OTLP emit boundaries.
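The accounting rule can be sketched with Python's decimal module, which plays a role similar to rust_decimal. Everything named here is hypothetical: the class, the per-million-token pricing unit, and the rates themselves are placeholders rather than values from caliban's rate card, and warning once per session is one reading of "debounced".

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Placeholder rates in USD per million tokens: (input, output).
# These are NOT caliban's real rates.
EXAMPLE_RATES = {
    ("anthropic", "example-model"): (Decimal("3.00"), Decimal("15.00")),
    ("ollama", "llama3.1"): (Decimal("0"), Decimal("0")),
}


class SessionCost:
    """Accumulate per-response cost with exact decimal arithmetic."""

    def __init__(self, rates: dict):
        self.rates = rates
        self.total = Decimal("0")
        self.warnings = []

    def add(self, provider: str, model: str,
            input_tokens: int, output_tokens: int) -> Decimal:
        rate = self.rates.get((provider, model))
        if rate is None:
            if not self.warnings:  # warn once per session (assumed debounce)
                self.warnings.append(f"no rate for {provider}/{model}; counting $0.00")
            return Decimal("0")
        in_rate, out_rate = rate
        cost = (in_rate * input_tokens + out_rate * output_tokens) / MILLION
        self.total += cost
        return cost
```

Converting `total` to a float would happen only at the point of export, mirroring the f64 conversion at the OTLP boundary.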
Slash commands
These commands work in the TUI regardless of whether OTLP export is on.
| Command | Description |
|---|---|
| /cost | Cumulative USD spend with a per-model breakdown |
| /usage | Cumulative token counts (input and output) with per-model breakdown |
| /context | Context-window utilization — per-message-kind token breakdown, percentage of the model's context window used |
The /cost and /usage overlays share the same underlying CostAccumulator; /cost leads with dollar amounts, /usage leads with token counts. /context draws on ContextWindow, which is updated independently of OTLP emission.
Enabling OTLP export
OTLP export is off by default. Turn it on with the CALIBAN_ENABLE_TELEMETRY environment variable or the enable_telemetry setting:
Environment variable (any session)
CALIBAN_ENABLE_TELEMETRY=1 caliban
settings.toml / settings.json (persistent)
enable_telemetry = true
Privacy opt-outs DISABLE_TELEMETRY=1 and DO_NOT_TRACK=1 force-disable OTLP emission even when the master switch is on.
OTLP configuration
caliban adopts the standard OTEL_* env-var contract verbatim:
| Variable | Default | Purpose |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | — | Collector endpoint (required for OTLP) |
| OTEL_EXPORTER_OTLP_PROTOCOL | grpc | grpc, http/protobuf, or http/json |
| OTEL_EXPORTER_OTLP_HEADERS | — | Static auth / routing headers (k=v,k2=v2) |
| OTEL_METRIC_EXPORT_INTERVAL | 60s | How often metrics are flushed |
| OTEL_LOGS_EXPORTER | otlp | otlp, console, or none |
| OTEL_METRICS_EXPORTER | otlp | Same options |
| OTEL_TRACES_EXPORTER | otlp | Same options |
| OTEL_LOG_USER_PROMPTS | 0 | Include user prompt text in log spans |
| OTEL_LOG_TOOL_DETAILS | 0 | Include tool name/args in spans |
| OTEL_LOG_TOOL_CONTENT | 0 | Include full tool output in spans |
| OTEL_LOG_RAW_API_BODIES | 0 | Log raw provider request/response bodies (0, 1, or file:<dir>) |
mTLS is configured via OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE, OTEL_EXPORTER_OTLP_CLIENT_KEY, and OTEL_EXPORTER_OTLP_CERTIFICATE.
OTEL_LOG_USER_PROMPTS, OTEL_LOG_TOOL_CONTENT, and OTEL_LOG_RAW_API_BODIES send potentially sensitive content to your collector. Ensure your collector pipeline is appropriately access-controlled before enabling these.
Dynamic OTLP headers
Short-lived bearer tokens (e.g. from a secrets manager) can be injected without restarting caliban. Set telemetry.otel_headers_helper in your settings to the path of an executable helper. Caliban spawns the helper at startup and then periodically (telemetry.otel_headers_refresh, default 5m), parses its stdout as key=value lines, and merges the result with OTEL_EXPORTER_OTLP_HEADERS (the helper wins on collision).
Alternatively, the env-var escape hatch CALIBAN_OTEL_HEADERS_HELPER=/path/to/script achieves the same effect without a settings file.
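The merge rule (helper output overrides static headers) is easy to model. The Python sketch below uses hypothetical function names and deliberately simple parsing: it splits the static value on commas and does not handle commas inside header values.

```python
def parse_header_lines(text: str) -> dict:
    """Parse key=value lines, as a helper script would print them."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and lines without a separator
        key, value = line.split("=", 1)
        headers[key.strip()] = value.strip()
    return headers


def merged_headers(static_env: str, helper_stdout: str) -> dict:
    """Merge OTEL_EXPORTER_OTLP_HEADERS (k=v,k2=v2) with helper output.

    Helper values win on collision, matching the rule described above.
    """
    merged = parse_header_lines(static_env.replace(",", "\n"))
    merged.update(parse_header_lines(helper_stdout))
    return merged
```

A helper that prints `authorization=Bearer <token>` on each refresh would therefore replace any static authorization header while leaving other static headers in place.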
Metric names
OTLP metrics use the caliban. prefix (mirroring Claude Code's claude_code. names):
| Metric | Kind | Description |
|---|---|---|
| caliban.session.count | Counter | Session start/end lifecycle events |
| caliban.cost.usage | Counter (USD) | Cumulative cost per session |
| caliban.token.usage | Counter | Input and output tokens |
| caliban.lines_of_code.count | Counter | Lines touched by file-edit tools |
| caliban.code_edit_tool.decision | Counter | Permission decisions on edit tools |
| caliban.active_time.total | Gauge (seconds) | Wall time the agent loop ran |
Related pages
- Health Checks — caliban doctor and /doctor
- Settings Reference — enable_telemetry and telemetry.* keys
Health Checks
caliban doctor runs a suite of local checks and reports whether your installation is healthy. It exits 0 when all checks pass or warn, and exits 1 if any check fails. The same checks are available as the /doctor slash command in the TUI.
Running doctor
# Standard checks (no network calls)
caliban doctor
# Deep checks — pings every configured provider (costs one API call per provider)
caliban doctor --deep
Sample output:
caliban doctor — 11 check(s):
✓ settings — 2 scope file(s) loaded
✓ sandbox — tool dispatch goes via caliban-sandbox::SandboxedShim
✓ checkpoint_store — /home/user/.local/share/caliban/checkpoints
✓ session_store — /home/user/.local/share/caliban/sessions (writable)
✓ skills — 3 skill(s) loaded (scanned: /home/user/.claude/skills, ./.claude/skills)
✓ claudemd — 2 CLAUDE.md ancestor(s) found
✓ workspace — /home/user/repo (writable)
! ollama — OLLAMA_BASE_URL unset (no probe attempted; use --deep to ping localhost)
✓ openai — OPENAI_BASE_URL unset (no probe attempted; use --deep to ping api.openai.com)
✓ anthropic — https://api.anthropic.com reachable (45 model(s))
✓ google — GEMINI_BASE_URL unset (no probe attempted; use --deep to ping generativelanguage.googleapis.com)
What each check covers
| Check | What it verifies |
|---|---|
| settings | Layered settings files load without parse errors; at least one scope file is present |
| sandbox | Tool dispatch is wired through the OS sandbox shim |
| checkpoint_store | The checkpoint store path is accessible |
| session_store | The session store path exists and is writable |
| skills | Skill roots are scanned and skills load without errors |
| claudemd | At least one CLAUDE.md file is found in the workspace ancestry |
| workspace | The current working directory is accessible and writable |
| ollama | Ollama endpoint reachability (see below) |
| openai | OpenAI / OpenAI-compatible endpoint reachability |
| anthropic | Anthropic endpoint reachability |
| google | Google Gemini endpoint reachability |
Provider reachability checks
Provider rows always appear in the output so you can see at a glance which providers are configured. The behavior depends on whether --deep is passed:
Without --deep:
- If the provider's base-URL env var is set, caliban probes the endpoint (Ollama: /api/tags; others: /v1/models).
- If the env var is unset, the row passes with a note that no probe was attempted. Use --deep to ping the default endpoint.
With --deep:
- Caliban pings the configured (or default) endpoint unconditionally. This costs one real API call per provider that has an API key configured.
- If --model <MODEL> was passed on the same invocation and a provider's model listing is available, the requested model is verified to be present. A missing model is reported as a Fail row.
Ollama does not require an API key. With --deep, caliban always probes http://localhost:11434 (or OLLAMA_BASE_URL if set) regardless of key configuration.
Exit codes
| Code | Meaning |
|---|---|
| 0 | All checks passed or warned |
| 1 | At least one check failed |
CI scripts can gate on caliban doctor to catch misconfigured installations before running a long job:
caliban doctor || { echo "caliban health check failed"; exit 1; }
/doctor in the TUI
The /doctor slash command runs the same checks inside an interactive session and prints the results to the transcript. Provider pings are always deep when invoked via /doctor (the session is already running and API keys are confirmed reachable). The /status command shows a brief one-line summary of the daemon and active session state.
Related pages
- Telemetry & Cost — OTLP export and cost accounting
- Headless & Audit — permission auditing in CI
CLI Reference
caliban is the main binary. Run it with no arguments to enter the interactive TUI; supply a prompt or flags to drive it headlessly or invoke a subcommand.
caliban [FLAGS/OPTIONS] [PROMPT]
caliban [FLAGS/OPTIONS] <SUBCOMMAND>
Prompts
| Flag | Default | Description |
|---|---|---|
| PROMPT (positional) | — | User prompt text. Use - to read from stdin. |
| --prompt <TEXT> | — | Alternative way to pass the prompt (same effect as positional). |
Headless / Print Mode
These flags activate and configure non-interactive (-p) mode. See Print Mode and The stream-json Protocol.
| Flag | Default | Description |
|---|---|---|
-p, --print [PROMPT] | — | Headless mode. Drives the agent non-interactively. Accepts an optional prompt; otherwise reads from --prompt, the positional PROMPT, or stdin (capped at 10 MiB). |
--output-format <FMT> | text | Stream output format. Values: text, json, stream-json. |
--input-format <FMT> | text | Stdin format. Values: text, stream-json. |
--no-auto-print | false | Suppress the automatic headless dispatch when stdout is piped or stdin is non-TTY. Explicit --print / --output-format always override this. |
--max-budget-usd <USD> | — | Abort the run (exit 137) once cumulative cost exceeds this value in USD. Unknown model/provider pairs contribute $0 and emit a warning. |
--bare | false | CI-deterministic mode: skips hooks, skills, plugins, MCP, auto-memory, and CLAUDE.md discovery. |
--json-schema <FILE_OR_JSON> | — | Force structured final output matching the given JSON Schema. Value can be inline JSON or a path to a .json file. |
--include-partial-messages | false | Emit assistant text deltas as separate text frames in stream-json mode (default: aggregate into one message frame). |
--include-hook-events | false | Emit a hook_event frame per fired hook event in stream-json mode. |
--replay-user-messages | false | Echo each user prompt as a user frame in stream-json mode. |
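As a rough illustration of how the headless flags above combine in a script, the sketch below assembles an argument list for a one-shot run. The helper name `build_headless_argv` is hypothetical and not part of caliban; it only constructs the command and executes nothing.

```python
def build_headless_argv(prompt, output_format="text", max_budget_usd=None, bare=False):
    """Assemble argv for a one-shot headless run (hypothetical helper)."""
    allowed = {"text", "json", "stream-json"}  # values listed for --output-format
    if output_format not in allowed:
        raise ValueError(f"unsupported output format: {output_format}")
    argv = ["caliban", "--print", prompt, "--output-format", output_format]
    if max_budget_usd is not None:
        # The run aborts with exit 137 once cumulative cost exceeds this value.
        argv += ["--max-budget-usd", str(max_budget_usd)]
    if bare:
        # CI-deterministic mode: skips hooks, skills, plugins, MCP, and memory discovery.
        argv.append("--bare")
    return argv
```

The resulting list can be handed to `subprocess.run` without shell quoting; a script would then inspect the return code to tell a budget abort from other failures.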
Session
| Flag | Default | Description |
|---|---|---|
-c, --continue | false | Resume the most recently updated session. |
-r, --resume <NAME> | — | Resume a named session. |
--session <NAME> | — | Load or create a named session; persists to the configured sessions directory. |
--no-save | false | Don't write the session back to disk after the run. |
--sessions-dir <DIR> | platform default | Override the sessions directory. |
Model & Provider
| Flag | Default | Description |
|---|---|---|
--provider <PROVIDER> | Resolved from settings, then anthropic | Provider to use. Values: anthropic, openai, ollama, google. |
--model <MODEL> | Provider default (see table below) | Model name. |
--fallback-model <MODEL> | From settings | Fallback model when the primary errors (ADR 0038). |
--max-tokens <N> | 8192 | Per-turn output token limit (must be ≥ 1). |
--max-turns <N> | 50 | Maximum agent loop iterations. |
--temperature <F> | — | Sampling temperature in [0.0, 2.0]. |
Provider defaults:
| Provider | Default model |
|---|---|
anthropic | claude-sonnet-4-6 |
openai | gpt-5.5 |
ollama | llama3.1 |
google | gemini-2.0-flash |
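The defaults and value ranges in the two tables above can be summarized in a small sketch. The function `resolve_model` is a hypothetical illustration; it skips the settings lookup that caliban performs before falling back to anthropic.

```python
DEFAULT_MODELS = {
    "anthropic": "claude-sonnet-4-6",
    "openai": "gpt-5.5",
    "ollama": "llama3.1",
    "google": "gemini-2.0-flash",
}

def resolve_model(provider=None, model=None, temperature=None, max_tokens=8192):
    """Pick a (provider, model) pair and validate the documented ranges."""
    provider = provider or "anthropic"  # assumption: no settings layer in this sketch
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider}")
    if max_tokens < 1:
        raise ValueError("--max-tokens must be >= 1")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("--temperature must be within [0.0, 2.0]")
    # An explicit model wins; otherwise use the provider default from the table.
    return provider, model or DEFAULT_MODELS[provider]
```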
Workspace & Tools
| Flag | Default | Description |
|---|---|---|
--workspace <DIR> | Current working directory | Workspace root for file and shell tools. Must be an existing directory. |
--no-tools | false | Disable all tools (chat-only mode). |
--restrict-paths | false | Reject tool paths outside the workspace root. |
--quiet | false | Suppress tool-execution announcements. |
System Prompt
These flags are mutually exclusive.
| Flag | Default | Description |
|---|---|---|
--system <STRING> | — | Override system prompt with the given text. |
--system-file <PATH> | — | Override system prompt with the contents of a file. |
--no-system | false | Run with no system prompt (disables the default). |
Permissions
See Permission Modes and Managing Rules.
| Flag | Default | Description |
|---|---|---|
--allow <PAT> | — | Add an Allow rule at top priority. Repeatable. Pattern: Tool or Tool:first-arg-glob. |
--deny <PAT> | — | Add a Deny rule at top priority. Repeatable. |
--ask <PAT> | — | Add an Ask rule at top priority. Repeatable. |
--permission-mode <MODE> | From settings or default | Initial permission mode. Valid values (camelCase): default, acceptEdits, plan, auto, dontAsk, bypassPermissions. Env: CALIBAN_DEFAULT_PERMISSION_MODE. |
--no-permissions | false | Disable permission gating entirely (all tool calls allowed). Env: CALIBAN_NO_PERMISSIONS. Conflicts with --allow, --deny, --ask, --auto-allow. |
--auto-allow | false | Dangerous. Allow the model to run any Ask-rule tool without prompting in non-interactive mode. Env: CALIBAN_AUTO_ALLOW. |
--allow-dangerously-skip-permissions | false | Dangerous. Required to enter bypassPermissions mode. Without this flag the binary refuses to start in bypass mode. |
--disable-auto-mode | false | Disable the auto-mode classifier; every call falls through to the Ask handler (ADR 0029). Env: CALIBAN_DISABLE_AUTO_MODE. |
--permission-prompt-tool <MCP_TOOL> | — | Route permission Ask events to the named MCP tool via the MCP elicitation channel (ADR 0023 Phase C). |
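The conflict rules in this table are easy to get wrong in wrapper scripts. A minimal pre-flight check might look like the following; `check_permission_flags` is a hypothetical helper that only mirrors the constraints stated above and does not reproduce caliban's own argument parser.

```python
VALID_MODES = {"default", "acceptEdits", "plan", "auto", "dontAsk", "bypassPermissions"}
RULE_FLAGS = {"--allow", "--deny", "--ask", "--auto-allow"}

def check_permission_flags(flags, permission_mode=None):
    """Return human-readable problems for a proposed flag set (hypothetical helper)."""
    flags = set(flags)
    problems = []
    # --no-permissions conflicts with every rule-adding flag and with --auto-allow.
    if "--no-permissions" in flags and flags & RULE_FLAGS:
        clash = ", ".join(sorted(flags & RULE_FLAGS))
        problems.append(f"--no-permissions conflicts with {clash}")
    if permission_mode is not None and permission_mode not in VALID_MODES:
        problems.append(f"unknown permission mode: {permission_mode}")
    # Bypass mode needs the explicit opt-in flag or the binary refuses to start.
    if (permission_mode == "bypassPermissions"
            and "--allow-dangerously-skip-permissions" not in flags):
        problems.append("bypassPermissions requires --allow-dangerously-skip-permissions")
    return problems
```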
Hooks, Skills, MCP & Plugins
| Flag | Default | Description |
|---|---|---|
--no-hooks | false | Bypass every external hook handler. In-process hooks (PermissionsHook, audit) still run. Env: CALIBAN_NO_HOOKS. |
--no-skills | false | Disable the Skill tool (no skill discovery at startup). Env: CALIBAN_NO_SKILLS. |
--no-mcp | false | Disable MCP server discovery (skips settings.json mcp_servers and the legacy mcp.toml shim). Env: CALIBAN_NO_MCP. |
--no-plugins | false | Disable plugin discovery (ADR 0030). Env: CALIBAN_NO_PLUGINS. |
--mcp-oauth-port <PORT> | 0 (ephemeral) | Override the loopback port for the OAuth callback server (ADR 0023 Phase C). Env: CALIBAN_MCP_OAUTH_PORT. |
--no-sub-agent | false | Disable the built-in AgentTool (the sub-agent primitive). Env: CALIBAN_NO_SUB_AGENT. |
Config & Settings
| Flag | Default | Description |
|---|---|---|
--config <PATH> | Walk-up discovery | Explicit path to caliban.toml. When the file declares [router], a model router is wired (ADR 0038). Env: CALIBAN_ROUTER_CONFIG. |
--settings <FILE_OR_JSON> | — | Inject a virtual settings scope above local (ADR 0026). Accepts inline JSON or a path to .json / .toml. |
--setting-sources <CSV> | All scopes | Restrict which settings.json scopes are read. CSV of managed,user,project,local. |
Caching & Performance
| Flag | Default | Description |
|---|---|---|
--max-attach-bytes <N> | 262144 (256 KB) | Maximum size of a single @-attachment in bytes. Env: CALIBAN_MAX_ATTACH_BYTES. |
--attach-budget-bytes <N> | 1048576 (1 MB) | Aggregate size cap across all @-attachments in one message. Env: CALIBAN_ATTACH_BUDGET_BYTES. |
--no-prompt-cache | false | Disable Anthropic-style prompt caching. Env: CALIBAN_NO_PROMPT_CACHE. |
--no-parallel-tools | false | Disable parallel tool execution (run tool_use blocks serially). Env: CALIBAN_NO_PARALLEL_TOOLS. |
--parallel-tool-limit <N> | CPU cores − 1 (min 1) | Max concurrent tool invocations per turn. Env: CALIBAN_PARALLEL_TOOL_LIMIT. |
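To make the default values above concrete, here is a small sketch of the arithmetic. Both function names are hypothetical, and treating the caps as inclusive limits is an assumption; the table does not say whether a size exactly at the cap is accepted.

```python
import os

MAX_ATTACH_BYTES = 262_144       # default for --max-attach-bytes (256 KB)
ATTACH_BUDGET_BYTES = 1_048_576  # default for --attach-budget-bytes (1 MB)

def default_parallel_tool_limit(cpu_count=None):
    """CPU cores minus one, never below 1."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores - 1)

def attachments_fit(sizes, max_single=MAX_ATTACH_BYTES, budget=ATTACH_BUDGET_BYTES):
    """True when every attachment and their total stay within the caps.

    Assumption: both caps are inclusive.
    """
    return all(size <= max_single for size in sizes) and sum(sizes) <= budget
```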
Diagnostics
| Flag | Default | Description |
|---|---|---|
--debug | false | Append-log events and draws to the platform debug log. CALIBAN_DEBUG (any non-empty value) also enables this. |
Background Agents
| Flag | Default | Description |
|---|---|---|
--bg <TASK> | — | Spawn a background sub-agent with the given task and return immediately. Equivalent to caliban agents spawn --bg --prompt <TASK> (ADR 0037). |
Subcommands
caliban doctor [--deep]
Run health checks against the local caliban install (settings, MCP, sandbox, stores, providers). Exit 0 on pass, 1 on failure.
| Option | Description |
|---|---|
--deep | Include deep checks (provider auth pings — costs one API call per configured provider). |
caliban config
Inspect and migrate settings (ADR 0026).
| Sub-subcommand | Description |
|---|---|
config print | Print the merged effective settings as JSON, including the per-key scope chain. Honors --settings / --setting-sources. |
config migrate [--dry-run] | Round-trip legacy per-feature TOMLs (permissions.toml, mcp.toml, hooks.toml) into a single project-scope settings.json under <workspace>/.caliban/. |
caliban settings
Import and print settings files.
| Sub-subcommand | Description |
|---|---|
settings import --from <PATH> [--scope <SCOPE>] [--dry-run] | Import a settings JSON (Claude Code / Codex / legacy caliban) into canonical caliban TOML. Default scope: project. |
settings print [--scope <SCOPE>] | Print the settings for a scope (or the merged effective settings). Default scope: project. |
caliban perms
Manage permission rules across all config scopes. See Managing Rules.
| Sub-subcommand | Description |
|---|---|
perms list [--scope <SCOPE>] [--effective] [--json] | List permission rules. --effective shows the merged rule list across all scopes. |
perms test <TOOL> [INPUT_JSON] | Test whether a tool call would be allowed, denied, or asked. |
perms explain <TOOL> [INPUT_JSON] | Show which rule first matches a tool call. |
perms add <PATTERN> <ACTION> [--scope <SCOPE>] [--comment <TEXT>] [--reason <TEXT>] | Add a permission rule. Action: allow, ask, or deny. Default scope: project. |
perms remove [--index <N>] [--pattern <PAT>] [--scope <SCOPE>] | Remove a permission rule by ordinal or pattern. Default scope: project. |
perms import --from <PATH> [--scope <SCOPE>] [--dry-run] | Import rules from a foreign config (Claude Code JSON, legacy caliban TOML). Default scope: user. |
perms export [--scope <SCOPE>] [--format toml\|json] | Export permission rules to stdout. Default format: toml. |
perms audit [--since <ISO>] [--tool <NAME>] [--action <ACTION>] [--head <N>] | Show the permission-decision audit log. |
perms lint [--scope <SCOPE>] | Check for duplicate or conflicting rules. Default scope: project. |
caliban agents
List, attach, and manage background sub-agents (ADR 0037).
| Sub-subcommand | Description |
|---|---|
agents list | List registered background agents. |
agents spawn --prompt <TEXT> [--label <LABEL>] | Spawn a new background agent. |
agents attach <ID> | Stream a running agent's transcript live (Ctrl+D detaches). |
agents logs <ID> | Print the agent's session log. |
agents kill <ID> | Terminate an agent (SIGTERM → SIGKILL after grace period). |
agents respawn <ID> | Restart an agent with the same spawn spec. |
agents rm <ID> [--force] | Remove an agent from the registry (must be stopped unless --force). |
Shortcut aliases (top-level sugar):
| Command | Equivalent to |
|---|---|
caliban attach <ID> | caliban agents attach <ID> |
caliban logs <ID> | caliban agents logs <ID> |
caliban stop <ID> | caliban agents kill <ID> |
caliban kill <ID> | caliban agents kill <ID> |
caliban respawn <ID> | caliban agents respawn <ID> |
caliban rm <ID> [--force] | caliban agents rm <ID> |
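The alias table amounts to a simple rewrite of the first argument. The sketch below shows that mapping; `expand_alias` is a hypothetical name and the code illustrates the equivalence only, not how caliban dispatches subcommands internally.

```python
# Top-level alias -> verb under "caliban agents", taken from the table above.
ALIASES = {
    "attach": "attach",
    "logs": "logs",
    "stop": "kill",
    "kill": "kill",
    "respawn": "respawn",
    "rm": "rm",
}

def expand_alias(args):
    """Rewrite a top-level alias to its 'agents' form; args excludes the program name."""
    if args and args[0] in ALIASES:
        return ["agents", ALIASES[args[0]], *args[1:]]
    return list(args)
```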
caliban daemon
Supervisor daemon management (ADR 0037).
| Sub-subcommand | Description |
|---|---|
daemon status | Print daemon health and the socket path. |
daemon stop | Ask the daemon to shut down gracefully. |
caliban router debug
Router diagnostics (ADR 0038).
| Sub-subcommand | Description |
|---|---|
router debug | Print the candidate list the router would resolve for a synthetic request, plus breaker state and effort knobs. |
caliban plugin <VERB> [ARGS…]
Manage plugin packages (ADR 0030). The plugin CLI parses its own verbs directly:
| Verb | Description |
|---|---|
plugin list | List all discovered plugins with enable/disable status. |
plugin info <NAME> | Show manifest details for a plugin. |
plugin install <NAME>@<MARKETPLACE> [--yes] | Install a plugin from a marketplace. |
plugin install --dir <PATH> | Install a plugin from a local directory. |
plugin update <NAME> [--yes] | Update an installed plugin. |
plugin remove <NAME> | Remove an installed plugin. |
plugin enable <NAME> | Enable a disabled plugin. |
plugin disable <NAME> | Disable an enabled plugin. |
Run caliban plugin help for the full plugin CLI reference.
Caliban follows ADR 0025 exit-code conventions: 0 = success, 1 = check/health failure, 64 = usage error (EX_USAGE), 78 = configuration error (EX_CONFIG), 130 = Ctrl+C, 137 = budget exceeded.
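A wrapper script can turn these codes into readable messages. The sketch below uses only the codes listed above; `describe_exit` is a hypothetical helper name.

```python
EXIT_CODES = {
    0: "success",
    1: "check/health failure",
    64: "usage error (EX_USAGE)",
    78: "configuration error (EX_CONFIG)",
    130: "interrupted (Ctrl+C)",
    137: "budget exceeded",
}

def describe_exit(code):
    """Map a caliban exit code to a short description."""
    return EXIT_CODES.get(code, f"unrecognized exit code {code}")
```

In CI, a caller would typically pass the return code of the caliban process to `describe_exit` and decide whether to fail the job, for example treating 137 differently from 64 or 78.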
Settings Schema
This page is a typed, structured listing of every key in the caliban settings file. For a narrative explanation of how scopes interact, how to locate each file, and how to edit settings interactively, see Settings Reference and Settings Layering.
Settings files are TOML by convention (settings.toml / settings.local.toml); JSON is accepted on import only. Unknown top-level keys are tolerated for forward compatibility.
Model / Agent
| Key | Type | Default | Description |
|---|---|---|---|
agent | string | — | Agent profile name (sub-agent dispatch hint). |
model | string \| { provider, name } | — | Primary model. Bare string (e.g. "claude-sonnet-4-6") or qualified object { provider = "anthropic", name = "..." }. |
fallback_model | string \| { provider, name } | — | Fallback model when the primary errors. Same shapes as model. |
model_overrides | { string → string } | {} | Per-route model overrides. Keys are router route names (e.g. "fast-classifier"); values are model ids. |
effort | "low" \| "medium" \| "high" \| "max" \| "auto" | — | Default reasoning effort level. |
Permissions
Nested under the [permissions] table.
| Key | Type | Default | Description |
|---|---|---|---|
permissions.allow | string[] | [] | Patterns that auto-allow (legacy bucket form). |
permissions.ask | string[] | [] | Patterns that prompt the user (legacy bucket form). |
permissions.deny | string[] | [] | Patterns that hard-deny (legacy bucket form). |
permissions.rules | RuleSpec[] | [] | Ordered v2 rule array. When non-empty, takes precedence over the three buckets above. Source order is preserved (first match wins). |
permissions.enforce | boolean | — | When true, refuse --no-permissions / bypass mode at startup. |
permissions.default_mode | string | — | Initial permission mode at session start. Values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions. |
permissions.audit_log | boolean | true | Append-only permission-decision log toggle. |
RuleSpec fields (used in permissions.rules entries):
| Field | Type | Description |
|---|---|---|
pattern | string | Glob matching Tool or Tool:first-arg-glob (e.g. "Bash:git *"). |
action | "allow" \| "ask" \| "deny" | Decision for matching calls. |
comment | string (optional) | Human-readable comment shown in /permissions. |
reason | string (optional) | Deny reason shown to the operator and logged. |
expires_at | ISO 8601 timestamp (optional) | Rule is skipped after this time. |
[permissions]
# v2 ordered rules (preferred)
[[permissions.rules]]
pattern = "Bash:git *"
action = "allow"
comment = "git commands OK"
[[permissions.rules]]
pattern = "Bash:rm *"
action = "deny"
reason = "use git revert"
[[permissions.rules]]
pattern = "*"
action = "ask"
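To show how an ordered rule list resolves under first-match-wins, here is a minimal sketch that evaluates the three rules from the example above. It is an approximation under stated assumptions: Python's `fnmatchcase` stands in for caliban's glob matcher, the split on the first colon is a guess at the pattern grammar, and returning `None` when nothing matches is a placeholder for whatever the real default is.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase

RULES = [
    {"pattern": "Bash:git *", "action": "allow"},
    {"pattern": "Bash:rm *", "action": "deny", "reason": "use git revert"},
    {"pattern": "*", "action": "ask"},
]

def matches(pattern, tool, first_arg=""):
    # Assumption: "Tool" globs the tool name; "Tool:glob" also globs the first argument.
    if ":" in pattern:
        tool_glob, arg_glob = pattern.split(":", 1)
        return fnmatchcase(tool, tool_glob) and fnmatchcase(first_arg, arg_glob)
    return fnmatchcase(tool, pattern)

def decide(rules, tool, first_arg="", now=None):
    """Return the action of the first live rule that matches, else None."""
    now = now or datetime.now(timezone.utc)
    for rule in rules:  # source order is preserved: first match wins
        expires = rule.get("expires_at")
        # Timestamps must carry an offset so they compare against an aware "now".
        if expires and datetime.fromisoformat(expires) <= now:
            continue  # expired rules are skipped
        if matches(rule["pattern"], tool, first_arg):
            return rule["action"]
    return None
```

Because the catch-all `*` rule comes last, it only applies to calls that neither of the Bash rules claimed; moving it to the top would shadow both.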
Hooks
| Key | Type | Default | Description |
|---|---|---|---|
hooks | { string → … } | {} | Raw hook event → handler list map (passed to caliban_agent_core::HooksConfig). |
disable_all_hooks | boolean | false | Kill-switch: disable every external hook handler. |
allow_managed_hooks_only | boolean | false | When true, only managed-scope hooks fire. |
allowed_http_hook_urls | string[] | [] | HTTP-hook URL allowlist (glob patterns). |
http_hook_allowed_env_vars | string[] | [] | Environment variable names that HTTP hooks are permitted to read. |
MCP Servers
Under [mcp_servers.<name>]. Each entry configures one MCP server.
| Key | Type | Default | Description |
|---|---|---|---|
type | "stdio" \| "http" \| "sse" | "stdio" | Transport selector. Also accepted as transport (TOML alias). |
command | string | "" | Executable command (stdio only). |
args | string[] | [] | Argv after the command (stdio only). |
env | { string → string } | {} | Environment variables injected for the server process (stdio only). |
cwd | string | — | Working directory override (stdio only). |
url | string | — | Absolute http:// or https:// URL (http/sse transports). |
headers | { string → string } | {} | Static request headers (http/sse only). |
oauth | "off" \| "auto" \| "manual" | "off" | OAuth mode (http/sse only). |
disabled | boolean | false | Mark this server disabled without removing the entry. |
permissions | object | — | Per-server permission scoping (composes with global rules). |
[mcp_servers.linear]
command = "npx"
args = ["-y", "@linear/mcp-server"]
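The per-transport constraints in the table can be checked before a server entry is written to settings. The following is a sketch only: `validate_mcp_server` is a hypothetical helper, and requiring a non-empty `command` for stdio servers is an assumption inferred from the table rather than a documented rule.

```python
def validate_mcp_server(entry):
    """Return a list of problems for one [mcp_servers.<name>] entry."""
    problems = []
    # "type" is the primary key; "transport" is the accepted TOML alias.
    transport = entry.get("type", entry.get("transport", "stdio"))
    if transport not in ("stdio", "http", "sse"):
        problems.append(f"unknown transport: {transport}")
    elif transport == "stdio":
        if not entry.get("command"):
            problems.append("stdio servers need a command")  # assumption
    else:
        url = entry.get("url", "")
        if not url.startswith(("http://", "https://")):
            problems.append("http/sse servers need an absolute http:// or https:// url")
    return problems
```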
Router
| Key | Type | Default | Description |
|---|---|---|---|
router | object | — | Router config (opaque; schema owned by caliban-model-router). Use caliban.toml [router] for the primary router config. |
Memory
Nested under [memory].
| Key | Type | Default | Description |
|---|---|---|---|
memory.auto_memory_enabled | boolean | — | Enable / disable auto-memory topic files. |
memory.auto_memory_directory | string | Platform default | Directory for auto-memory topic files. |
memory.cap_tokens_auto | integer | — | Token budget cap for the auto-memory tier. |
memory.cap_tokens_claude_md | integer | — | Token budget cap for the CLAUDE.md tier. |
memory.cap_tokens_combined | integer | — | Combined token budget cap across all tiers. |
Plugins
| Key | Type | Default | Description |
|---|---|---|---|
plugins | object | — | Plugin manager knobs (schema owned by caliban-plugins). |
UI
| Key | Type | Default | Description |
|---|---|---|---|
output_style | string | — | Active output-style name (see Output Styles). |
editor_mode | string | — | Input editing mode: "vim" or "emacs". |
view_mode | string | — | TUI layout mode: "compact" or "expanded". |
statusLine.command | string | — | Required when statusLine is set. Shell command whose stdout prefixes the status bar. |
statusLine.timeout_ms | integer (50–5000) | — | Maximum ms to wait for the status-line script. |
statusLine.padding | integer (0–8) | — | Spaces of padding around the custom segment. |
tui | object | — | TUI knobs. Known sub-key: showCostInStatusline (boolean). |
statusLine uses camelCase on disk for Claude Code compatibility. The TOML alias status_line is also accepted.
Auth
| Key | Type | Default | Description |
|---|---|---|---|
api_key_helper | string \| object \| object[] | — | Provider API-key supplier(s). Bare string = command path; object = { command, provider?, refreshIntervalMs?, slowHelperWarningMs? }; array = per-provider list. |
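Because api_key_helper accepts three shapes, tooling that reads settings usually normalizes them first. A minimal sketch, assuming the settings file has already been parsed into Python values; `normalize_api_key_helper` is a hypothetical name.

```python
def normalize_api_key_helper(value):
    """Coerce the three documented shapes into a list of helper dicts."""
    if value is None:
        return []
    if isinstance(value, str):
        helpers = [{"command": value}]            # bare string = command path
    elif isinstance(value, dict):
        helpers = [dict(value)]                   # single helper object
    elif isinstance(value, list):
        helpers = [dict(item) for item in value]  # per-provider list
    else:
        raise TypeError(f"unsupported api_key_helper shape: {type(value).__name__}")
    for helper in helpers:
        # "command" is the only field without a "?" in the documented object shape.
        if not helper.get("command"):
            raise ValueError("each helper needs a command")
    return helpers
```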
Observability
| Key | Type | Default | Description |
|---|---|---|---|
enable_telemetry | boolean | — | OTel / cost emitter toggle. |
Context-Window Management
| Key | Type | Default | Description |
|---|---|---|---|
auto_compact_threshold | number (0–1) or null | 0.75 | Pre-turn autocompaction threshold (context utilization fraction). null disables autocompact. |
micro_compact_enabled | boolean | true | Enable the per-turn microcompact (LLM-free supersession) pass. |
tool_result_cap_chars | integer (≥ 0) | 50000 | Global per-tool-result cap in characters. 0 disables. |
min_cache_block_tokens | integer (≥ 0) | 1024 | Minimum estimated tokens on the last user message to merit the conversation-level cache marker. |
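The threshold and cap semantics above can be illustrated with a short sketch. Both functions are hypothetical; whether caliban triggers at exactly the threshold or strictly above it, and whether it appends a truncation marker to capped results, are not stated here, so the choices below are assumptions.

```python
def should_autocompact(used_tokens, window_tokens, threshold=0.75):
    """True when context utilization reaches the threshold; None disables autocompact."""
    if threshold is None:
        return False
    return used_tokens / window_tokens >= threshold  # assumption: inclusive comparison

def cap_tool_result(text, cap_chars=50000):
    """Truncate a tool result to cap_chars characters; 0 disables the cap."""
    if cap_chars == 0:
        return text
    return text[:cap_chars]  # assumption: plain truncation with no marker
```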
Enterprise (Managed Scope)
| Key | Type | Default | Description |
|---|---|---|---|
parent_settings_behavior | "block" \| "augment" | "augment" | When "block" in the managed scope, the managed layer flips to the top of the merge chain (enterprise lockdown). |
Miscellaneous
| Key | Type | Default | Description |
|---|---|---|---|
additional_directories | string[] | [] | Extra workspace roots to consult for CLAUDE.md and skills. |
claude_md_excludes | string[] | [] | Glob patterns to exclude from CLAUDE.md discovery (claudeMdExcludes). |
env | { string → string } | {} | Environment-variable overrides applied to child processes spawned by caliban. |
Slash Command Index
Type / in the interactive TUI to open the command picker, or type a command name directly. Commands marked hidden are accessible by name but do not appear in /help.
For a narrative introduction, see Slash Commands and Custom Slash Commands.
Session
| Command | Args | Description |
|---|---|---|
/clear | — | Clear the transcript and conversation history. Keeps system prompt, todos, plan-mode, and skills cache. |
/init | [--force] | Generate a CLAUDE.draft.md from available context sources (AGENTS.md, .cursorrules, .windsurfrules, README.md, git status). Refuses to overwrite an existing CLAUDE.md without --force. |
/resume | [query] | List persisted sessions sorted by most-recently-updated, with an optional case-insensitive substring filter. |
/recap | — | Summarize the conversation so far without mutating history. |
/export | [path] [--format json] | Export the session transcript to a file. Default format: Markdown. Default filename: caliban-session-<date>.md in the CWD. Pass --format json for JSON output. |
/btw | <question> | One-shot ephemeral question to a fast model (routed as FastClassifier); result inlined to transcript without touching the main session. |
Model & Auth
| Command | Args | Description |
|---|---|---|
/model | [id] | With no args: list the active provider's known model ids and the currently-selected one. With an id: switch the active model at runtime (same-provider in v1). |
/effort | <level> | Set reasoning effort for the next turn. Values: low, medium, high, max, auto. |
/status | — | Show provider / auth / subscription status. |
/login | — | Run the active provider's auth flow (full browser OAuth implementation pending the Auth spec). |
/logout | — | Clear cached credentials for the active provider (pending the Auth spec). |
/setup-token | — | Generate a long-lived Anthropic OAuth token for CI use (pending the Auth spec). |
Permissions
| Command | Args | Description |
|---|---|---|
/permissions | — | Open the permissions overlay. Shows current mode, bypass-latch state, and runtime rules. Tab cycles mode; d deletes the selected rule. |
Observability
| Command | Args | Description |
|---|---|---|
/usage | — | Show cumulative token and cost usage for this session, per model. |
/cost | — | Show cumulative cost and a per-(provider, model) breakdown with cache savings. |
/context | — | Show context window utilization and the top-N largest content blocks (by character count). |
/compact | — | Trigger the configured compactor; reports dropped/summarized message count. |
/doctor | [--deep] | Run startup-time health checks (settings, MCP, skills, hooks, auth). --deep adds provider auth pings. |
Memory
| Command | Args | Description |
|---|---|---|
/memory | [list\|show <slug>\|edit <slug>\|delete <slug>] | View or edit memory tiers and auto-memory topic files. No args: show tier summary. |
Configuration & Extensibility
| Command | Args | Description |
|---|---|---|
/config | — | Open the tabbed settings editor overlay. |
/hooks | — | List configured hooks per event type with handler counts. |
/mcp | — | Open the MCP server status overlay. |
/plugins | — | List installed plugins with enable/disable status. |
/agents | — | List sub-agents. (Full fleet overlay arrives with the sub-agent isolation spec; use caliban agents list from a shell for now.) |
/skills | — | List skills loaded from .caliban/skills/ and other configured roots. |
Plan Mode
| Command | Args | Description |
|---|---|---|
/plan | — | Toggle plan mode. When ON, mutating tools are blocked. Reflected in the active session and statusline. |
Output
| Command | Args | Description |
|---|---|---|
/output-style | — | Show the active output style and the available list. Change the style via CALIBAN_OUTPUT_STYLE or output_style in settings. |
Diagnostics
| Command | Args | Description |
|---|---|---|
/rewind | — | Open the checkpoint/rewind picker overlay (ADR 0028). Also opened by pressing Esc Esc. |
/statusline | — | Show the active status-line command configuration (or instructions to set one). |
/loop | [--n=<count>] [--interval=<seconds>] | Re-run the last assistant turn N times (bounded by --max-turns). Default: 3 repeats, 15-second interval. |
/feedback | — | Submit feedback to the configured endpoint. Requires feedback_url in settings. |
/heapdump | — | Capture a heap profile (requires caliban to be rebuilt with --features=jemalloc-prof). |
/tui | — | Toggle fullscreen vs. default TUI mode (pending TUI ergonomics spec). |
General
| Command | Args | Description |
|---|---|---|
/help | — | List all visible registered slash commands. |
/quit | — | Exit caliban. |
/exit | — | Alias for /quit (hidden). |
You can add your own slash commands by placing skill files under .caliban/skills/<name>/SKILL.md. See Custom Slash Commands.
Environment Variables
Caliban reads environment variables in two groups: CALIBAN_* variables that control the harness itself, and per-provider API-key and endpoint variables. Most CALIBAN_* variables mirror a corresponding CLI flag; when both are set, the CLI flag wins.
Provider API Keys
| Variable | Provider | Purpose |
|---|---|---|
ANTHROPIC_API_KEY | Anthropic | Required. API key for the Anthropic provider. |
ANTHROPIC_BASE_URL | Anthropic | Optional. Override the Anthropic API base URL (useful for proxies or Bedrock-compatible endpoints). |
OPENAI_API_KEY | OpenAI | Required when using OpenAI. |
OPENAI_BASE_URL | OpenAI | Optional. Override the OpenAI API base URL (for LM Studio, Mistral, and other OpenAI-compatible endpoints). |
OPENAI_ORG_ID | OpenAI | Optional. OpenAI organization ID. |
OPENAI_PROJECT | OpenAI | Optional. OpenAI project ID. |
AZURE_OPENAI_API_KEY | Azure OpenAI | Required when using Azure OpenAI. |
AZURE_OPENAI_RESOURCE | Azure OpenAI | Required when using Azure OpenAI. Azure resource name. |
AZURE_OPENAI_API_VERSION | Azure OpenAI | Optional. API version string. Default: 2024-10-21. |
GEMINI_API_KEY | Google | Required when using the Google provider. GOOGLE_GEMINI_API_KEY is checked as a fallback. |
GOOGLE_GEMINI_API_KEY | Google | Fallback for GEMINI_API_KEY. |
OLLAMA_BASE_URL | Ollama | Optional. Base URL for the Ollama server. Default: http://localhost:11434. |
Headless & Print Mode
| Variable | Default | Description |
|---|---|---|
CALIBAN_MAX_ATTACH_BYTES | 262144 (256 KB) | Maximum size of a single @-attachment. Also settable via --max-attach-bytes. |
CALIBAN_ATTACH_BUDGET_BYTES | 1048576 (1 MB) | Aggregate size cap across all @-attachments in one message. Also settable via --attach-budget-bytes. |
Permissions & Security
| Variable | Default | Description |
|---|---|---|
CALIBAN_DEFAULT_PERMISSION_MODE | default | Initial permission mode. Values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions. CLI --permission-mode wins when set. |
CALIBAN_NO_PERMISSIONS | — | Any non-empty value disables permission gating (all tool calls allowed). Conflicts with --allow, --deny, --ask, --auto-allow. |
CALIBAN_AUTO_ALLOW | — | Dangerous. Any non-empty value allows Ask-rule tools without prompting in non-interactive mode. |
CALIBAN_DISABLE_AUTO_MODE | — | Any non-empty value disables the auto-mode classifier; all calls fall through to Ask. |
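The "any non-empty value" convention and the CLI-over-environment precedence can be sketched as follows. Both helper names are hypothetical, and the settings layer that sits between the environment and the built-in default is left out for brevity.

```python
VALID_MODES = {"default", "acceptEdits", "plan", "auto", "dontAsk", "bypassPermissions"}

def env_flag(env, name):
    """'Any non-empty value' semantics: unset and empty strings are both off."""
    return bool(env.get(name, ""))

def initial_permission_mode(cli_mode, env):
    """--permission-mode wins over CALIBAN_DEFAULT_PERMISSION_MODE when set."""
    mode = cli_mode or env.get("CALIBAN_DEFAULT_PERMISSION_MODE") or "default"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown permission mode: {mode}")
    return mode
```

Note that under this convention a value such as `0` or `false` still counts as set, so unsetting the variable is the only way to turn a flag off.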
Caching & Performance
| Variable | Default | Description |
|---|---|---|
CALIBAN_NO_PROMPT_CACHE | — | Any non-empty value disables Anthropic-style prompt caching. |
CALIBAN_NO_PARALLEL_TOOLS | — | Any non-empty value forces serial tool execution. |
CALIBAN_PARALLEL_TOOL_LIMIT | CPU cores − 1 (min 1) | Maximum concurrent tool invocations per turn. |
Hooks, Skills, MCP & Plugins
| Variable | Default | Description |
|---|---|---|
CALIBAN_NO_HOOKS | — | Any non-empty value bypasses every external hook handler. In-process hooks still run. |
CALIBAN_NO_SKILLS | — | Any non-empty value disables skill discovery at startup. |
CALIBAN_NO_MCP | — | Any non-empty value disables MCP server discovery. |
CALIBAN_MCP_OAUTH_PORT | 0 (ephemeral) | Loopback port for the MCP OAuth callback server (ADR 0023 Phase C). |
CALIBAN_MCP_TIMEOUT | — | Timeout (ms) for MCP server startup/connection. |
CALIBAN_MCP_TOOL_TIMEOUT | — | Per-tool-call timeout (ms) for MCP tools. |
CALIBAN_NO_PLUGINS | — | Any non-empty value disables plugin discovery. |
CALIBAN_ENABLED_PLUGINS | — | Comma-separated list of plugin names to enable (all others disabled). |
CALIBAN_PLUGIN_ROOT | — | Override the plugin install root directory. |
Sub-agents
| Variable | Default | Description |
|---|---|---|
CALIBAN_NO_SUB_AGENT | — | Any non-empty value disables the built-in AgentTool. |
CALIBAN_DAEMON_RUNTIME_DIR | Platform default | Override the runtime socket directory for the supervisor daemon. |
Memory
| Variable | Default | Description |
|---|---|---|
CALIBAN_DISABLE_AUTO_MEMORY | — | Any non-empty value disables auto-memory topic-file writing. |
CALIBAN_MEMORY_DIR | Platform default | Override the auto-memory topic files directory. |
CALIBAN_MEMORY_BUDGET_TOKENS | — | Total token budget across all memory tiers. |
CALIBAN_MEMORY_CAP_TOKENS_AUTO | — | Token budget cap for the auto-memory tier. |
CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD | — | Token budget cap for the CLAUDE.md tier. |
CALIBAN_AUTO_MEMORY_DIRECTORY | — | Override the auto-memory directory (alias form). |
CALIBAN_DISABLE_CLAUDE_MD_WALK | — | Any non-empty value disables the CLAUDE.md walk-up discovery. |
CALIBAN_ADDITIONAL_DIRECTORIES_CLAUDE_MD | — | Colon-separated list of extra directories to search for CLAUDE.md. |
CALIBAN_CLAUDE_MD_EXCLUDES | — | Colon-separated glob patterns to exclude from CLAUDE.md discovery. |
CALIBAN_APPROVE_IMPORTS | — | Any non-empty value auto-approves CLAUDE.md @import statements. |
Checkpoints
| Variable | Default | Description |
|---|---|---|
CALIBAN_CHECKPOINT_ROOT | ~/.caliban/projects | Override the checkpoint root directory. |
CALIBAN_CHECKPOINT_DISABLED | — | Any non-empty value disables checkpoint recording and pruning. |
CALIBAN_CHECKPOINT_MAX_FILE_BYTES | — | Maximum checkpoint file size before rotation. |
CALIBAN_CLEANUP_PERIOD_DAYS | — | Number of days after which old checkpoint files are pruned. |
Configuration & Router
| Variable | Default | Description |
|---|---|---|
CALIBAN_ROUTER_CONFIG | Walk-up discovery | Explicit path to caliban.toml. Also settable via --config. |
CALIBAN_STRICT_ROUTING | — | Any non-empty value enables strict routing (no fallback to default route on unknown purpose). |
CALIBAN_API_KEY_HELPER_TTL_MS | — | TTL in milliseconds for API key helper subprocess cache. |
Output
| Variable | Default | Description |
|---|---|---|
CALIBAN_OUTPUT_STYLE | — | Name of the active output style (see Output Styles). |
CALIBAN_GRAPHICS | — | Graphics capability hint (e.g. kitty, sixel). |
Observability & Telemetry
| Variable | Default | Description |
|---|---|---|
CALIBAN_ENABLE_TELEMETRY | — | Any non-empty value enables OTel telemetry (settings enable_telemetry is also checked). |
CALIBAN_OTEL_HEADERS_HELPER | — | Command to supply dynamic OTel export headers. |
OTEL_EXPORTER_OTLP_ENDPOINT | — | OTel OTLP exporter endpoint URL. |
OTEL_EXPORTER_OTLP_PROTOCOL | grpc | OTel OTLP transport protocol. |
OTEL_EXPORTER_OTLP_HEADERS | — | Additional headers for the OTLP exporter. |
OTEL_METRIC_EXPORT_INTERVAL | 60s | OTel metric export interval. |
OTEL_LOGS_EXPORTER | otlp | OTel logs exporter type. |
OTEL_METRICS_EXPORTER | otlp | OTel metrics exporter type. |
OTEL_TRACES_EXPORTER | otlp | OTel traces exporter type. |
CALIBAN_RATES_YAML | — | Path to a YAML file overriding the built-in provider pricing rate card. |
Debug
| Variable | Default | Description |
|---|---|---|
CALIBAN_DEBUG | — | Any non-empty value enables the file-backed tracing subscriber (appends to the platform debug log). Also settable via --debug. |
Plugin Trust & Marketplace
| Variable | Default | Description |
|---|---|---|
CALIBAN_BLOCKED_MARKETPLACES | — | Comma-separated list of marketplace names to block. |
CALIBAN_STRICT_KNOWN_MARKETPLACES | — | Any non-empty value blocks installs from unrecognized marketplaces. |
CALIBAN_STRICT_PLUGIN_ONLY_CUSTOMIZATION | — | Any non-empty value restricts customization to plugins only (no user-level skills/hooks). |
CALIBAN_PROVIDER is an exception to the flag-wins rule: when set, it overrides both the --provider flag and the settings-derived provider. It is the escape hatch for scripting scenarios where injecting a flag is inconvenient.
Files & Directories
Caliban follows platform conventions for each OS via the dirs crate. The tables below show the resolved path for each category on macOS, Linux (with XDG defaults), and Windows.
Many paths can be overridden with environment variables — see Environment Variables. The CALIBAN_CHECKPOINT_ROOT, CALIBAN_MEMORY_DIR, CALIBAN_DAEMON_RUNTIME_DIR, and CALIBAN_DEBUG variables are the most commonly needed.
Settings Files
Caliban loads settings from up to five scopes, listed below from lowest to highest precedence. See Settings Layering for merge semantics.
| Scope | macOS | Linux | Windows |
|---|---|---|---|
| Managed (enterprise) | /Library/Application Support/Caliban/managed-settings.{toml,json} | /etc/caliban/managed-settings.{toml,json} | C:\ProgramData\Caliban\managed-settings.{toml,json} |
| User | ~/Library/Application Support/caliban/settings.{toml,json} | $XDG_CONFIG_HOME/caliban/settings.{toml,json} (default: ~/.config/caliban/) | %APPDATA%\caliban\settings.{toml,json} |
| Project | <workspace>/.caliban/settings.{toml,json} | <workspace>/.caliban/settings.{toml,json} | <workspace>\.caliban\settings.{toml,json} |
| Local (gitignored) | <workspace>/.caliban/settings.local.{toml,json} | <workspace>/.caliban/settings.local.{toml,json} | <workspace>\.caliban\settings.local.{toml,json} |
| CLI overlay | Supplied via --settings <FILE_OR_JSON> | — | — |
Both .toml and .json are accepted at each scope. TOML is preferred; JSON is accepted for Claude Code import compatibility.
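For scripts that need to locate these files, the Linux column of the table can be expanded into concrete candidate paths. This is a sketch under assumptions: `linux_settings_candidates` is a hypothetical helper, it covers Linux only, and it lists candidates in the table's row order without claiming which extension caliban prefers when both exist.

```python
from pathlib import PurePosixPath

def linux_settings_candidates(workspace, env):
    """List (scope, path) candidates on Linux, in the table's row order."""
    # XDG default: ~/.config when XDG_CONFIG_HOME is unset or empty.
    config_home = env.get("XDG_CONFIG_HOME") or str(PurePosixPath(env["HOME"]) / ".config")
    bases = [
        ("managed", PurePosixPath("/etc/caliban/managed-settings")),
        ("user", PurePosixPath(config_home) / "caliban" / "settings"),
        ("project", PurePosixPath(workspace) / ".caliban" / "settings"),
        ("local", PurePosixPath(workspace) / ".caliban" / "settings.local"),
    ]
    # Both extensions are accepted at each scope.
    return [(scope, f"{base}{ext}") for scope, base in bases for ext in (".toml", ".json")]
```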
Sessions
Named sessions are stored as JSON files in the sessions directory.
| macOS | Linux | Windows |
|---|---|---|
| ~/Library/Application Support/caliban/sessions/<name>.json | $XDG_DATA_HOME/caliban/sessions/<name>.json (default: ~/.local/share/caliban/sessions/) | %LOCALAPPDATA%\caliban\sessions\<name>.json |
Override with --sessions-dir <DIR>.
Checkpoints
Checkpoints use a content-addressed layout keyed on a SHA-256 hash of the canonicalized workspace path.
| macOS | Linux | Windows |
|---|---|---|
| ~/.caliban/projects/<cwd-hash>/checkpoints/<session>/prompt-NNN/ | ~/.caliban/projects/<cwd-hash>/checkpoints/<session>/prompt-NNN/ | %USERPROFILE%\.caliban\projects\<cwd-hash>\checkpoints\<session>\prompt-NNN\ |
The <cwd-hash> is the first 16 hex characters of SHA-256(canonicalized_cwd).
Override the root with CALIBAN_CHECKPOINT_ROOT. Disable recording entirely with CALIBAN_CHECKPOINT_DISABLED.
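To locate a project's checkpoint tree by hand, the hash can be approximated as in the sketch below. It assumes that "canonicalized" means a symlink-resolved absolute path encoded as UTF-8, and that prompt-NNN is a zero-padded three-digit index; neither detail is confirmed here, so treat a mismatch with the directory names on disk as a sign that caliban canonicalizes differently. Both function names are hypothetical.

```python
import hashlib
import os
from pathlib import PurePosixPath


def cwd_hash(workspace):
    """First 16 hex characters of SHA-256 over the canonicalized workspace path."""
    # Assumption: canonicalization == symlink-resolved absolute path, UTF-8 encoded.
    canonical = os.path.realpath(workspace)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def checkpoint_relpath(workspace, session, prompt_index):
    """Checkpoint location relative to the ~/.caliban directory."""
    # Assumption: NNN is the prompt index, zero-padded to three digits.
    return str(
        PurePosixPath(
            "projects",
            cwd_hash(workspace),
            "checkpoints",
            session,
            f"prompt-{prompt_index:03d}",
        )
    )
```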
Debug Log
Enabled by --debug or CALIBAN_DEBUG (any non-empty value). Append-only; the log is not rotated automatically, so rotate or delete it yourself (see Troubleshooting).
| macOS | Linux | Windows |
|---|---|---|
| ~/Library/Caches/caliban/debug.log | $XDG_CACHE_HOME/caliban/debug.log (default: ~/.cache/caliban/) | %LOCALAPPDATA%\caliban\cache\caliban\debug.log |
Audit / Permission-Decision Log
Append-only JSONL log of every permission decision (allow/ask/deny) with tool name, matched rule, and session context. Enabled by default; disable via permissions.audit_log = false in settings.
| macOS | Linux | Windows |
|---|---|---|
| ~/Library/Application Support/caliban/permission-decisions.jsonl ¹ | $XDG_STATE_HOME/caliban/permission-decisions.jsonl (default: ~/.local/state/caliban/) | %LOCALAPPDATA%\caliban\permission-decisions.jsonl ¹ |
¹ macOS and Windows lack a state_dir equivalent; caliban falls back to data_local_dir (~/Library/Application Support/ / %LOCALAPPDATA%).
View with caliban perms audit [--since <ISO>] [--tool <NAME>] [--action <ACTION>] [--head <N>].
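Because the log is plain JSONL, it can also be read with a few lines of script when the built-in viewer is not available. The sketch below is a generic JSONL filter; the record schema is not specified in this guide, so any field names passed to it (for example tool or action) are placeholders to check against a real log line first.

```python
import json


def filter_decisions(lines, **wanted):
    """Yield parsed JSONL records whose fields equal every key=value in `wanted`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if all(record.get(key) == value for key, value in wanted.items()):
            yield record
```

A typical call would open the log file and pass the file object as `lines`, for example `filter_decisions(open(path, encoding="utf-8"), tool="Write")`, with the hypothetical field name replaced by whatever the log actually uses.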
Skills
Skills are loaded from several roots, checked in this order:
| Root | macOS | Linux | Windows |
|---|---|---|---|
| Project | <workspace>/.caliban/skills/ | <workspace>/.caliban/skills/ | <workspace>\.caliban\skills\ |
| User | ~/Library/Application Support/caliban/skills/ | $XDG_CONFIG_HOME/caliban/skills/ | %APPDATA%\caliban\skills\ |
| Local data | ~/Library/Application Support/caliban/skills/ | $XDG_DATA_HOME/caliban/skills/ | %LOCALAPPDATA%\caliban\skills\ |
| Plugin-contributed | Varies per plugin install | — | — |
Each skill lives in a subdirectory with a SKILL.md file: <root>/<name>/SKILL.md.
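The <root>/<name>/SKILL.md layout makes it easy to check which skills a set of roots would expose. This is a minimal sketch with a hypothetical function name; it only lists what is on disk in root order and does not model how caliban resolves two skills with the same name.

```python
from pathlib import Path


def discover_skills(roots):
    """Return (name, path) for every <root>/<name>/SKILL.md, keeping root order."""
    found = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # a missing root is simply skipped
        for child in sorted(root.iterdir()):
            skill_file = child / "SKILL.md"
            if skill_file.is_file():
                found.append((child.name, skill_file))
    return found
```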
Plugins
| Location | macOS | Linux | Windows |
|---|---|---|---|
| Project plugins | <workspace>/.caliban/plugins/ | <workspace>/.caliban/plugins/ | <workspace>\.caliban\plugins\ |
| User plugins | ~/Library/Application Support/caliban/plugins/ | $XDG_DATA_HOME/caliban/plugins/ (default: ~/.local/share/caliban/plugins/) | %LOCALAPPDATA%\caliban\plugins\ |
| Plugin trust store | ~/Library/Application Support/caliban/plugin-trust.json | ~/.local/share/caliban/plugin-trust.json | %LOCALAPPDATA%\caliban\plugin-trust.json |
| Marketplace allowlist | ~/.caliban/marketplaces-allowlist.json | ~/.caliban/marketplaces-allowlist.json | %USERPROFILE%\.caliban\marketplaces-allowlist.json |
MCP Configuration (Legacy)
The legacy mcp.toml is still loaded during the back-compat window:
| Location | macOS | Linux | Windows |
|---|---|---|---|
| Project | <workspace>/.caliban/mcp.toml | <workspace>/.caliban/mcp.toml | <workspace>\.caliban\mcp.toml |
| User | ~/Library/Application Support/caliban/mcp.toml | $XDG_CONFIG_HOME/caliban/mcp.toml | %APPDATA%\caliban\mcp.toml |
MCP servers are now configured in settings.toml under [mcp_servers]. See MCP Servers.
Hooks Configuration (Legacy)
Legacy hooks.toml files are still loaded during the back-compat window:
| Location | macOS | Linux | Windows |
|---|---|---|---|
| Project | <workspace>/.caliban/hooks.toml | <workspace>/.caliban/hooks.toml | <workspace>\.caliban\hooks.toml |
| User | ~/Library/Application Support/caliban/hooks.toml | $XDG_CONFIG_HOME/caliban/hooks.toml | %APPDATA%\caliban\hooks.toml |
Hooks are now configured in settings.toml under [hooks]. See Hooks.
Permissions Configuration (Legacy)
| Location | macOS | Linux | Windows |
|---|---|---|---|
| Project | <workspace>/.caliban/permissions.toml | <workspace>/.caliban/permissions.toml | <workspace>\.caliban\permissions.toml |
| User | ~/Library/Application Support/caliban/permissions.toml | $XDG_CONFIG_HOME/caliban/permissions.toml | %APPDATA%\caliban\permissions.toml |
Permissions are now configured in settings.toml under [permissions]. See Managing Rules.
Model Router Config
| Location | macOS / Linux / Windows |
|---|---|
| Project | <workspace>/caliban.toml (walk-up discovery) |
| User | ~/Library/Application Support/caliban/caliban.toml (macOS) / $XDG_CONFIG_HOME/caliban/caliban.toml (Linux) |
Override with --config <PATH> or CALIBAN_ROUTER_CONFIG.
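Walk-up discovery for the project file can be pictured as the loop below: start in the working directory and check each ancestor for caliban.toml. This is a sketch under assumptions, not caliban's actual lookup; in particular, whether the real search stops at a repository boundary is not stated here, and the function name is hypothetical.

```python
from pathlib import Path


def find_project_config(start, filename="caliban.toml"):
    """Return the nearest `filename` in `start` or any of its ancestors, else None."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```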
Output Styles
| Location | macOS | Linux | Windows |
|---|---|---|---|
| Project | <workspace>/.caliban/output-styles/ | <workspace>/.caliban/output-styles/ | <workspace>\.caliban\output-styles\ |
| User | ~/Library/Application Support/caliban/output-styles/ | $XDG_CONFIG_HOME/caliban/output-styles/ | %APPDATA%\caliban\output-styles\ |
| Plugin-contributed | Via plugin data root | — | — |
Tool-Result Overflow Spill
When a tool result exceeds tool_result_cap_chars, the full result is spilled to disk and the inline message contains a truncated excerpt with a pointer.
| macOS | Linux | Windows |
|---|---|---|
| ~/Library/Caches/caliban/tool-overflows/<session-id>/<tool-use-id>.txt | $XDG_CACHE_HOME/caliban/tool-overflows/<session-id>/<tool-use-id>.txt | %LOCALAPPDATA%\caliban\cache\caliban\tool-overflows\<session-id>\<tool-use-id>.txt |
Falls back to /tmp/caliban-tool-overflows/ when the cache directory cannot be determined.
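The spill behaviour can be sketched as follows. The function names, the wording of the inline pointer, and the assumption that the /tmp fallback keeps the same <session-id>/<tool-use-id>.txt layout are all illustrative; only the cap-then-spill idea and the directory names come from the description above.

```python
from pathlib import Path


def overflow_path(cache_dir, session_id, tool_use_id):
    """Spill location; uses the /tmp fallback when no cache directory is known."""
    if cache_dir:
        base = Path(cache_dir) / "caliban" / "tool-overflows"
    else:
        base = Path("/tmp/caliban-tool-overflows")  # assumed to keep the same layout
    return base / session_id / f"{tool_use_id}.txt"


def cap_tool_result(result, cap_chars, spill_path):
    """Return the inline text, spilling the full result to disk when it is too long."""
    if len(result) <= cap_chars:
        return result
    spill_path.parent.mkdir(parents=True, exist_ok=True)
    spill_path.write_text(result, encoding="utf-8")
    # Hypothetical pointer format: truncated excerpt plus the spill location.
    return f"{result[:cap_chars]}\n[truncated; full output at {spill_path}]"
```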
Input History
Per-project input history is stored alongside the checkpoint tree:
| All platforms |
|---|
| ~/.caliban/projects/<cwd-hash>/input-history.txt |
All project histories are accessible via ~/.caliban/projects/ (used by the Ctrl+R all-projects search scope).
Worktrees
Git worktrees managed by caliban are kept inside the repository:
| All platforms |
|---|
| <repo-root>/.caliban/worktrees/<name>/ |
Supervisor / Daemon State
| macOS | Linux | Windows |
|---|---|---|
| ~/Library/Application Support/caliban/ (daemon data) | $XDG_DATA_HOME/caliban/ | %LOCALAPPDATA%\caliban\ |
| $XDG_RUNTIME_DIR/caliban/ or ~/Library/Application Support/caliban/run/ (sockets) | $XDG_RUNTIME_DIR/caliban/ (sockets) | %LOCALAPPDATA%\caliban\run\ (sockets) |
Override with CALIBAN_DAEMON_RUNTIME_DIR.
On Linux, all $XDG_* variables are honored when set. If unset, the defaults shown above apply. macOS and Windows do not use XDG paths; the dirs crate maps to the platform-native locations shown.
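The Linux rule (use the XDG variable when set, otherwise the default) can be written down in a few lines. The sketch below resolves three of the paths from the tables above; the helper names are hypothetical, and an empty variable is treated as unset, which is what the XDG Base Directory Specification prescribes.

```python
from pathlib import Path

# Defaults relative to $HOME, as given in the tables above.
XDG_DEFAULTS = {
    "XDG_CONFIG_HOME": ".config",
    "XDG_DATA_HOME": ".local/share",
    "XDG_CACHE_HOME": ".cache",
    "XDG_STATE_HOME": ".local/state",
}


def xdg_base(var, env, home):
    """Resolve one XDG base directory; unset or empty falls back to the default."""
    value = env.get(var, "")
    return Path(value) if value else Path(home) / XDG_DEFAULTS[var]


def linux_paths(env, home):
    """Resolve a few caliban locations on Linux from an environment mapping."""
    return {
        "sessions": xdg_base("XDG_DATA_HOME", env, home) / "caliban" / "sessions",
        "debug_log": xdg_base("XDG_CACHE_HOME", env, home) / "caliban" / "debug.log",
        "audit_log": xdg_base("XDG_STATE_HOME", env, home)
        / "caliban"
        / "permission-decisions.jsonl",
    }
```

Calling `linux_paths(os.environ, Path.home())` (after `import os`) reports where these files should be on the current machine.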
Troubleshooting
This page covers the most common problems operators encounter and how to fix them. Start with caliban doctor — it checks the most likely failure points in one command.
Running caliban doctor
caliban doctor # quick sanity checks
caliban doctor --deep # adds provider auth pings (costs one API call per provider)
The output lists each check with a ✓ (pass), ! (warning), or ✗ (fail) prefix. Warnings such as "no CLAUDE.md found in ancestry" or "no scope files found" are informational; failures indicate something caliban cannot proceed without.
--deep issues a real model request to confirm provider auth. Run it when you suspect a key or endpoint problem, not on every invocation.
Provider authentication failures
Symptoms: Error: ANTHROPIC_API_KEY is not set, OPENAI_API_KEY is not set, or similar on startup.
Fixes:
- Export the relevant key in your shell:
  export ANTHROPIC_API_KEY=sk-ant-...
  export OPENAI_API_KEY=sk-...
- Or configure apiKeyHelper in your settings file to fetch credentials dynamically. See Configuring Providers & API Keys.
- Run caliban doctor --deep to confirm the key reaches the provider.
Malformed base URL: If you set OPENAI_BASE_URL to a URL that cannot be parsed (e.g. not://a:url), caliban may report a misleading "API key not set" error. Verify the URL is a valid HTTP/HTTPS address before exporting it.
Qwen3 on LM Studio: tool calls leak into reasoning
When running a Qwen3 reasoning model via LM Studio (MLX engine), you may see tool calls appear inside the model's thinking/reasoning channel rather than as structured tool_use blocks. The practical effects:
- 2-step tool chains (e.g. Glob → Read) usually complete correctly.
- Chains of 3 or more steps stall: the model re-emits the first tool call across multiple turns and hits --max-turns without progressing.
This is an LM Studio MLX engine limitation, not a caliban defect. The same Qwen3 model on Ollama (GGUF) parses tool calls correctly — the leak does not reproduce there.
Multi-step agentic tasks (3+ tool calls) are unreliable when using Qwen3 reasoning models through LM Studio's MLX path. For agentic work, switch to Ollama or another server that handles Qwen-native <tool_call> XML parsing server-side.
Workarounds:
| Situation | Workaround |
|---|---|
| Need Qwen3 specifically | Switch to Ollama: --provider ollama --model qwen3.5:9b |
| Must use LM Studio | Limit chains to at most 2 tool calls; use --max-turns to prevent runaway loops |
| Reasoning is optional | Use a non-reasoning Qwen model (e.g. qwen2.5-coder-7b-instruct) |
Ollama: tool_call_id not round-tripped
Caliban's Ollama provider does not correlate tool_call_id across the request/response boundary — it is set on the outgoing tool result but is not echoed back by the Ollama server. This is a known limitation of the Ollama API and does not affect tool dispatch correctness in practice.
If you are building a custom consumer of the stream-json output and need to correlate tool_use and tool_result frames, use the id field on the tool_use frame and the tool_use_id field on tool_result as emitted by caliban — they match correctly on the client side regardless of provider.
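A consumer along those lines might pair frames as in the sketch below. It assumes each frame is one JSON object per line and that a type field distinguishes tool_use from tool_result frames; the id and tool_use_id fields are the ones named above, while the rest of the frame shape is an assumption to verify against real stream-json output.

```python
import json


def pair_tool_frames(lines):
    """Match tool_result frames to tool_use frames; return (pairs, unanswered)."""
    pending = {}  # tool_use frames still waiting for a result, keyed by id
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frame = json.loads(line)
        kind = frame.get("type")  # assumed discriminator field
        if kind == "tool_use":
            pending[frame["id"]] = frame
        elif kind == "tool_result":
            # A result without a known tool_use is paired with None.
            pairs.append((pending.pop(frame.get("tool_use_id"), None), frame))
    return pairs, list(pending.values())
```

The second return value lists tool calls that never received a result, which is a convenient signal when a run stops at --max-turns.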
Parallel sub-agents slow on self-hosted Ollama
If you run parallel sub-agents (AgentTool) against a self-hosted Ollama instance and they are slower than expected, the backend may be serialising requests due to OLLAMA_NUM_PARALLEL=1 (the default on most hardware).
On a NUM_PARALLEL=1 backend, parallel sub-agents do not increase throughput — every inference still queues at the single model slot, and the per-sub-agent overhead (a full reasoning + summary loop per agent) makes total wall time significantly longer than the parent doing the same work inline.
Options:
- Raise OLLAMA_NUM_PARALLEL on the server if your GPU has enough VRAM for multiple KV-cache allocations.
- Use --no-sub-agent and let the parent model read files inline.
- Switch to a hosted provider (Anthropic, OpenAI) where each sub-agent gets independent fleet capacity.
- Cap dispatch with --parallel-tool-limit N to limit concurrent sub-agent calls.
Parallel sub-agents still provide context isolation (each sub-agent gets a fresh context window) even when NUM_PARALLEL=1. That can be worth the wall-time cost for long independent tasks, but not for latency-sensitive pipelines.
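A back-of-the-envelope model makes the queueing effect concrete. The sketch below is a toy estimate, not a measurement: it assumes every inference call takes the same time, that calls within one sub-agent run strictly in sequence, and that the server schedules calls perfectly across its slots.

```python
import math


def estimated_inference_seconds(agents, calls_per_agent, seconds_per_call, slots):
    """Rough lower bound on wall time for sub-agents sharing `slots` model slots."""
    total_calls = agents * calls_per_agent
    # Each agent's calls are sequential, so at least `calls_per_agent` rounds are
    # needed; with fewer slots than agents, the shared queue dominates instead.
    rounds = max(calls_per_agent, math.ceil(total_calls / slots))
    return rounds * seconds_per_call
```

With 4 sub-agents making 3 calls of 10 seconds each, the estimate is 120 seconds on a single slot and 30 seconds once 4 slots are available; adding slots beyond the number of agents does not help further.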
Headless Ask→deny remediation
In headless (-p) mode, tools that require user confirmation (the default "Ask" rule) are auto-denied because there is no TTY to prompt on. If a headless run silently fails to write a file or run a command, this is the likely cause.
Fix: add an explicit --allow rule or switch to --auto-allow for unattended runs:
# Allow a specific tool pattern
caliban -p "..." --allow "Write:**"
# Allow all tool calls (use with care)
caliban -p "..." --auto-allow
See Headless & Audit for the full headless permission model and how to configure durable rules.
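When a script launches caliban, building the argument list programmatically avoids shell-quoting problems with glob patterns such as Write:**. The helper below is a hypothetical sketch; it uses only the -p, --allow, and --auto-allow flags shown above and assumes --allow may be repeated, which this page does not confirm.

```python
def headless_command(prompt, allow=(), auto_allow=False):
    """Build an argv list for a headless run; nothing is executed here."""
    cmd = ["caliban", "-p", prompt]
    for pattern in allow:
        # Assumption: one --allow flag per pattern.
        cmd += ["--allow", pattern]
    if auto_allow:
        cmd.append("--auto-allow")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which never involves a shell and so leaves the ** in the pattern untouched.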
--debug file logging
Pass --debug (or set CALIBAN_DEBUG=1) to write a detailed event + render log to disk. This is useful when diagnosing silent failures, unexpected tool behaviour, or TUI rendering issues.
Log file locations:
| OS | Path |
|---|---|
| macOS | ~/Library/Caches/caliban/debug.log |
| Linux / WSL | ~/.cache/caliban/debug.log |
The debug log grows quickly under active use. Delete or rotate it after capturing the relevant session. It contains full message content, tool inputs/outputs, and provider requests — do not share it if your prompts contain sensitive information.
The log appends across runs; it is not rotated automatically.
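Since rotation is manual, a small helper can move the current log aside under a timestamped name. This is a sketch with a hypothetical function name and naming scheme; run it only while caliban is not running, because a process that still holds the file open would keep writing to the renamed file.

```python
import time
from pathlib import Path


def rotate_debug_log(log_path, now=None):
    """Rename debug.log to debug.log.<UTC timestamp>; return the new path or None."""
    log_path = Path(log_path)
    if not log_path.exists():
        return None
    # time.gmtime(None) means "current time"; pass `now` to pin the timestamp.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    rotated = log_path.with_name(f"{log_path.name}.{stamp}")
    log_path.rename(rotated)
    return rotated
```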
Glossary
Concise definitions for terms used throughout this guide. Each links to the chapter where the concept is covered in depth.
agent harness
The runtime that drives the model → tool → model loop: reads user input, calls the provider, dispatches tool calls, feeds results back, and repeats until a terminal condition. Caliban is an agent harness. See What Is Caliban?.
auto-memory
Per-project notes written by the model itself into a designated memory file. Injected into the system prompt on subsequent sessions. See Auto-Memory.
checkpoint
A snapshot of the conversation state (messages + file-tool pre-images) taken before each prompt. Used by /rewind to restore a prior state. See Checkpoints & Rewind.
compaction
The process of summarising or truncating conversation history when the context window approaches its limit, allowing the session to continue. See Context & Compaction.
headless / print mode
Non-interactive operation via -p / --print. Caliban drives the agent without a TUI and emits text or structured JSON output to stdout. See Print Mode and The stream-json Protocol.
hook
An event-driven callback executed by an external command, HTTP endpoint, MCP tool, or in-process handler at defined points in the agent lifecycle (e.g. before_tool, SessionStart). See Hooks.
MCP server
A Model Context Protocol server that exposes additional tools to caliban over stdio, HTTP/SSE, or streamable-HTTP transports. Caliban discovers and manages MCP servers via its settings. See MCP Servers.
memory tier
One of the three layers of context prepended to the system prompt: global (~/.claude/CLAUDE.md), project (<workspace>/CLAUDE.md), and auto-memory (model-written notes). See Memory Tiers.
message IR
The provider-neutral internal representation of conversation messages used by caliban-common. All providers translate to and from this IR so the agent core stays provider-agnostic. See Architecture & ADRs (ADR 0006).
output style
A named instruction set (Default, Proactive, Explanatory, Learning, or custom) that shapes how the model formats and explains its responses. See Output Styles.
permission mode
A named preset that sets the default disposition for tool-call permission checks. Modes include default, acceptEdits, plan, auto, dontAsk, and bypassPermissions. See Permission Modes.
plugin
A self-contained bundle of skills, hooks, agents, MCP server configs, and output styles distributed as a directory with a plugin.json manifest. See Plugins.
provider
An adapter that translates caliban's message IR to and from a specific model API (Anthropic, OpenAI, Ollama, Google, Bedrock, Vertex). See Supported Providers.
router
The caliban-model-router layer that selects a provider+model for each request based on configured rules, purpose keys, fallback chains, circuit breakers, and capability requirements. See The Model Router.
sandbox
An OS-level confinement layer (macOS Seatbelt or Linux bubblewrap) applied to shell and file tools to restrict what they can access on the host. See The OS Sandbox.
session
A persisted conversation: a named JSON file on disk containing the full message history for a continuous exchange. See Sessions & Persistence.
skill
A markdown file with YAML frontmatter that the model can invoke as a tool. Skills encapsulate reusable workflows without requiring code. See Skills.
sub-agent
A nested caliban instance spawned by the parent agent to execute a delegated task, optionally in an isolated git worktree. See Sub-agents.
tool
A capability the model can invoke during a turn. Built-in tools include Read, Write, Bash, Glob, Grep, Edit, WebSearch, and AgentTool. See Built-in Tools and Tool Execution.
Parity vs Claude Code
Caliban tracks feature parity with Claude Code in a living matrix. This page summarises the current state by theme. The full matrix — including per-row notes and ADR cross-references — lives at docs/parity-gap-matrix.md in the repository.
Legend: ✅ parity · 🟡 partial · 🔴 not yet
Theme summary
A — Permissions & safety ✅
Rule grammar (allow/ask/deny + globs), all six permission modes, the auto-mode classifier, the TUI Ask modal, OS-level sandbox (macOS Seatbelt + Linux bubblewrap), and the full caliban perms CLI with TOML writeback and audit log are all shipped. See ADRs 0020, 0029, 0032, and 0045.
B — Hooks & extensibility ✅
All hook event types (tool, session, compact, config, cwd, file, subagent, permission), hook decision protocol, and plugin packaging are shipped. The mcp/prompt/agent handler types are v1 stubs; per-subagent hook inheritance lands with the fleet spec.
C — Memory & checkpointing ✅
Three-tier prompt prefix, CLAUDE.md ancestor walk + @-imports, auto-memory, claudeMdExcludes, auto-checkpoint per prompt, /rewind, MicroCompact janitor, and tool-result size cap with overflow persistence are all shipped.
D — Configuration / settings ✅
Layered settings (managed > user > project > local), /config interactive editor, live reload, apiKeyHelper pool, and schema validation are shipped (ADR 0026 + 0045). TOML is the primary write format; JSON is accepted on read.
E — TUI ergonomics 🟡
Status bar, mouse scroll, transcript viewer, @file attach, ! shell escape, external editor (Ctrl+G), Ctrl+O transcript dump, background bash (Ctrl+B), image/vision input, permission Ask modal, and reverse history search are shipped. Notable gaps: vim editing mode (🔴), slash-menu typeahead (🟡 partial), multi-line input (🟡 partial), and voice dictation (🔴).
F — Built-in tools ✅
Bash, Edit, Glob, Grep, Read, Write, WebFetch, TodoWrite, Skill, AgentTool, NotebookEdit, MultiEdit, WebSearch, and background-bash are shipped. PowerShell tool and ToolSearch / WaitForMcpServers (relevant once MCP is fully real) are 🔴.
G — Sub-agents ✅
In-process AgentTool, git worktree isolation, background agent fleet (caliband daemon), per-agent memory dir, hook inheritance, and supervisor daemon are all shipped (ADR 0037).
H — MCP ✅
Config validation, real spawn/handshake, stdio + HTTP/SSE + streamable-HTTP transports, per-server permission scoping, /mcp slash, OAuth PKCE flow, elicitation, and resource references are shipped (ADR 0023).
I — Model router & providers ✅
Purpose-keyed routing, fallback chains, hedging, circuit breakers, capability filtering, Anthropic/OpenAI/Ollama/Google/Bedrock/Vertex providers, and effort levels are shipped. Azure Foundry is 🔴; extended-thinking toggle is 🟡 partial.
J — Headless / CI ✅
-p / --print mode, all output formats (text/json/stream-json), input formats, --max-turns, --max-budget-usd, --bare, --json-schema, --include-partial-messages, and --include-hook-events are shipped. GitHub Actions workflow and devcontainer feature are 🔴 (separate sub-projects).
K — Observability / cost ✅
tracing instrumentation, /context, /usage, /compact, proactive autocompact, prompt cache markers, cost tracking, OpenTelemetry export, and the custom status line are shipped. --debug / --debug-file is 🟡 partial. The feedback survey is 🔴.
L — Output styles ✅
All four built-in output styles (Default, Proactive, Explanatory, Learning) and custom output-style files are shipped (ADR 0031).
M — Slash command coverage 🟡
Core commands (/plan, /memory, /skills, /quit, /clear, /help, /init, /context, /usage, /compact, /config, /hooks, /mcp, /model, /effort, /resume, /cost, /export, /rewind, /doctor, /login, /logout, /status) are shipped. Theme customisation and skill-dependent commands (/code-review, /run, /verify, /batch) are 🔴.
N — Long-tail surfaces 🔴
IDE extensions (VS Code / Cursor / JetBrains), GitHub App, claude.ai/code web, iOS app, Slack, Remote Control, Channels, Routines, Deep links, and Teleport are all 🔴. These are parked until terminal/CLI parity is reached.
Notable gaps
| Gap | Status | Notes |
|---|---|---|
| Vim editing mode | 🔴 | TUI input layer |
| Azure Foundry provider | 🔴 | Provider adapter not yet written |
| GitHub Actions workflow | 🔴 | Separate sub-project |
| Devcontainer feature | 🔴 | Separate sub-project |
| ToolSearch / WaitForMcpServers | 🔴 | Only relevant once MCP is fully real |
| Skill-dependent slash commands | 🔴 | /code-review, /run, /verify, /batch |
| Cloud / IDE / mobile surfaces (N) | 🔴 | All large investments; deferred |
The parity matrix is refreshed in the same PR that ships each feature. If a row above contradicts what you see in the matrix file, the matrix file is authoritative.
Crate Map
The caliban workspace is organised into ~24 crates across four main layers. This page gives an operator-facing orientation — enough to know which crate to look at when reading a log line, error message, or ADR. For architecture rationale, see Architecture & ADRs.
This map is for the curious. You do not need to know these crates to use caliban — they are implementation details that surface only in debug logs, error messages, and ADR references.
Layer 1 — Foundation
Shared types, abstractions, and utilities that every other layer depends on.
| Crate | Purpose |
|---|---|
| caliban-common | Provider-neutral message IR, shared error types, and cross-crate utilities |
| caliban-settings | Unified settings hierarchy (managed > user > project > local); file loading, schema validation, live reload, apiKeyHelper pool |
Layer 2 — Providers
One adapter per model API. Each translates caliban's message IR to the provider's wire format and back.
| Crate | Purpose |
|---|---|
| caliban-provider | Provider trait definition and shared provider types |
| caliban-provider-anthropic | Anthropic (Claude) adapter via Anthropic Messages API |
| caliban-provider-openai | OpenAI adapter; also used for LM Studio, vLLM, and other OpenAI-compatible servers |
| caliban-provider-ollama | Ollama adapter (native /api/chat endpoint, GGUF tool-call parsing) |
| caliban-provider-google | Google AI Studio / Gemini adapter |
| caliban-provider-bedrock | AWS Bedrock adapter (ADR 0034) |
| caliban-provider-vertex | Google Cloud Vertex AI adapter (ADR 0034) |
| caliban-model-router | Purpose-keyed routing, fallback chains, hedging, circuit breakers, capability filtering (ADR 0022, 0038) |
Layer 3 — Agent Core
The runtime that drives the model → tool → model loop.
| Crate | Purpose |
|---|---|
| caliban-agent-core | Agent loop, turn handling, compaction strategies, permission dispatch, sub-agent orchestration |
| caliban-tools-builtin | Built-in tools: Read, Write, Edit, Bash, Glob, Grep, WebFetch, TodoWrite, AgentTool, NotebookEdit, and others |
| caliban-sandbox | OS-level tool confinement (macOS Seatbelt, Linux bubblewrap) (ADR 0032) |
| caliban-skills | Skill discovery, frontmatter parsing, and SkillTool invocation (ADR 0019) |
| caliban-mcp-client | MCP server lifecycle: spawn, handshake, list_tools, transports, OAuth (ADR 0017, 0023) |
| caliban-plugins | Plugin package management: manifest parsing, trust gating, namespace expansion (ADR 0030) |
| caliban-images | Image / vision input: clipboard, @path, drag-and-drop, provider wire shapes (ADR 0039) |
Layer 4 — Sessions, State & Infrastructure
Persistence, memory, observability, and the background fleet.
| Crate | Purpose |
|---|---|
| caliban-sessions | Session persistence (JSON on disk), load/save, session directory management |
| caliban-checkpoint | Per-prompt checkpoint snapshots and /rewind restoration (ADR 0028) |
| caliban-memory | Three-tier memory (global/project/auto-memory), CLAUDE.md ancestor walk and @-imports (ADR 0018, 0035, 0036) |
| caliban-output-styles | Built-in and custom output style loading and activation (ADR 0031) |
| caliban-telemetry | OpenTelemetry export, cost accounting, metric emission (ADR 0033) |
| caliban-worktrees | Git worktree creation and lifecycle management for sub-agent isolation (ADR 0037) |
| caliban-supervisor | Background agent fleet and caliband supervisor daemon (ADR 0037, 0042) |
The binary
| Crate | Purpose |
|---|---|
| caliban | The caliban binary: CLI parsing (args.rs), startup pipeline, TUI (ratatui), headless dispatch, and subcommand handlers |
Architecture & ADRs
Caliban captures every significant architectural decision in an Architecture Decision Record (ADR). Each ADR states the context, the decision, and its consequences — giving contributors (and curious operators) the rationale behind the design, not just the outcome.
ADRs live in the docs/adr/ directory of the repository. They use a lightweight MADR-lite format and carry a status:
- accepted — currently in effect
- superseded — replaced by a later ADR; kept for history
- proposed — under discussion, not yet in effect
- rejected — considered and explicitly declined
You do not need to read ADRs to use caliban. They exist for contributors and operators who want to understand why something works the way it does. For crate orientation, see Crate Map.
ADR index
Foundation
| # | Title | Status |
|---|---|---|
| 0000 | Record architecture decisions (MADR-lite under docs/adr/) | accepted |
| 0001 | Async runtime → tokio | accepted |
| 0002 | Error model → thiserror for libs, anyhow for binary | accepted |
| 0003 | License → AGPL-3.0-only | accepted |
| 0004 | Naming → caliban-* libraries, caliban binary | accepted |
| 0005 | Workspace layout → crates/ for libs, binaries at root | accepted |
Provider & message model
| # | Title | Status |
|---|---|---|
| 0006 | Message schema → provider-neutral IR | accepted |
| 0007 | Schema/transport factoring via Transport trait | accepted |
| 0008 | Role::System is positional (leading-only) | accepted |
Agent core
| # | Title | Status |
|---|---|---|
| 0009 | Agent-core design (stream-as-primitive, sequential tools, opt-in compaction) | accepted (sequential-tools clause superseded by 0016) |
| 0010 | WorkspaceRoot path resolution + opt-in restricted mode | accepted |
| 0016 | Parallel tool dispatch (semaphore-bounded; supersedes 0009 sequential clause) | accepted |
| 0021 | Sub-agent primitive (AgentTool; synchronous in-process; allowlist-filtered registry) | accepted |
TUI & sessions
| # | Title | Status |
|---|---|---|
| 0011 | Sessions persisted to disk + interactive REPL | accepted |
| 0012 | TUI via ratatui (replacing the rustyline REPL) | accepted |
| 0013 | TUI overlays + layout v2 | accepted |
| 0014 | Default system prompt + TUI stall fixes + debug logging | accepted |
| 0015 | Context preservation + path conventions (~ expansion) | accepted |
| 0027 | TUI ergonomics (@file, !, Ctrl+G, Ask modal, transcript viewer) | accepted |
| 0041 | TUI redraw tick — close-out (resolves 0014 open question) | accepted |
Memory & checkpointing
| # | Title | Status |
|---|---|---|
| 0018 | Memory tier model (global / project / auto-memory; spliced into system prompt) | accepted |
| 0028 | Auto-checkpointing + /rewind | accepted |
| 0035 | Auto-memory (model-written notes per project) | accepted |
| 0036 | CLAUDE.md ancestor walk + @-imports | accepted |
Permissions & safety
| # | Title | Status |
|---|---|---|
| 0020 | Permission rules layered on Hooks (TOML rule sources; interactive Ask) | accepted |
| 0029 | Permission modes (acceptEdits / auto / dontAsk / bypassPermissions) + auto-mode classifier | accepted |
| 0032 | OS-level sandbox (macOS Seatbelt + Linux bubblewrap) | accepted |
| 0045 | Permissions v2 — TOML-primary config + richer rule schema | accepted |
Configuration & settings
| # | Title | Status |
|---|---|---|
| 0026 | Unified settings hierarchy (managed > user > project > local) | accepted |
| 0043 | arc-swap as the read-mostly shared-state primitive | accepted |
Extensibility: hooks, skills, plugins, output styles
| # | Title | Status |
|---|---|---|
| 0019 | Skills loading & invocation (frontmatter + body; SkillTool on-demand load) | accepted |
| 0024 | Hook event taxonomy (expanded events + handler types) | accepted |
| 0030 | Plugin packaging (skills + hooks + agents + MCP + output-styles bundles) | accepted |
| 0031 | Output styles (Default / Proactive / Explanatory / Learning + custom) | accepted |
| 0040 | Slash command registry (extensible SlashCommand trait) | accepted |
MCP
| # | Title | Status |
|---|---|---|
| 0017 | MCP client architecture (stdio v1; tools surface as mcp__<server>__<tool>) | accepted |
| 0023 | MCP v2 — transports, OAuth, elicitation, resources | accepted |
| 0044 | rmcp 1.7 version pin (dedicated-PR bumps) | accepted |
| 0046 | Two-stage tool surface — lazy MCP schema loading + ToolSearch | accepted |
Model router & providers
| # | Title | Status |
|---|---|---|
| 0022 | Model routing architecture (Layer 3 caliban-model-router; router-impl-Provider) | accepted |
| 0034 | Bedrock + Vertex providers | accepted |
| 0038 | Model router v2 (fallback / hedging / circuit breakers / capability filtering) | accepted |
| 0039 | Image / vision input | accepted |
Headless / CI & observability
| # | Title | Status |
|---|---|---|
| 0025 | Headless / print mode + JSON output protocol | accepted |
| 0033 | OpenTelemetry export + cost accounting | accepted |
Sub-agents & background fleet
| # | Title | Status |
|---|---|---|
| 0037 | Sub-agent worktree isolation + background fleet | accepted |
| 0042 | caliband sibling-binary placement (under caliban-supervisor) | accepted |
Architecture Decision Records
- ADR 0000 · Record architecture decisions
- ADR 0001 · Async runtime → tokio
- ADR 0002 · Error model → thiserror for libraries, anyhow for binary
- ADR 0003 · License → AGPL-3.0-only
- ADR 0004 · Naming → caliban-* libraries, caliban binary
- ADR 0005 · Workspace layout → crates/ for libraries, binaries at root
- ADR 0006 · Message schema → provider-neutral IR
- ADR 0007 · Schema/transport factoring via Transport trait
- ADR 0008 · Role::System messages are positional (leading-only)
- ADR 0009 · Agent-core design (stream-as-primitive, sequential tools, opt-in compaction)
- ADR 0010 · WorkspaceRoot path resolution + opt-in restricted mode
- ADR 0011 · Sessions persisted to disk + interactive REPL
- ADR 0012 · TUI via ratatui (replacing the rustyline REPL)
- ADR 0013 · TUI overlays + layout v2 (input bracketed by horizontal rules)
- ADR 0014 · Default system prompt + TUI stall fixes + debug logging
- ADR 0015 · Context preservation + path conventions (~/dev fix)
- ADR 0016 · Parallel tool dispatch (supersedes ADR 0009 §"sequential tools")
- ADR 0017 · MCP client architecture
- ADR 0018 · Memory tier model (CLAUDE.md ingestion + auto-memory)
- ADR 0019 · Skills loading
- ADR 0020 · Permission rules layered on top of Hooks
- ADR 0021 · Sub-agent primitive via AgentTool
- ADR 0022 · Model routing architecture
- ADR 0023 · MCP v2 — transports, OAuth, elicitation, resources
- ADR 0024 · Hook event taxonomy + external handler types
- ADR 0025 · Headless -p mode + JSON output protocol
- ADR 0026 · Layered settings.json + /config editor
- ADR 0027 · TUI ergonomics pack
- ADR 0028 · Checkpointing + /rewind
- ADR 0029 · Permission modes + auto-mode classifier
- ADR 0030 · Plugin packaging
- ADR 0031 · Output styles
- ADR 0032 · OS-level sandbox
- ADR 0033 · OpenTelemetry export + cost tracking
- ADR 0034 · Bedrock + Vertex providers
- ADR 0035 · Auto-memory (model-written notes)
- ADR 0036 · CLAUDE.md ancestor walk + @-imports
- ADR 0037 · Sub-agent worktree isolation + background fleet
- ADR 0038 · Model router v2 — fallback, hedging, breakers, capabilities, binary wiring
- ADR 0039 · Image + vision input
- ADR 0040 · Slash command registry
- ADR 0041 · TUI redraw tick close-out
- ADR 0042 · caliband sibling-binary placement
- ADR 0043 · arc-swap as the read-mostly shared-state primitive
- ADR 0044 · rmcp 1.7 version pin
- ADR 0045 · Permissions v2 — TOML-primary config + richer rule schema
- ADR 0046 · Two-stage tool surface — lazy MCP schema loading + ToolSearch
- ADR 0047 · Interactive background sub-agents (idle / await-input)
ADR 0000 · Record architecture decisions
- Status: accepted
- Date: 2026-06-14
Context
caliban has kept Architecture Decision Records since the Layer-0 bootstrap
(ADRs 0001–0047). The original 2026-05-22 Layer-0 bootstrap design
placed them at the repository root in adrs/, reasoning that ADRs are first-class
Layer-0 deliverables and top-level placement makes them impossible to miss.
Since then the sibling repositories adopted the conventional
adr-tools / MADR
layout instead: prospero and gonzalo both keep their records under
docs/adr/, seed the log with a meta "record architecture decisions" entry, and
(prospero) ship a template.md. caliban was the outlier — root adrs/ (plural),
no meta record, no template — which created cross-repo confusion and path
mismatches for anyone moving between the three repos.
There was no ADR stating why caliban records decisions or where they live; that rationale lived only in a feature design doc, which is exactly the kind of external dependency ADRs are supposed to avoid.
Decision
We will keep Architecture Decision Records under docs/adr/, in MADR-lite
format (a lightweight extension of Michael Nygard's original ADR style), matching
sibling repos prospero and gonzalo. Specifically:
- Location: docs/adr/ (singular adr), not the former root adrs/. This supersedes the root-placement decision in the Layer-0 bootstrap design; existing records were relocated with git mv to preserve history.
- This meta record is numbered 0000 so the existing 0001–0047 numbering is preserved: no renumbering churn, and the log still opens with a record of the practice itself.
- Format: each ADR is one append-only file NNNN-kebab-title.md with Context, Decision, and Consequences. A decision is changed by writing a new ADR that supersedes the old one, never by rewriting history.
- Template: new ADRs start from template.md.
- Status legend: accepted / superseded / proposed / rejected, indexed in README.md.
Consequences
- Positive: one consistent ADR convention across caliban / gonzalo / prospero; the conventional, tooling-friendly docs/adr/ location; and the rationale for the practice now lives in an ADR rather than a feature design doc, so it is self-sustaining.
- Negative: a one-time churn to relocate the directory and update every inbound reference (crate rustdoc, README, the mdBook guide, the parity matrix, and the historical design docs). ADRs no longer sit at the repo root, so they are slightly less discoverable from a bare ls, mitigated by a pointer from the top-level README.md.
- Revisit if: the agreed cross-sibling ADR standard changes, or the docs/adr/ layout proves harder to maintain than the root placement it replaced.
ADR 0001 · Async runtime → tokio
- Status: accepted
- Date: 2026-05-22
Context
caliban's foundation is heavily I/O-bound: provider HTTPS calls, streaming
responses from LLM endpoints, MCP transports, and eventually a multi-session
orchestrator. Rust's async story is fragmented across runtimes (tokio,
async-std, smol, embassy), and futures from one runtime cannot
always be polled by another. Picking a runtime up front prevents subtle
cross-runtime breakage as the workspace grows.
Decision
Standardize on tokio (multi-threaded scheduler, features = ["full"])
across every crate in the workspace. The workspace root pins the version
in [workspace.dependencies]; member crates declare tokio.workspace = true
and may select their own feature subset.
No nested runtimes. Each binary creates a single tokio::runtime::Runtime
(or uses #[tokio::main]) for its entire lifetime.
Consequences
- Positive: direct compatibility with reqwest, tower, hyper, axum, tonic, every major MCP transport, and most LLM SDKs. Predictable async behavior across the workspace. Easy onboarding: tokio is the de facto Rust async runtime.
- Negative: locks the workspace out of smol / embassy ecosystems (acceptable; no embedded targets planned). Binary size larger than a minimal runtime would produce.
- Revisit if: caliban needs to run in a no_std or embassy-only environment, or if a critical dependency requires a different runtime.
ADR 0002 · Error model → thiserror for libraries, anyhow for binary
- Status: accepted
- Date: 2026-05-22
Context
Rust libraries benefit from precise error enums: consumers want to match on variants and react differently to different failure modes. Binaries benefit from ergonomic context propagation: operators want a readable error chain showing where things went wrong, not pattern-matching on every variant.
A shared "uber error" crate that every other crate depends on becomes a foundation-level coupling point and forces every error change to ripple through the workspace. We want errors to be local.
Decision
Every caliban-* library crate defines its own Error enum using
thiserror, and exposes:
pub type Result<T> = std::result::Result<T, Error>;
Cross-crate errors convert at boundaries with #[from] or explicit
From impls. No shared error crate.
The caliban binary will use anyhow::Result in main() and top-level
command handlers once real command logic exists. ? propagates errors
with context using .context("...") from anyhow::Context.
At Layer 0 the binary is an argv-only stub returning std::process::ExitCode
directly (so it can distinguish exit codes 0 / 2 for success vs. misuse);
anyhow is declared as a workspace-inherited dependency and will be
imported as soon as the first error-propagating command lands.
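The per-crate error pattern above can be sketched with the standard library alone. This is a minimal illustration, not caliban's actual code: the type names (ProviderError, AgentError), the variants, and the two functions are hypothetical, and the Display and From impls are written by hand where the thiserror derive and its #[from] attribute would normally generate them. In the real layout each enum would be named Error inside its own crate.

```rust
use std::fmt;

/// Hypothetical error enum for a provider library crate.
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    RateLimited { retry_after_secs: u64 },
    Auth,
}

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProviderError::RateLimited { retry_after_secs } => {
                write!(f, "rate limited, retry after {}s", retry_after_secs)
            }
            ProviderError::Auth => write!(f, "authentication failed"),
        }
    }
}

impl std::error::Error for ProviderError {}

/// Hypothetical error enum for a second crate that calls the provider crate.
#[derive(Debug, PartialEq)]
pub enum AgentError {
    Provider(ProviderError),
}

impl fmt::Display for AgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AgentError::Provider(inner) => write!(f, "provider error: {}", inner),
        }
    }
}

impl std::error::Error for AgentError {
    // Expose the wrapped error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            AgentError::Provider(inner) => Some(inner),
        }
    }
}

// Boundary conversion: the hand-written equivalent of `#[from]`.
impl From<ProviderError> for AgentError {
    fn from(err: ProviderError) -> Self {
        AgentError::Provider(err)
    }
}

/// Stand-in for a call into the provider crate.
pub fn provider_call(authorized: bool) -> Result<&'static str, ProviderError> {
    if authorized { Ok("ok") } else { Err(ProviderError::Auth) }
}

/// `?` converts ProviderError into AgentError at the crate boundary.
pub fn run_turn(authorized: bool) -> Result<&'static str, AgentError> {
    let reply = provider_call(authorized)?;
    Ok(reply)
}
```

A caller in the library layer can still match on AgentError::Provider(ProviderError::Auth) precisely, which is the property the decision is protecting.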
Consequences
- Positive: adding a new error variant is local to one crate. Library consumers can match precisely; binary code gets readable context. No god-error-crate.
- Negative: slight boilerplate per library (the Error enum and Result alias). From impls must be added at boundaries.
- Revisit if: a real shared error type emerges (e.g., a cross-crate "Cancelled" or "Timeout" that every layer must surface identically).
ADR 0003 · License → AGPL-3.0-only
- Status: accepted
- Date: 2026-05-22
Context
caliban is private now but designed to be open-sourced. The author explicitly rejects permissive defaults (MIT, Apache-2.0): the goal is to enforce community contribution from downstream users and hosted-service providers, not maximize commercial adoption.
The relevant tiers of copyleft are:
- GPL-3.0 — strong copyleft on distribution; SaaS providers can modify and host without releasing source (the "SaaS loophole").
- AGPL-3.0 — closes the SaaS loophole: hosting modified code as a network service triggers the obligation to release source.
- SSPL — stronger than AGPL but not OSI-recognized as open source.
- MPL-2.0 — file-level (weak) copyleft; consumers don't have to copyleft their downstream code.
Decision
Every crate's Cargo.toml declares license = "AGPL-3.0-only" via
license.workspace = true. The full AGPL-3.0 text lives in LICENSE
at the workspace root. The README states the license prominently and
explains the implications for service operators and forks.
Consequences
- Positive: forks and hosted services must release modifications. Aligns with Mastodon, Nextcloud, and Sourcehut, all of which have used AGPL successfully to balance openness with sustainable community contribution. Author's stated philosophy is enforced.
- Negative: caliban crates won't compose into permissive Rust projects on crates.io: depending on caliban-* makes the consumer AGPL. This is intentional: caliban is an end product, not a general-purpose library to be embedded.
- Revisit if: the AGPL is preventing a legitimate non-commercial use case the author wants to support. A future ADR could carve out exceptions or dual-license specific crates.
ADR 0004 · Naming → caliban-* libraries, caliban binary
- Status: accepted
- Date: 2026-05-22
Context
Crate names on crates.io are globally unique. If we eventually publish,
we need names that aren't already taken and that signal ownership.
Within the workspace, naming conventions also affect ergonomics —
module paths, import statements, and clippy's module_name_repetitions
lint all interact with crate names.
Decision
- Library crates use the caliban- prefix: caliban-core, caliban-provider, caliban-agent-core, etc. Directory name matches the package name.
- Binary crate is named caliban. Its package name is caliban, so Cargo's default binary name matches; caliban/Cargo.toml makes this explicit with a [[bin]] name = "caliban" entry for clarity.
- Internal module paths drop the prefix where it would be redundant: caliban_provider::ProviderClient, NOT caliban_provider::CalibanProviderClient.
- Clippy's module_name_repetitions lint is allowed at the workspace level to support the internal-naming convention without fighting clippy on every type.
Consequences
- Positive: all caliban crates can be reserved on crates.io ahead of public release. cargo install caliban works once published. Internal type names stay terse.
- Negative: ~9 extra characters of typing per crate reference in Cargo.toml dependency lists. Slight redundancy in long import paths (caliban_core::caliban_core_specific::..., avoided in practice by short module names).
- Revisit if: the workspace gains so many crates that the prefix becomes overhead, or if a sub-org / sub-product emerges that warrants its own prefix.
ADR 0005 · Workspace layout → crates/ for libraries, binaries at root
- Status: accepted
- Date: 2026-05-22
Context
The workspace is planned to grow to ~11 crates across 4 layers (foundation, integration, routing, UX surfaces). Layout patterns seen in Rust workspaces:
- Flat (crate1/, crate2/ at root): used by tokio, serde, axum. Simpler for small workspaces, clutters root past ~8 crates.
- All-in-crates/: used by ruff. Binary and libraries intermingled; clean root but binary entry points are buried.
- Apps/libs split (crates/ for libs, apps/ for bins): principled but less common; over-engineered for our size.
- Binaries at root, libraries in crates/: used by deno (with cli/), zed, helix. Entry points are top-level visible; libraries are clearly cataloged.
Decision
Adopt the last pattern: library crates under crates/caliban-<name>/,
binary crates as first-class subdirectories of the workspace root
(caliban/, future caliban-tui/, caliban-orchestrator/) rather
than nested under a shared parent directory. Workspace members are
listed explicitly in root Cargo.toml, no globs.
Consequences
- Positive: root-level ls reveals entry points (binaries) and config files. crates/ reveals reusable libraries. Explicit member list catches typos and missing members at workspace-parse time.
- Negative: new-crate workflow has two patterns rather than one (cargo new --lib crates/<name> for libraries, cargo new <name> at root for binaries). Documented in README.
- Revisit if: the workspace stays small (<5 crates) and the crates/ directory feels like overhead, or grows past ~25 crates where a flat-but-grouped layout (e.g. crates/layer-1/, crates/layer-2/) becomes warranted.
ADR 0006 · Message schema → provider-neutral IR
- Status: accepted
- Date: 2026-05-22
Context
Layer 0 deferred the choice of message schema. Three approaches considered: (1) Anthropic-shape canonical; (2) provider-neutral IR; (3) lowest-common-denominator.
Decision
Define caliban's own Message/Content/StreamEvent types (the IR) in caliban-provider. Each adapter translates provider_native ↔ IR at its boundary. The IR is intentionally close to Anthropic's API shape because Anthropic's API is the most expressive of the supported providers; other adapters lose less information when mapping to the IR.
Consequences
- Positive: Adding a new provider doesn't touch caliban-provider. Provider-specific API changes don't ripple. The model-router (Layer 3) operates uniformly on IR. All transport variants of a given schema family share IR conversion code.
- Negative: One extra translation hop per request. IR design must capture the union of advanced features (thinking, prompt caching, multimodal) without becoming Anthropic-in-disguise.
- Revisit if: A provider emerges with feature semantics that can't be cleanly expressed in the IR (e.g., a new content modality the union doesn't anticipate).
ADR 0007 · Schema/transport factoring via Transport trait
- Status: accepted
- Date: 2026-05-22
Context
A naïve "one crate per concrete provider endpoint" plan duplicates the Anthropic Claude schema work across caliban-provider-anthropic (direct API), an eventual Bedrock-Claude crate, and an eventual Vertex-Claude crate. Two orthogonal dimensions exist: model schema family vs. transport/endpoint.
Decision
Each schema-family crate (caliban-provider-anthropic, caliban-provider-openai, caliban-provider-google, caliban-provider-ollama) defines its own Transport trait. A schema-family-generic XxxProvider<T: Transport> owns the IR conversion. Transport variants (DirectTransport, BedrockTransport, VertexTransport, AzureTransport, AIStudioTransport) are concrete Transport impls within their schema family, gated behind cargo features when they pull heavy deps (aws-sdk-bedrockruntime, gcp_auth).
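The schema/transport split can be illustrated with a small sketch. Everything here is an assumption made for illustration: the trait is synchronous (the real one is presumably async), the send method and its string-based signature are invented, and the IR conversion is reduced to wrapping a prompt in a JSON-like body. Only the names Transport, DirectTransport, and BedrockTransport come from the decision above.

```rust
/// Hypothetical, synchronous stand-in for one schema family's Transport trait.
pub trait Transport {
    /// Send an already-serialized request body and return the raw response.
    fn send(&self, body: &str) -> Result<String, String>;
}

/// Illustrative transports; real ones would perform HTTPS or SDK calls.
pub struct DirectTransport;
pub struct BedrockTransport;

impl Transport for DirectTransport {
    fn send(&self, body: &str) -> Result<String, String> {
        Ok(format!("direct:{}", body))
    }
}

impl Transport for BedrockTransport {
    fn send(&self, body: &str) -> Result<String, String> {
        Ok(format!("bedrock:{}", body))
    }
}

/// Schema-family-generic provider: owns the IR conversion, delegates I/O.
pub struct AnthropicProvider<T: Transport> {
    transport: T,
}

impl<T: Transport> AnthropicProvider<T> {
    pub fn new(transport: T) -> Self {
        Self { transport }
    }

    /// IR -> native conversion, written once and shared by every transport.
    /// (Reduced to a single string field for this sketch.)
    fn to_native(prompt: &str) -> String {
        format!("{{\"prompt\":\"{}\"}}", prompt)
    }

    pub fn complete(&self, prompt: &str) -> Result<String, String> {
        let body = Self::to_native(prompt);
        self.transport.send(&body)
    }
}
```

Swapping DirectTransport for BedrockTransport changes only the type parameter; the conversion code in to_native is untouched, which is the reuse the decision is after.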
Consequences
- Positive: Claude-on-Bedrock and Claude-on-Vertex reuse the Anthropic IR-conversion code. Adding a new transport for an existing schema is a single-file change. The model-router can treat (schema_family, transport) as a tuple.
- Negative: A Transport trait is per-family, not shared across families: caliban-provider-anthropic::Transport ≠ caliban-provider-openai::Transport. This is intentional (transport contracts are not interchangeable across schemas).
- Revisit if: A transport pattern emerges that genuinely cross-cuts schema families (e.g., a future caliban-side mTLS proxy that wraps any provider).
ADR 0008 · Role::System messages are positional (leading-only)
- Status: accepted
- Date: 2026-05-22
Context
OpenAI's API treats system as a role: system messages can appear anywhere in the messages array. Anthropic's, Gemini's, and Bedrock-Claude's APIs treat the system prompt as a separate top-level field. Modeling both shapes uniformly in the IR was an open question.
Decision
The IR has three roles: User, Assistant, System. System messages must appear contiguously at the start of CompletionRequest.messages. Validation rejects out-of-order System messages and System messages containing non-Text content blocks. Adapters with a separate-field system model (Anthropic, Gemini) collect the leading System messages and serialize them into the dedicated field; adapters with a system-role model (OpenAI, Ollama) pass them through as-is.
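A minimal sketch of the leading-only rule follows. The Message struct is a simplified, text-only stand-in for the IR type, so the "non-Text content block" check is not modeled; the function names and the choice to join system messages with a blank line are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

/// Simplified, text-only stand-in for the IR message type.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

impl Message {
    pub fn new(role: Role, text: &str) -> Self {
        Self { role, text: text.to_string() }
    }
}

/// Count the leading System messages, rejecting any System message that
/// appears after the first non-System message.
pub fn leading_system_count(messages: &[Message]) -> Result<usize, String> {
    let leading = messages
        .iter()
        .take_while(|m| m.role == Role::System)
        .count();
    if let Some(pos) = messages[leading..]
        .iter()
        .position(|m| m.role == Role::System)
    {
        return Err(format!(
            "System message at index {} is not leading",
            leading + pos
        ));
    }
    Ok(leading)
}

/// What a separate-field adapter would do: pull the leading System messages
/// into one system string and return the remaining conversation.
pub fn split_system(messages: &[Message]) -> Result<(String, &[Message]), String> {
    let n = leading_system_count(messages)?;
    let system = messages[..n]
        .iter()
        .map(|m| m.text.as_str())
        .collect::<Vec<_>>()
        .join("\n\n");
    Ok((system, &messages[n..]))
}
```

A system-role adapter would skip split_system and only call leading_system_count as validation before passing the list through unchanged.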
Consequences
- Positive: Single canonical representation. Maps cleanly to all four families. Per-System-message cache_control (Anthropic feature) is preserved by serializing the system field as a block array when any block has a cache marker.
- Negative: Disallows the rare pattern of mid-conversation system injection. Callers wanting that pattern must rewrite into a "User says: here's a new constraint…" style.
- Revisit if: A provider semantically requires non-leading system messages, or a credible agent design needs mid-conversation system injection.
ADR 0009 · Agent-core design (stream-as-primitive, sequential tools, opt-in compaction)
- Status: accepted
- Date: 2026-05-23
Context
Layer 1 / C adds the agent loop. Three design dimensions had real trade-offs: where the streaming surface lives, whether tool calls in one response are dispatched concurrently or sequentially, and what the default compaction strategy is.
Decision
- stream_until_done is the single source of truth. Non-streaming run_turn and run_until_done are thin consumers of the stream. This means the streaming code path is always exercised; bugs surface through unit + integration tests of either surface.
- Tool calls are dispatched sequentially within a single turn. Anthropic and Gemini can emit multiple tool_use blocks in one response; we run them in the order received. Parallelism is a follow-on (Hooks-pluggable dispatch strategy).
- Default compactor is NoopCompactor. Compaction strategies (DropOldest, Summarizing) are explicit opt-ins. The library doesn't silently mutate the user's message history; callers decide.
- Retries only on the provider call. Tool failures don't retry; tools manage their own retry semantics. Retryable provider errors: RateLimit, Network, ServerError 502-599. NOT retryable: Auth, InvalidRequest, ContextTooLong, ContentFilter, Cancelled, Adapter, ModelUnavailable, ServerError 500.
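The retry classification in the last bullet can be written as a single match. This is a sketch under assumptions: the enum below is a hypothetical shape with a status code carried by ServerError, and because the ADR lists only 500 and 502-599 explicitly, treating every other status (including 501) as non-retryable is a guess rather than documented behavior.

```rust
/// Hypothetical shape for the provider error kinds named in the ADR.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    RateLimit,
    Network,
    ServerError(u16),
    Auth,
    InvalidRequest,
    ContextTooLong,
    ContentFilter,
    Cancelled,
    Adapter,
    ModelUnavailable,
}

/// Conservative classifier: retry only errors that are plausibly transient.
pub fn is_retryable(err: &ProviderError) -> bool {
    match err {
        ProviderError::RateLimit | ProviderError::Network => true,
        // 502-599 retry; 500 does not. Other statuses (e.g. 501) fall
        // outside the range and are treated as non-retryable (assumption).
        ProviderError::ServerError(status) => (502..=599).contains(status),
        // Listed explicitly so a new variant forces a deliberate decision.
        ProviderError::Auth
        | ProviderError::InvalidRequest
        | ProviderError::ContextTooLong
        | ProviderError::ContentFilter
        | ProviderError::Cancelled
        | ProviderError::Adapter
        | ProviderError::ModelUnavailable => false,
    }
}
```

Spelling out the non-retryable variants instead of using a wildcard arm means the compiler flags the classifier whenever a variant is added.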
Consequences
- Positive: Single source of truth → simpler correctness story. Sequential tool dispatch → predictable behavior, easier debugging. Opt-in compaction → no surprise history mutation. Retry policy classifier is conservative and stable.
- Negative: Sequential dispatch is slower than parallel for independent tools. Token-counting heuristic (chars/4) is approximate.
- Revisit if: Real workloads show sequential dispatch as a bottleneck (add parallel strategy); a non-English language is consistently mis-estimated (integrate a tokenizer crate).
ADR 0010 · WorkspaceRoot path resolution + opt-in restricted mode
- Status: accepted
- Date: 2026-05-23
Context
caliban's built-in tools (Read/Write/Edit/Bash/Glob/Grep) accept paths
from model-generated tool calls. Two extremes for path handling are
both wrong: (a) reject all absolute paths — breaks legitimate use cases
like reading /etc/hostname for diagnostics; (b) accept any path
unconditionally — lets a model accidentally read or overwrite arbitrary
files.
Decision
Tools share a WorkspaceRoot type that resolves relative paths against
a canonical root directory. Two modes:
- Permissive (default): Relative paths resolve under the root. Absolute paths are accepted as-is.
- Restricted (opt-in via .restricted()): Resolved paths must start with the canonical root after canonicalization. Path traversal via .. is normalized away before the prefix check, so escape attempts (../escape) are rejected with ToolError::InvalidInput.
The CLI surface (Layer 4) chooses the mode; the default is permissive because caliban runs with the operator's permissions in their own environment. Restricted mode is intended for sandboxed-agent scenarios (future: agent-as-service, untrusted-task delegation).
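The two modes can be sketched as follows. This is an illustration under stated assumptions, not the actual WorkspaceRoot: it normalizes .. lexically rather than calling the filesystem's canonicalize (so symlinks are not resolved and paths need not exist), the ToolError shape is hypothetical, and the example paths assume a Unix-style filesystem.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical shape for the error named in the ADR.
#[derive(Debug, PartialEq)]
pub enum ToolError {
    InvalidInput(String),
}

pub struct WorkspaceRoot {
    root: PathBuf,
    restricted: bool,
}

impl WorkspaceRoot {
    /// Permissive by default.
    pub fn new(root: impl AsRef<Path>) -> Self {
        Self { root: normalize(root.as_ref()), restricted: false }
    }

    /// Opt in to restricted mode.
    pub fn restricted(mut self) -> Self {
        self.restricted = true;
        self
    }

    pub fn resolve(&self, input: &str) -> Result<PathBuf, ToolError> {
        let candidate = Path::new(input);
        // Permissive mode accepts absolute paths as-is.
        if candidate.is_absolute() && !self.restricted {
            return Ok(candidate.to_path_buf());
        }
        // `join` keeps an absolute candidate unchanged, so both cases
        // flow through the same normalize-then-check path.
        let resolved = normalize(&self.root.join(candidate));
        if self.restricted && !resolved.starts_with(&self.root) {
            return Err(ToolError::InvalidInput(format!(
                "{} escapes the workspace root",
                input
            )));
        }
        Ok(resolved)
    }
}

/// Lexically remove `.` and `..` components (stand-in for canonicalization).
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Because the prefix check runs on the normalized path, ../escape resolves to a sibling of the root and fails the starts_with test instead of slipping past a naive string comparison.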
Consequences
- Positive: Single shared resolver across all six tools; no per-tool path-handling logic. Restricted mode provides a meaningful safety boundary when needed. .. traversal attacks are defeated by canonicalize-then-prefix-check.
- Negative: Permissive default means the model can read/write anywhere the harness process can. Acceptable for the personal-use context; documented as such.
- Revisit if: caliban gains a "delegated agent" mode where one caliban instance runs sub-tasks on behalf of another, requiring per-task sandboxing.
ADR 0011 · Sessions persisted to disk + interactive REPL
- Status: accepted
- Date: 2026-05-23
Context
caliban's MVP was single-shot — every invocation started a fresh conversation with no memory of previous runs. For real daily use, two things matter: (a) being able to resume a conversation across invocations, and (b) having an interactive prompt for iterative work without re-invoking the binary each turn.
Decision
Sessions: a PersistedSession (name, provider, model, messages,
total_usage, timestamps) saved as pretty-printed JSON under
$XDG_DATA_HOME/caliban/sessions/<name>.json (default
~/.local/share/caliban/sessions/). Names validated against
[a-zA-Z0-9_-]+ with length 1..=64 to prevent path traversal and
platform-incompatible names. Atomic writes via
tempfile::NamedTempFile::persist so crashes mid-save can't corrupt
the file.
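The name rule above is small enough to sketch directly. The function names are hypothetical, and the data directory is passed in as a parameter rather than read from $XDG_DATA_HOME so the example stays self-contained.

```rust
use std::path::{Path, PathBuf};

/// Accept only names matching [a-zA-Z0-9_-]+ with length 1..=64.
pub fn validate_session_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 64 {
        return Err(format!(
            "session name must be 1 to 64 characters, got {}",
            name.len()
        ));
    }
    let allowed = |b: u8| b.is_ascii_alphanumeric() || b == b'_' || b == b'-';
    if !name.bytes().all(allowed) {
        return Err(format!("session name contains invalid characters: {:?}", name));
    }
    Ok(())
}

/// Build <data_home>/caliban/sessions/<name>.json for a validated name.
pub fn session_path(data_home: &Path, name: &str) -> Result<PathBuf, String> {
    validate_session_name(name)?;
    Ok(data_home
        .join("caliban")
        .join("sessions")
        .join(format!("{}.json", name)))
}
```

Since neither / nor . is in the allowed set, a name such as ../etc/passwd is rejected before any path is built.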
REPL: caliban with no prompt + TTY stdin enters an interactive
loop using rustyline for line editing + history persistence at
~/.local/share/caliban/repl_history.txt. Slash commands (/help,
/exit, /quit, /clear, /sessions, /save, /usage) provide
session-management without exiting. When entered with --session,
the REPL auto-saves after every turn.
JSON over SQLite: chosen for transparency. Users can cat/edit/diff
session files; debugging is easy; no migrations. Tradeoff: O(n)
list and slower large-history loads, but until sessions exceed
thousands of turns this is irrelevant.
Consequences
- Positive: zero-friction resume of any past conversation. Sessions are inspectable / editable / git-trackable if a user wants. REPL gives an interactive UX without committing to a TUI.
- Negative: rustyline adds non-trivial dependencies. Concurrent writes to the same session (two caliban processes) → last-write-wins (documented; out of scope for a single-user MVP).
- Revisit if: session files grow large enough that JSON parse time is noticeable, or users want simultaneous multi-process access. Migration to SQLite would be straightforward: the SessionStore API is the abstraction boundary.
ADR 0012 · TUI via ratatui (replacing the rustyline REPL)
- Status: accepted
- Date: 2026-05-23
Context
caliban's first interactive mode was a rustyline-based REPL: a line editor with history and slash commands. It worked, but felt like a shell rather than a proper agent UI. The user asked for a Claude Code-like experience: dedicated input area, persistent status bar showing context (cwd, model, session), scrolling conversation transcript above.
Decision
Replace the rustyline REPL with a ratatui + crossterm-based TUI.
Three-region vertical layout:
- Output region: flex-grow; renders the conversation transcript via Paragraph with wrap. Auto-scrolls to the bottom; PageUp/Down for history.
- Status bar: fixed 1 line; shows cwd · provider model · session (turns) · running….
- Input area: fixed 2 lines (border + line); plain text input with cursor + line editing + arrow-key history.
The event loop multiplexes terminal events (crossterm EventStream)
and agent stream events via tokio::select!. std::future::pending()
keeps the agent arm dormant when no turn is running.
Raw mode + alternate screen entered via a TerminalGuard RAII type
that restores terminal state on Drop (including panic-recovery).
Consequences
- Positive: Looks and feels like a modern agent CLI. Status bar gives immediate context (which session, which model, which dir). Streaming output renders in real-time above the prompt without interfering with input. ratatui handles terminal resize automatically.
- Negative: Significantly more code (~400 lines vs. rustyline's ~250). ratatui + crossterm add non-trivial deps. Markdown rendering, mouse support, and customizable themes are deferred. Non-TTY invocation without a prompt is now an error (use --prompt or pipe via -).
- Revisit if: users want mouse interaction, syntax-highlighted code blocks in responses, or split-pane layouts (e.g., a side panel showing recent tool calls). Each would be a focused follow-on.
ADR 0013 · TUI overlays + layout v2 (input bracketed by horizontal rules)
- Status: accepted
- Date: 2026-05-23
Context
The first TUI iteration shipped a working three-region layout (output | status | input) and slash commands that wrote to the transcript. As the slash-command list grew (help, config, mcp, skills) the transcript became a cluttered place to render reference information.
Decision
- Layout v2 reorders the regions so the input area sits between the output region and the status bar, bracketed by single-row horizontal rules. This puts the active input visually closer to the bottom (where the user's hands rest) and matches the Claude Code layout the user requested.
- Overlays are modal popups rendered centered (80% × 80%) over the main view via ratatui's Clear + bordered Block + Paragraph widgets. ViewState::Overlay(Overlay) on App tracks which overlay is active; Esc or q resets to ViewState::Main. Main-view key handling is suppressed while an overlay is open (the overlay is read-only in v1).
- Four sub-menus: /help (slash command + key reference), /config (active configuration from app.args / app.session), /mcp (stub pointing at future caliban-mcp-client), /skills (stub pointing at future caliban-skills).
Consequences
- Positive: Reference views don't pollute the transcript. The layout is closer to Claude Code's. The /config view is genuinely useful for verifying caliban's state at a glance. The /mcp and /skills stubs document the future direction in the UI itself.
- Negative: Two more enum variants per addition; overlay content is static for now and must be hand-edited when slash commands evolve. Editing config from the UI is deferred.
- Revisit if: A keyboard-driven command palette (Ctrl+P-style) is desired; if the slash-command list grows beyond ~12 entries and needs categorization; if /config gains edit capability (toggling bools, changing model mid-session) requiring stateful focus tracking.
ADR 0014 · Default system prompt + TUI stall fixes + debug logging
- Status: accepted
- Date: 2026-05-23
Context
Real-use testing revealed two issues with the daily-usable caliban:
- No default system prompt: models had no context that they were running in caliban, what tools were available, or which directory they were operating in. Behavior was generic-assistant rather than harness-aware.
- Occasional streaming stalls: the TUI's event-loop draws once per tokio::select! iteration. Sometimes the loop appeared to hang: the transcript wouldn't update until the user pressed a key, at which point it would advance by one line. Input wouldn't echo during the stall.
Decision
System prompt
A caliban-cli/src/system_prompt.rs module builds a default prompt
auto-derived from current state (caliban identity, cwd, registered tool
names + descriptions, basic operating conventions). The prompt source is resolved as follows:
- --system "<text>": literal override
- --system-file <PATH>: file content
- --no-system: no system prompt
- (none): default
All four are mutually exclusive via clap. The first three produce
Option<String>; the default returns Some(text).
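The resolution step can be modeled as one enum and one function. This is a sketch under assumptions: the enum and function names are hypothetical, the file content is assumed to have been read already, and the default prompt is supplied as a closure so it is only built when no flag was given.

```rust
/// Hypothetical model of the four mutually exclusive CLI choices.
#[derive(Debug, Clone, PartialEq)]
pub enum SystemPromptArg {
    /// --system "<text>"
    Literal(String),
    /// --system-file <PATH>, with the file content already read
    FileContent(String),
    /// --no-system
    Disabled,
    /// No flag given
    Unset,
}

/// Resolve the chosen source into the prompt to send, if any.
pub fn resolve_system_prompt(
    arg: SystemPromptArg,
    build_default: impl FnOnce() -> String,
) -> Option<String> {
    match arg {
        SystemPromptArg::Literal(text) | SystemPromptArg::FileContent(text) => Some(text),
        SystemPromptArg::Disabled => None,
        // The default is derived lazily, only when nothing overrides it.
        SystemPromptArg::Unset => Some(build_default()),
    }
}
```

Modeling the flags as a single enum mirrors the mutual exclusivity that clap enforces: there is no state in which two sources are set at once.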
Persistence rule: the system prompt is inserted as messages[0]
(Role::System) when a session is FIRST created. Loading an existing
session does NOT replace the prompt — the persisted system prompt is the
contract for that session. Switching models mid-session can produce a
mismatch (e.g., Claude-flavored prompt sent to a GPT model); this is
documented and considered acceptable. Users can edit the session JSON or
start a new session to refresh.
For ephemeral runs (no --session), the system prompt is prepended to
the message list at turn-construction time.
TUI streaming stall fix
Three belt-and-suspenders changes in the TUI event loop:
- Tick interval at 50ms (20 Hz) added to the tokio::select!. Even with no terminal or agent events, the loop iterates and redraws. This masks any missed-wakeup symptoms from either stream source.
- Explicit stdout().flush() after each terminal.draw(). Ratatui's backend should flush internally; this catches any platform-specific line-buffering edge cases.
- tokio::task::yield_now() between iterations. Ensures runtime fairness so neither the EventStream task nor the HTTP-streaming task can starve the loop.
If the underlying cause is something deeper (e.g., a missing waker in
async_stream::try_stream!), these fixes mask the symptom rather than
addressing the root cause. The debug log (below) will help identify
whether stalls recur.
Debug logging
--debug flag or CALIBAN_DEBUG=1 env var enables a
tracing-subscriber file appender writing to
<cache_dir>/caliban/debug.log. Logs each terminal event, agent stream
event, draw, and error. No overhead when disabled (the subscriber is
not installed).
Consequences
- Positive: Models now know their context. Stalls (if not eliminated) are masked by tick-based redraws, and diagnostic data is available for future investigation. System prompt is configurable per-invocation and inspectable via /system overlay.
- Negative: 20 Hz tick = continuous redraws even when nothing changes. Ratatui's diffing keeps wire cost at zero, but CPU spends ~50ms-of-work-per-second on the diff. Acceptable for interactive UX. System prompt grows with tool count; will need summarization at MCP/skills scale (future).
- Revisit if: Stalls recur with the tick in place — that indicates a deeper bug in the event-stream or agent-stream that we need to dig into using the debug logs. Or if profiling shows the 50ms tick is expensive (drop to 100ms or 200ms).
ADR 0015 · Context preservation + path conventions (~/dev fix)
- Status: accepted
- Date: 2026-05-23
Context
Real-use testing surfaced four issues bundled into one fix:
- The TUI's ephemeral REPL (no --session) silently dropped every turn's final_messages, so each new prompt only saw the system prompt + the latest user message. Models had no memory of prior turns in the same REPL session.
- WorkspaceRoot::resolve didn't expand ~. When models invoked Bash with cwd: "~/dev" or Read({"path":"~/notes.md"}) the path resolution failed with "No such file or directory." The model misinterpreted the error as "directory doesn't exist."
- The TUI's tool-call input summary truncated the partial-JSON stream at 80 chars, sometimes hiding closing braces and making patterns look different than they were.
- The default system prompt didn't tell the model that ~ is supported in tool paths.
Decision
- Add messages: Vec<Message> to the TUI's App. Initialize from session if any, else empty. Update from RunEnd's final_messages each turn. /clear wipes both the in-memory history and the session's persisted messages.
- WorkspaceRoot::resolve expands a leading ~ or ~/ to dirs::home_dir(). Affects all path arguments to all tools. The Bash command string is unchanged; the shell handles ~ expansion there.
- At ToolCallEnd, parse the accumulated input as JSON and render key="value", key=value pairs. Fall back to raw truncation on parse failure.
- Add a path-conventions bullet to the default system prompt.
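The tilde rule in the second bullet can be sketched in a few lines. The function name is hypothetical, and the home directory is passed in as a parameter instead of being looked up through dirs::home_dir(), which keeps the example free of external crates.

```rust
use std::path::{Path, PathBuf};

/// Expand a leading `~` or `~/` against `home`; leave everything else alone.
pub fn expand_tilde(input: &str, home: &Path) -> PathBuf {
    if input == "~" {
        return home.to_path_buf();
    }
    if let Some(rest) = input.strip_prefix("~/") {
        return home.join(rest);
    }
    // `~user/...` and a `~` in the middle of a path are not expanded.
    PathBuf::from(input)
}
```

Only the leading position is special, so a literal ~ elsewhere in a file name passes through unchanged.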
Consequences
- Positive: Ephemeral REPL now feels like a real conversation rather than a series of disconnected one-shots. ~/foo paths work transparently. Tool-call summaries are readable. The system prompt's conventions are accurate.
- Negative: App::messages and session.messages are now two copies in --session mode (kept in sync at RunEnd). /clear is destructive to session-stored messages (documented).
- Revisit if: The double-keeping causes correctness bugs (e.g., divergence after a mid-flight panic). The cleanest long-term refactor would be to make App hold an Arc<RwLock<Session>> and treat session as the single source of truth, with the ephemeral case using a synthetic in-memory session.
ADR 0016 · Parallel tool dispatch (supersedes ADR 0009 §"sequential tools")
- Status: accepted
- Date: 2026-05-23
- Supersedes: ADR 0009 (in part — sequential tool dispatch only)
Context
ADR 0009 chose sequential tool dispatch within a single assistant turn
as a v1 simplification: "Parallelism is a follow-on (Hooks-pluggable
dispatch strategy)." Real workloads bore out the cost. Models routinely
emit 2–6 tool_use blocks per turn (parallel Greps + Reads while
exploring a codebase, repeated WebFetches to compare sources), and the
serial loop paid the sum of their wall-clock latencies rather than the
max. The follow-on landed on jf/feat/parallel-tools in commits
b624110 → 4751746 → b5fba58.
This ADR records the resulting architectural commitment.
Decision
- Parallel tool dispatch is default-on. AgentBuilder initializes parallel_tools: true. Operator opt-out via --no-parallel-tools / CALIBAN_NO_PARALLEL_TOOLS=1 falls through the same code path with permits = 1, preserving serial semantics without a separate branch.
- Bounded concurrency via an Arc<tokio::sync::Semaphore>. The default cap is available_parallelism().get().saturating_sub(1).max(1): leave one core for the agent loop, streaming, and the TUI render thread. Tools are mostly I/O-bound, so this is a soft ceiling against runaway fan-out rather than a hard CPU bound. Operator override: --parallel-tool-limit N / CALIBAN_PARALLEL_TOOL_LIMIT=N.
- before_tool hooks run serially. The hook is the synchronization point for permissions, auditing, and Deny short-circuiting. The serial gate produces a Vec<DispatchPlan> of Allowed/Denied entries; only Allowed entries fan out to a FuturesUnordered. Denied results are yielded first, in assistant-message order, so the TUI sees deny notices before any in-flight tool resolves.
- Tool::invoke() runs concurrently for Allowed plans. Results arrive in completion order on the event stream (best TUI liveness) and are then reordered back into assistant-message order when appended to the persisted tool_result_blocks so history and replay remain deterministic.
- Cancellation propagates through the shared tokio_util::sync::CancellationToken. A cancel at any point aborts all in-flight tools; partial results are dropped.
- Per-tool is_parallel_safe() flag is deferred. All current built-ins are independent: Bash spawns fresh subprocesses; Read/Grep/Glob are pure-read; Edit/Write touch files but the model rarely emits overlapping writes on the same path. YAGNI: add the flag if write contention is observed in practice (e.g. two Edit calls on the same file in one turn).
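The default cap and the serial opt-out from the list above can be sketched with the standard library alone. This is a minimal illustration rather than caliban's code: default_limit_for and effective_permits are hypothetical names, and two behaviors are assumptions (an explicit limit of 0 is clamped to 1, and the opt-out wins over an explicit limit).

```rust
use std::thread::available_parallelism;

/// Cores minus one, never below 1 (the expression quoted in the Decision).
fn default_limit_for(cores: usize) -> usize {
    cores.saturating_sub(1).max(1)
}

/// Hypothetical helper: resolve the number of semaphore permits.
/// With parallel_tools == false the same path is used with a single permit.
fn effective_permits(parallel_tools: bool, override_limit: Option<usize>, cores: usize) -> usize {
    if !parallel_tools {
        return 1; // assumption: the opt-out wins over any explicit limit
    }
    match override_limit {
        Some(n) => n.max(1), // assumption: 0 is clamped to 1
        None => default_limit_for(cores),
    }
}

fn main() {
    // Fall back to a single core if the platform cannot report parallelism.
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("default permits: {}", effective_permits(true, None, cores));
    println!("serial opt-out:  {}", effective_permits(false, Some(8), cores));
}
```

Keeping the opt-out on the same code path (one permit) means there is only one dispatch routine to test.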
Rationale
The semaphore-bounded FuturesUnordered pattern keeps the agent loop
single-threaded while extracting most of the available parallelism from
the model's batching. The serial before_tool gate keeps the existing
hook contract intact — permission systems don't have to reason about
race conditions across concurrent tool calls. Streaming ToolCallEnd
events in completion order means the TUI shows whichever tool finishes
first immediately, instead of waiting for the slowest one in batch
order.
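The split between completion order (what the TUI sees) and assistant-message order (what is persisted) can be sketched as follows. The ToolResult struct and into_message_order function are hypothetical; the real dispatcher works on tokio futures, which this std-only sketch does not model.

```rust
/// Hypothetical result record. `index` is the position of the tool_use block
/// in the assistant message.
#[derive(Debug, Clone, PartialEq)]
struct ToolResult {
    index: usize,
    tool_use_id: String,
    output: String,
}

/// Takes results in completion order and returns them in assistant-message
/// order, which is the order the persisted history needs.
fn into_message_order(mut completed: Vec<ToolResult>) -> Vec<ToolResult> {
    completed.sort_by_key(|r| r.index);
    completed
}

fn main() {
    // Pretend the third tool finished first, then the first, then the second.
    let completed = vec![
        ToolResult { index: 2, tool_use_id: "c".into(), output: "glob hits".into() },
        ToolResult { index: 0, tool_use_id: "a".into(), output: "grep hits".into() },
        ToolResult { index: 1, tool_use_id: "b".into(), output: "file body".into() },
    ];
    for r in &completed {
        println!("stream:  {} finished ({})", r.tool_use_id, r.output);
    }
    for r in into_message_order(completed) {
        println!("persist: {} at position {}", r.tool_use_id, r.index);
    }
}
```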
Consequences
- Positive. Multi-tool turns clear in roughly max(t_i) rather than sum(t_i). parallel_tools=false still works as an opt-out for users who want strict deterministic ordering in the event stream (e.g. for snapshot testing).
- Negative. Tracing output interleaves across tools within a turn; log readers need to follow tool_use_id to reconstruct per-tool sequences. The new caliban::tools tracing event surfaces dispatched/denied counts and total wall time per turn so the perf-baseline numbers stay legible.
- ADR 0009's "sequential tools" guidance is superseded. The rest of ADR 0009 (stream-as-primitive, opt-in compaction, conservative retry classifier) remains in force.
- Sub-agent primitive (forward link to 0021-sub-agent-primitive.md when written) inherits this dispatch model: each sub-agent runs its own bounded parallel loop, and the parent agent's semaphore is independent of the child's.
- Revisit if: write contention surfaces in real use (add is_parallel_safe() and a per-tool exclusion policy), or if profiling shows the semaphore itself is a contention point at high concurrency (unlikely; tokio's Semaphore is fair and cheap).
References
- Design spec: docs/superpowers/specs/2026-05-23-parallel-tools-design.md
- Commits: b624110 (design), 6b71a6c (plan), 4751746 (builder fields), b5fba58 (FuturesUnordered + Semaphore refactor)
- Implementation: crates/caliban-agent-core/src/agent.rs (parallel_tools / parallel_tool_limit fields), crates/caliban-agent-core/src/stream/parallel.rs (three-phase dispatch)
Revised 2026-05-26
The original Decision deferred a per-tool is_parallel_safe() flag,
noting that no built-in had write contention. That observation was
true in 2024 (Bash / Read / Grep / Glob). It is no longer true: ADRs
0028 + 0035 introduced Edit / Write / MultiEdit / NotebookEdit /
WriteMemoryTopic, all of which can collide on the same target within
one turn.
Revised mechanism: parallel_conflict_key(&self, input) -> Option<String> on the Tool trait. Returns None for fully
parallel-safe tools (the default; matches the original 2024 posture).
Returns a conflict-identity string for tools whose effect is keyed to
a target — typically the canonicalized path for filesystem writes;
for WriteMemoryTopic, a memory:{type}:{name} string. The dispatcher
builds a per-key tokio::sync::Mutex map and each tool's dispatch
future awaits its key's mutex (FIFO) before acquiring the
parallel_tool_limit semaphore. Same-key calls serialize in
submission order; different-key calls and None-key calls parallelize.
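One way to picture the conflict-key rule is as a lane plan: calls that share a key land in the same lane and run in submission order, while every other call gets its own lane. The sketch below is std-only and illustrative. Call, conflict_key, and plan_lanes are hypothetical names, and the key is derived by stripping a leading "./" rather than by real path canonicalization, which is an assumption made to keep the example free of filesystem access.

```rust
use std::collections::BTreeMap;

/// Hypothetical planned call: tool name plus its target path, if any.
struct Call {
    tool: &'static str,
    path: Option<&'static str>,
}

/// Stand-in for parallel_conflict_key. Assumption: a lexical "./" strip
/// replaces the canonicalization the ADR describes.
fn conflict_key(call: &Call) -> Option<String> {
    match call.tool {
        "Edit" | "Write" | "MultiEdit" | "NotebookEdit" => {
            call.path.map(|p| p.trim_start_matches("./").to_string())
        }
        _ => None, // Read, Grep, Glob, Bash: fully parallel-safe
    }
}

/// Groups call indices into lanes. Calls inside a lane run in submission
/// order; distinct lanes may run concurrently.
fn plan_lanes(calls: &[Call]) -> Vec<Vec<usize>> {
    let mut lane_of_key: BTreeMap<String, usize> = BTreeMap::new();
    let mut lanes: Vec<Vec<usize>> = Vec::new();
    for (i, call) in calls.iter().enumerate() {
        match conflict_key(call) {
            Some(key) => {
                let next = lanes.len();
                let lane = *lane_of_key.entry(key).or_insert(next);
                if lane == next {
                    lanes.push(vec![i]); // first call for this key
                } else {
                    lanes[lane].push(i); // queue behind earlier same-key calls
                }
            }
            None => lanes.push(vec![i]),
        }
    }
    lanes
}

fn main() {
    let calls = [
        Call { tool: "Edit", path: Some("src/a.rs") },
        Call { tool: "Read", path: Some("src/a.rs") },
        Call { tool: "Edit", path: Some("./src/a.rs") },
        Call { tool: "Edit", path: Some("src/b.rs") },
    ];
    println!("{:?}", plan_lanes(&calls));
}
```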
What this preserves. Read / Grep / Glob / Bash continue to behave
exactly as before (default None). Two Edits on different files
still parallelize. The parallel-tools differentiator from Claude Code
is intact.
What this fixes. Two Edits on the same file (whether via the
same path string, a ./-prefixed variant, or a symlink that
canonicalizes to the same inode) now serialize in submission order
rather than interleaving non-deterministically.
Per-tool overrides shipped: Edit, Write, MultiEdit,
NotebookEdit all key on the canonicalized path
(crates/caliban-tools-builtin/src/parallel.rs::canonical_key).
WriteMemoryTopic keys on memory:{type}:{name}.
Tests: crates/caliban-agent-core/tests/parallel_conflict_key.rs
covers distinct-key parallelism, same-key serialization,
keyed + plain mixing, and shared-key + independent triples.
ADR 0017 · MCP client architecture
- Status: accepted
- Date: 2026-05-23
Context
caliban's /mcp overlay is a stub (see ADR 0013). Adding real MCP
(Model Context Protocol) client support is priority #1 on the
post-WebFetch roadmap: it unlocks the long tail of integrations
(Linear, Notion, Slack, in-house servers) without needing a built-in
tool per service. The full implementation spec lives at
docs/superpowers/specs/2026-05-23-mcp-client-design.md; this ADR
records the architectural commitments only.
Decision
Transport: stdio in v1; SSE + StreamableHTTP deferred
v1 ships stdio transport only. Each configured server is launched
as a child process; JSON-RPC frames travel over its stdin/stdout. SSE
and StreamableHTTP transports are non-trivial separate deps
(reqwest-eventsource, hyper streaming) and gate on real-world
demand. They land in v2.
SDK: rmcp (official Rust SDK)
We adopt the rmcp crate (the official Rust MCP SDK published by the
Model Context Protocol org) over the community mcp-client crate.
Rationale: official maintenance, broader trait coverage (Client +
Server + transports), and a working transport::child_process module
we'd otherwise reimplement. Pinned to rmcp = "0.x" (latest released
line at adoption time); workspace-pinned to keep upgrades atomic
across our crates.
Auth: env-var only in v1; OAuth deferred
v1 supports passing secrets to MCP servers via the env table in the
server-config TOML (with optional ${VAR} expansion from the
operator's environment). Per-server OAuth — the protocol's full
authentication story for hosted MCP servers — is deferred to v2 and
will land alongside SSE/HTTP transports, where it's actually
relevant. Stdio servers overwhelmingly authenticate via env vars
today.
Tools surface as Box<dyn Tool> in the existing registry
MCP-discovered tools wrap in an McpTool struct that implements the
caliban_agent_core::Tool trait and registers in the same
ToolRegistry as built-ins. Naming convention:
mcp__<server>__<tool> (double underscores) — mirrors Claude Code so
operators recognize the surface. <server> is the config-file table
name; <tool> is the server-advertised tool name; both are
ASCII-snake-case-normalized at registration so names match what the
provider's tool-use API accepts.
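The naming convention can be sketched as a small pure function. The ADR does not spell out the normalization rules, so the ones used here are assumptions: ASCII letters are lowercased, any other non-alphanumeric character becomes an underscore, runs of underscores collapse to one, and leading or trailing underscores are trimmed. The function names are hypothetical.

```rust
/// Assumed normalization: lowercase ASCII alphanumerics are kept, everything
/// else becomes '_', repeated '_' collapse, and edge underscores are trimmed.
fn normalize(part: &str) -> String {
    let mut out = String::new();
    for ch in part.chars() {
        let mapped = if ch.is_ascii_alphanumeric() {
            ch.to_ascii_lowercase()
        } else {
            '_'
        };
        if mapped == '_' && out.ends_with('_') {
            continue; // collapse runs so "__" only ever appears as the separator
        }
        out.push(mapped);
    }
    out.trim_matches('_').to_string()
}

/// Builds the registry name mcp__<server>__<tool>.
fn mcp_tool_name(server: &str, tool: &str) -> String {
    format!("mcp__{}__{}", normalize(server), normalize(tool))
}

fn main() {
    println!("{}", mcp_tool_name("linear", "create-issue"));
    println!("{}", mcp_tool_name("My Server", "listItems"));
}
```

Collapsing underscore runs inside each part keeps the double-underscore separator unambiguous when a registered name is split back into server and tool.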
Tool::input_schema() returns the schema the server advertised, with
no rewriting. Tool::invoke() proxies via rmcp and translates the
response into caliban ContentBlocks. Hooks (before_tool /
after_tool) fire for MCP tools exactly as they do for built-ins —
no special case — which means existing permission UX, audit logging,
and deny-rules cover MCP automatically.
Server config file
Two TOML files, merged at startup with project overriding user:
- ~/.config/caliban/mcp.toml (per-user; XDG-aware on Linux, cache_dir on macOS)
- .caliban/mcp.toml (per-project, relative to cwd; optional)
Schema is fully specified in the design doc. Project-level config can
disable a user-level server by setting disabled = true for the same
name. Config-file location and merge semantics will be revisited
when the broader .caliban/ config story lands (separate spec); the
MCP spec is the prior art that pattern will follow.
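The merge rule (project overrides user, and a project entry can disable a user-level server) can be sketched with plain maps. The full schema lives in the design doc, so ServerEntry and its two fields are hypothetical, and replacing a whole entry instead of merging field by field is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily reduced server entry.
#[derive(Debug, Clone, PartialEq)]
struct ServerEntry {
    command: String,
    disabled: bool,
}

fn entry(command: &str, disabled: bool) -> ServerEntry {
    ServerEntry { command: command.to_string(), disabled }
}

/// Project entries replace user entries with the same table name.
fn merge(
    user: &BTreeMap<String, ServerEntry>,
    project: &BTreeMap<String, ServerEntry>,
) -> BTreeMap<String, ServerEntry> {
    let mut merged = user.clone();
    for (name, e) in project {
        merged.insert(name.clone(), e.clone());
    }
    merged
}

/// Names of servers that should be spawned at startup.
fn enabled(merged: &BTreeMap<String, ServerEntry>) -> Vec<&str> {
    merged
        .iter()
        .filter(|(_, e)| !e.disabled)
        .map(|(name, _)| name.as_str())
        .collect()
}

fn main() {
    let mut user = BTreeMap::new();
    user.insert("linear".to_string(), entry("linear-mcp", false));
    user.insert("notion".to_string(), entry("notion-mcp", false));

    let mut project = BTreeMap::new();
    project.insert("notion".to_string(), entry("notion-mcp", true)); // disabled here
    project.insert("inhouse".to_string(), entry("./tools/mcp", false));

    println!("{:?}", enabled(&merge(&user, &project)));
}
```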
Discovery: best-effort at startup
At caliban startup, for each non-disabled server entry: spawn the
child process, send initialize, list tools, and register an
McpTool per advertised tool. A failure (spawn fails, handshake
times out, server reports an error) logs a warning and continues
— it does not abort startup. The TUI's /mcp overlay surfaces
per-server status (connected / failed / disabled) so the operator
can see what's missing without watching stderr.
Tools are not re-discovered after startup in v1; if a server adds a
tool mid-session, the user restarts caliban. Server-push
notifications (notifications/tools/list_changed) are deferred.
Lifecycle: session-scoped; cleanup on exit
Servers run for the duration of the caliban session. On shutdown
(clean exit, Ctrl-C, panic) the McpClientManager's Drop
sends notifications/cancelled to each server and drops its
tokio::process::Child (configured with kill_on_drop(true)), so
no servers leak even on unclean exit.
Consequences
- Positive: Unblocks integration with the dozens of stdio MCP servers already published. Tool surface is uniform (same trait, same registry, same hooks), so the agent and TUI need no MCP-specific code paths after registration. Stdio-first keeps the initial dep surface small.
- Negative: SSE/HTTP servers aren't reachable in v1; operators who want a hosted MCP server have to wait for v2 or wrap it in a local stdio proxy. The mcp__<server>__<tool> name shape is long and noisy in transcripts; acceptable for parity with Claude Code. Each server is one extra child process, so RAM and FD overhead is per-server, not amortized.
- Revisit if: Real demand emerges for hosted (SSE/HTTP) servers, in which case promote the v2 work earlier. If rmcp's release cadence lags protocol changes, evaluate mcp-client. If tool-name collisions become common (two servers exposing a tool with the same short name), the mcp__<server>__ prefix already handles it, but UX may want a friendly alias mechanism.
ADR 0018 · Memory tier model (CLAUDE.md ingestion + auto-memory)
- Status: accepted
- Date: 2026-05-23
Context
caliban has no persistent memory across sessions. The default system
prompt is rebuilt from cwd + tool list each invocation (ADR 0014), so
operator preferences, project conventions, and learned facts about
the user have to be re-supplied by hand every time. Claude Code
solves this with a CLAUDE.md mechanism plus an auto-memory tier
the agent can write to via its existing file tools. The user's own
~/.claude/CLAUDE.md already exercises this pattern; that mental
model is the target.
Decision
caliban adopts a three-tier memory model, all of which live on disk as plain Markdown and are read at session start. A fourth MCP-mediated tier slots in later (forward link only; not in this ADR).
Tier 1 — Global
- Path: ~/.config/caliban/CLAUDE.md (XDG $XDG_CONFIG_HOME honored).
- Owner: the operator. caliban never writes here.
- Contents: cross-project preferences (tool choice, style, persona).
- Read once at startup, optional (missing file is fine).
Tier 2 — Project
- Path: <workspace_root>/CLAUDE.md where workspace_root is WorkspaceRoot::root() (ADR 0010).
- Owner: the project / repo. caliban never writes here (operators commit it like any other file).
- Contents: repo-specific conventions, build commands, taboos.
- Read once at startup, optional.
Tier 3 — Auto-memory
- Directory: ~/.local/share/caliban/projects/<sanitized-cwd>/memory/ (XDG $XDG_DATA_HOME honored). Sanitization replaces / with - and drops the leading dash, so /Users/jf/dev/caliban becomes Users-jf-dev-caliban.
- Files: one MEMORY.md (index, ≤ 200 lines) plus arbitrary <slug>.md topic pages.
- Owner: the agent. Writes go through the existing Write/Edit tools: no special memory tool, no separate trust path.
- Only MEMORY.md is loaded eagerly. Topic pages are lazily fetched by the agent via Read when the index points it at one.
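The sanitization rule for the auto-memory directory is small enough to show directly. The function names below are hypothetical, and the sketch takes the data home as a parameter instead of reading $XDG_DATA_HOME.

```rust
use std::path::{Path, PathBuf};

/// Replace every '/' with '-' and drop the single leading dash.
fn sanitize_cwd(cwd: &str) -> String {
    let dashed = cwd.replace('/', "-");
    dashed.strip_prefix('-').unwrap_or(&dashed).to_string()
}

/// <data_home>/caliban/projects/<sanitized-cwd>/memory
fn memory_dir(data_home: &Path, cwd: &str) -> PathBuf {
    data_home
        .join("caliban")
        .join("projects")
        .join(sanitize_cwd(cwd))
        .join("memory")
}

fn main() {
    let dir = memory_dir(Path::new("/home/jf/.local/share"), "/Users/jf/dev/caliban");
    println!("{}", dir.display());
}
```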
Composition
All three tiers are concatenated into the system prompt above the auto-generated default (cwd + tool list + conventions, per ADR 0014). Order: global → project → auto-memory index. Each tier is wrapped in explicit delimiters so the model can tell them apart:
<global-claude-md path="…/CLAUDE.md">…</global-claude-md>
<project-claude-md path="…/CLAUDE.md">…</project-claude-md>
<auto-memory-index path="…/MEMORY.md">…</auto-memory-index>
<default system prompt body from system_prompt::build_default …>
Missing tiers are simply omitted (no empty tag block).
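The splice can be sketched as a loop over tiers that skips any tier without a body. The Tier struct and compose function are hypothetical, and placing a newline between blocks is an assumption about the exact layout.

```rust
/// Hypothetical view of one memory tier. `body` is None when the file is missing.
struct Tier<'a> {
    tag: &'a str,
    path: &'a str,
    body: Option<&'a str>,
}

/// Concatenates present tiers in the given order, then the default prompt.
/// Missing tiers produce no tag block at all.
fn compose(tiers: &[Tier], default_prompt: &str) -> String {
    let mut out = String::new();
    for t in tiers {
        if let Some(body) = t.body {
            out.push_str(&format!(
                "<{tag} path=\"{path}\">{body}</{tag}>\n",
                tag = t.tag,
                path = t.path,
                body = body
            ));
        }
    }
    out.push_str(default_prompt);
    out
}

fn main() {
    let tiers = [
        Tier { tag: "global-claude-md", path: "/g/CLAUDE.md", body: Some("be terse") },
        Tier { tag: "project-claude-md", path: "/p/CLAUDE.md", body: None },
        Tier { tag: "auto-memory-index", path: "/m/MEMORY.md", body: Some("- topic") },
    ];
    println!("{}", compose(&tiers, "DEFAULT PROMPT"));
}
```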
Token budget
The combined memory prefix is capped at 8 000 tokens (estimated as
chars / 4 — provider-agnostic and cheap). If the combined size
exceeds the cap, auto-memory is truncated first (with a
[truncated: N bytes] notice appended to its block), then project,
then global. Hitting the global cap is treated as operator error
(loud tracing::warn! plus the truncation marker in the prompt).
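The budget rule can be sketched in two steps: estimate tokens as chars / 4, then remove the excess from auto-memory first, project second, and global last. All function names are hypothetical, the cap is passed as a parameter, and two details are assumptions: the truncation notice is not counted against the budget, and it is appended on its own line.

```rust
/// Provider-agnostic estimate: one token per four characters.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

/// Given token sizes for [global, project, auto], returns the tokens each tier
/// may keep. Excess is taken from auto first, then project, then global.
fn plan_budget(sizes: [usize; 3], cap: usize) -> [usize; 3] {
    let mut allowed = sizes;
    let mut excess = sizes.iter().sum::<usize>().saturating_sub(cap);
    for idx in [2usize, 1, 0] {
        let cut = excess.min(allowed[idx]);
        allowed[idx] -= cut;
        excess -= cut;
    }
    allowed
}

/// Cuts `text` to roughly `max_tokens` and appends the truncation notice.
fn truncate_to(text: &str, max_tokens: usize) -> String {
    let kept: String = text.chars().take(max_tokens * 4).collect();
    let dropped = text.len() - kept.len(); // bytes removed
    if dropped == 0 {
        kept
    } else {
        format!("{kept}\n[truncated: {dropped} bytes]")
    }
}

fn main() {
    let global = "g".repeat(4_000);
    let project = "p".repeat(8_000);
    let auto = "a".repeat(24_000);
    let sizes = [
        estimate_tokens(&global),
        estimate_tokens(&project),
        estimate_tokens(&auto),
    ];
    let allowed = plan_budget(sizes, 8_000);
    println!("sizes {:?} -> allowed {:?}", sizes, allowed);
    println!("auto tier now {} bytes", truncate_to(&auto, allowed[2]).len());
}
```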
Retrieval
None in v1. Memory IS the system prompt prefix. Semantic search
over memory (RAG) is a v2 concern and would slot in as a new tool
(MemorySearch), not as a change to how memory is loaded.
Forward links
- MCP memory tier. Once MCP support ships, an MCP server like the user's SilverBullet integration plugs in as Tier 4: not eagerly loaded, accessed on demand via MCP tool calls. The precondition-check pattern from the user's own CLAUDE.md ("skip if MCP is absent") applies.
- /memory slash command. Shows active tiers + paths + sizes; offers $EDITOR open for the global and project files. Detailed in the spec.
Consequences
- Positive. Matches the user's existing mental model exactly, zero learning curve. Agent maintains its own knowledge using the same Read/Write/Edit it already has — no special memory tool to audit, sandbox, or rate-limit. MCP tier slots in cleanly without reshaping the loader.
- Negative. 8K tokens is real cost on every turn (Anthropic prompt caching recoups most of it). Agent can clutter auto-memory if write conventions aren't well-specified (the spec pins them down). No drift detection between project CLAUDE.md and what the agent "remembers" — by design; the project file wins by splice order, but contradicting auto-memory will sit side by side.
- Revisit if: the 8K cap starts triggering routinely (raise it, or add summarization); auto-memory becomes a write-only graveyard (add a v2 MemorySearch tool and stop loading the full index); per-project agent memory grows past what's reasonable to grep (move to SQLite, but keep markdown export).
Crate
New crate caliban-memory owns tier discovery, sanitization, file IO,
splicing, and budget enforcement. caliban-agent-core does not
take a dep on it — the binary (caliban/src/main.rs) calls the
memory crate at startup and passes the assembled string to
system_prompt::resolve as a prefix.
Revised 2026-05-26
Bumped the combined-prefix default from 8,000 to 32,000 tokens. The 8,000-token default was conservative against 2024 context windows and was increasingly punishing in 2026 (1M-token Sonnet, 200K standard on most providers). Truncation-first behavior was at risk of dropping the auto-memory index — exactly the tier that grows.
Added per-scope token caps via three optional [memory] settings keys
(all integer, default unset):
- cap_tokens_auto: caps the auto-memory tier independently.
- cap_tokens_claude_md: caps the combined CLAUDE.md tier (global + project). When binding, truncates project first, then global.
- cap_tokens_combined: overrides the combined ceiling (max_tokens).
When the sum of both per-scope caps would exceed cap_tokens_combined,
each is scaled down proportionally rather than silently dropping a
tier. Settings.json values override the corresponding env vars
(CALIBAN_MEMORY_BUDGET_TOKENS, CALIBAN_MEMORY_CAP_TOKENS_AUTO,
CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD) when both are present.
Truncation order within a tier is unchanged from the original Decision.
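The proportional scaling and the settings-over-environment precedence can be sketched as two small functions. The names are hypothetical, and integer floor division (which can leave a few tokens of the combined ceiling unused) is an assumption about rounding.

```rust
/// Settings value wins over the env var; otherwise fall back to the default.
fn resolve_cap(settings: Option<u64>, env: Option<u64>, default: u64) -> u64 {
    settings.or(env).unwrap_or(default)
}

/// If the two per-scope caps together exceed the combined ceiling, scale both
/// down by the same factor so that neither tier is dropped outright.
fn scale_caps(cap_auto: u64, cap_claude_md: u64, cap_combined: u64) -> (u64, u64) {
    let sum = cap_auto + cap_claude_md;
    if sum <= cap_combined {
        return (cap_auto, cap_claude_md);
    }
    // sum > cap_combined >= 0 here, so sum is non-zero.
    (
        cap_auto * cap_combined / sum,
        cap_claude_md * cap_combined / sum,
    )
}

fn main() {
    let combined = resolve_cap(None, None, 32_000);
    let (auto, claude_md) = scale_caps(30_000, 10_000, combined);
    println!("combined={combined} auto={auto} claude_md={claude_md}");
}
```

With a 32,000-token ceiling, caps of 30,000 and 10,000 scale by 0.8 to 24,000 and 8,000.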
ADR 0019 · Skills loading
- Status: accepted
- Date: 2026-05-23
Context
The /skills overlay in the TUI is currently a stub (see ADR 0013).
Skills are priority #4 on the post-WebFetch roadmap: they let the
operator drop in reusable instruction-and-procedure packages — Claude
Code's "superpowers" model — without recompiling caliban or shipping
prompts in-crate. The full implementation spec lives at
docs/superpowers/specs/2026-05-23-skills-design.md; this ADR records
the architectural commitments only.
Decision
Skills are file-based, frontmatter-keyed
A skill is a directory <skill-name>/SKILL.md. The file is YAML
frontmatter followed by a markdown body:
---
name: brainstorming
description: "You MUST use this before any creative work ..."
metadata:
trigger: pre-implementation
---
# Brainstorming Ideas Into Designs
...
name and description are required; metadata.* is a free-form map
the loader passes through unchanged. The body is the model-facing
instruction set — no execution, no scripts auto-run, no sandbox. This
format matches the superpowers plugin so existing skills can be
copied in unchanged.
Skills surface as a single built-in Skill tool
A built-in Skill tool with invoke({"name": "<x>"}) loads
<x>/SKILL.md and returns its body as a ContentBlock::Text. Loaded
skills are NOT registered individually — that would explode the
tool-use schema. The Skill tool's description carries a bulleted
<name>: <description> list of every loaded skill, so the model
knows the menu and can call Skill with the right name.
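Building the Skill tool's description from the loaded skills might look like the sketch below. SkillMeta and skill_tool_description are hypothetical names, and the header sentence is an assumption; only the bulleted name: description shape comes from the text above.

```rust
/// Hypothetical subset of a loaded skill's frontmatter.
struct SkillMeta {
    name: String,
    description: String,
}

/// One bullet per skill, so the model sees the menu without any skill body.
fn skill_tool_description(skills: &[SkillMeta]) -> String {
    let mut out = String::from("Load a skill by name. Available skills:\n");
    for s in skills {
        out.push_str(&format!("- {}: {}\n", s.name, s.description));
    }
    out
}

fn main() {
    let skills = vec![
        SkillMeta {
            name: "brainstorming".into(),
            description: "You MUST use this before any creative work ...".into(),
        },
        SkillMeta {
            name: "debugging".into(),
            description: "Systematic root-cause analysis".into(),
        },
    ];
    print!("{}", skill_tool_description(&skills));
}
```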
Skills are NOT auto-loaded into the system prompt
Loading every body upfront burns thousands of tokens per turn at any
nontrivial skill count. Only description lines hit the prompt
(via the tool description above); bodies load on-demand.
Discovery locations (priority order)
- <workspace_root>/.caliban/skills/ (project-pinned skills)
- ~/.config/caliban/skills/ (per-user skills)
- ~/.local/share/caliban/plugins/*/skills/ (global plugin dir; mirrors how Claude Code resolves plugin skills)
A skill in an earlier location shadows a later one with the same
name. Paths are XDG-aware on Linux and use cache_dir/data_dir
analogues on macOS, matching the MCP config conventions in ADR 0017.
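Shadowing by discovery order reduces to "first location to define a name wins". The sketch below assumes directory walking has already produced (name, path) pairs for each location; resolve_skills is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// `locations` is in priority order. Each location lists (skill name, path).
/// The first location that defines a name keeps it; later ones are shadowed.
fn resolve_skills(locations: &[Vec<(&str, &str)>]) -> BTreeMap<String, String> {
    let mut resolved = BTreeMap::new();
    for location in locations {
        for (name, path) in location {
            resolved
                .entry(name.to_string())
                .or_insert_with(|| path.to_string());
        }
    }
    resolved
}

fn main() {
    let locations = vec![
        vec![("brainstorming", ".caliban/skills/brainstorming/SKILL.md")],
        vec![
            ("brainstorming", "~/.config/caliban/skills/brainstorming/SKILL.md"),
            ("debugging", "~/.config/caliban/skills/debugging/SKILL.md"),
        ],
    ];
    for (name, path) in resolve_skills(&locations) {
        println!("{name} -> {path}");
    }
}
```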
No skill execution sandbox
Skills are text injected into the model's context. They are not
executable code. The scripts/ and references/ subdirectories that
appear in some Claude Code skills are loadable only by the model
through existing Read / Bash tools — caliban itself does not
execute anything skill-side. This keeps the trust model identical to
"the operator wrote this file."
New crate: caliban-skills
Skills logic lives in a new workspace crate crates/caliban-skills/
exporting SkillLoader, Skill, and SkillTool. It depends on
caliban-agent-core (for the Tool trait), serde + serde_yaml
for the frontmatter, ignore (already in the workspace) for
directory walking, and thiserror. The caliban binary constructs
one SkillTool at startup, registers it with ToolRegistry, and
wires the loaded skills into the /skills overlay.
Consequences
- Positive: Existing superpowers-format skills port with zero changes. Token cost stays bounded — only descriptions hit the prompt; bodies are pay-per-use. Skills are uniform with every other tool (same registry, same hooks, same audit log).
- Negative: The Skill tool's description grows with skill count; ~50 skills crowds the schema budget (truncation policy is a spec-level concern). Frontmatter parse failures are per-file warnings, so a skill can be lost silently if the operator doesn't watch logs. No versioning: an update is a directory-replace.
- Revisit if: Description-list growth crowds the schema, in which case consider a two-tier surface (frequent inline, rare via a ListSkills tool). If operators want bundled defaults, add an opt-in --with-default-skills flag (currently a non-goal).
ADR 0020 · Permission rules layered on top of Hooks
- Status: accepted
- Date: 2026-05-23
Context
caliban currently has no permission model. The Hooks::before_tool
extension point can already short-circuit a tool call with a
HookDecision::Deny(msg), but nothing in the tree consults rules,
prompts the operator, or enforces a default policy. As we add more
"dangerous" tools (BashTool already executes arbitrary shell;
WriteTool, EditTool, WebFetch, future MCP tools), we need a
rule-based gate that matches the operator-facing UX of Claude Code
without inheriting its classifier complexity.
Decision
Implementation site
Permissions are a layer on top of the existing Hooks trait — not
a parallel system. We add a PermissionsHook that implements
Hooks::before_tool and consults a rule database. Composition with
other hooks (observability, debug logging) is handled by a small
CompositeHooks adapter; permissions just plug in as one entry.
Rule schema
Each rule has three fields:
- tool: pattern string (glob-style; see Pattern matching).
- action: Allow | Deny | Ask.
- comment: optional free-text shown in the TUI prompt.
Rule sources (priority high → low)
- CLI flags --allow <PAT>, --deny <PAT>, --ask <PAT> (one-shot, repeatable).
- Project file <workspace>/.caliban/permissions.toml.
- User file ~/.config/caliban/permissions.toml.
- Built-in defaults (read-only tools Allow; everything else Ask).
Higher-priority rules shadow lower-priority ones. Within a single source, first match wins, so users place narrow rules above the catch-all.
Pattern matching
Glob-style on tool_name plus an optional :<first-arg-prefix> suffix.
- Bash (bare tool name; matches any input).
- Bash:git * (Bash whose command field starts with git).
- Bash:* (equivalent to Bash; explicit wildcard).
- * (matches every tool).
The "first arg" is tool-defined: for Bash it's the command field;
for WebFetch it's url; for Read/Edit/Write it's path. Tools
that don't declare a first-arg field are matched on tool name only.
Prefix-after-colon uses simple glob (*, ?) on the stringified first
arg, not full regex — keeps the rule format inspectable.
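The pattern grammar is small enough to sketch end to end: a glob matcher for * and ?, a rule matcher that splits on the first colon, and first-match-wins resolution within one source. This is an illustrative std-only sketch, not the PermissionsHook implementation. All function names are hypothetical, and layering across sources is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Allow,
    Deny,
    Ask,
}

/// Simple glob: '*' matches any run of characters, '?' exactly one.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    let mut star: Option<(usize, usize)> = None; // last '*' and the text index it covers
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last '*' absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// `rule` is "Tool" or "Tool:<first-arg glob>". Tools without a declared
/// first arg (first_arg == None) are matched on the tool name only.
fn rule_matches(rule: &str, tool: &str, first_arg: Option<&str>) -> bool {
    match rule.split_once(':') {
        None => glob_match(rule, tool),
        Some((tool_pat, arg_pat)) => {
            glob_match(tool_pat, tool) && first_arg.map_or(true, |arg| glob_match(arg_pat, arg))
        }
    }
}

/// First match wins within a single source.
fn decide(rules: &[(&str, Action)], tool: &str, first_arg: Option<&str>) -> Option<Action> {
    rules
        .iter()
        .find(|(pat, _)| rule_matches(pat, tool, first_arg))
        .map(|(_, action)| *action)
}

fn main() {
    let rules = [("Bash:git *", Action::Allow), ("Bash", Action::Ask), ("*", Action::Deny)];
    println!("{:?}", decide(&rules, "Bash", Some("git status")));
    println!("{:?}", decide(&rules, "Bash", Some("sudo rm -rf /")));
    println!("{:?}", decide(&rules, "Read", Some("src/main.rs")));
}
```

The narrow Bash:git * rule has to sit above the bare Bash rule; reversing them would make every Bash call resolve to Ask.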
Ask action
Ask requires an interactive UI. The TUI provides a modal prompt
(allow once, allow permanently, deny once, deny permanently). In
non-interactive sessions (no TTY, no --auto-allow), Ask degrades to
Deny with a clear log message. --auto-allow is the documented
"escape hatch" for non-interactive runs and is loud about being
dangerous.
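How Ask resolves in each kind of session can be written as one small function. The name effective_action is hypothetical, and mapping Ask to Allow under --auto-allow is an assumption about what the escape hatch does; the text above only states that Ask degrades to Deny when there is no TTY and no --auto-allow.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Allow,
    Deny,
    Ask,
}

/// Resolves the action a rule produced against the session type.
fn effective_action(action: Action, interactive: bool, auto_allow: bool) -> Action {
    match action {
        Action::Ask if interactive => Action::Ask, // TUI shows the modal prompt
        Action::Ask if auto_allow => Action::Allow, // assumption: escape hatch allows
        Action::Ask => Action::Deny,                // no TTY, no --auto-allow
        other => other,                             // Allow and Deny pass through
    }
}

fn main() {
    println!("{:?}", effective_action(Action::Ask, true, false));
    println!("{:?}", effective_action(Action::Ask, false, true));
    println!("{:?}", effective_action(Action::Ask, false, false));
}
```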
Consequences
- Positive: mirrors the Claude Code rule format operators already know, without copying the classifier-heavy approach (bashClassifier / yoloClassifier). Reuses the existing Hooks contract, with zero new core traits. Project + user files allow shared team policies committed to source control.
- Negative: glob matching on first-arg-prefix can be surprising (e.g. the rule Bash:rm * does not match a command that starts with sudo rm). Acceptable; the TUI prompt shows the rule that matched so users can see why a call was allowed/denied. Shadowed-rule warnings are deferred.
- Revisit if: prefix matching proves insufficient for real-world bash commands and operators are routinely surprised by Allow/Deny outcomes. Next step would be a classifier (LLM-graded command-intent), but we want concrete evidence before going there.
ADR 0021 · Sub-agent primitive via AgentTool
- Status: accepted
- Date: 2026-05-23
Context
caliban's turn loop is a single agent calling tools. Several real-use patterns benefit from a sub-agent primitive: parallel search over a large codebase without polluting the parent's context, subtasks with a restricted tool palette, or delegating multi-step investigations whose intermediate steps shouldn't bloat the parent transcript.
Claude Code has two related primitives — synchronous Agent (a tool)
and Task (async background runs you poll). We need the synchronous
one. Async Task is a separate, larger piece of work.
Decision
Surface: a tool, not a new core type
Sub-agents are spawned by the model invoking a built-in tool
AgentTool. Input: {prompt, tool_allowlist?, model?}. Output: one
ContentBlock::Text containing the sub-agent's final assistant text
(truncated to ~5000 chars).
In-process, not child-process
The sub-agent runs an entire turn loop on its own Agent instance in
the same tokio runtime. Single binary, single runtime — cancellation
and tracing stay unified. Sub-agent shares the parent's Provider
instance, inheriting HTTP/2 multiplexing, the connection pool, and
Anthropic-side prompt cache locality. No IPC, no serialization. The
cost is no OS-level isolation, which is acceptable: the existing trust
model (operator already runs BashTool-capable code) doesn't gain
much from a child process.
Construction via factory
AgentTool::new(factory: Arc<dyn Fn(&AgentToolInput) -> Agent + Send + Sync>).
The factory is wired from main and closes over the parent's
provider, tool registry, and hooks. Each invocation builds a fresh
Agent with the parent's provider; model from input (or parent's);
a ToolRegistry filtered by tool_allowlist; and max_turns = 20
(operator-tunable in code, not from model input).
Tool allowlist semantics
- tool_allowlist: ["Read", "Grep"] → sub-agent gets exactly those. Unknown names are silently dropped.
- tool_allowlist: null or omitted → sub-agent inherits every parent tool EXCEPT AgentTool itself.
No recursion in v1: AgentTool is filtered out of every sub-agent's
registry. Nested sub-agents are a v2 problem (depth limits, fan-out,
cost ceilings).
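The allowlist semantics can be sketched as a filter over the parent's tool names. The function name is hypothetical, and tool registries are reduced to name lists for the purpose of the example.

```rust
/// Returns the tool names a sub-agent receives.
/// - allowlist = Some(..): only listed names that the parent actually has.
/// - allowlist = None: every parent tool.
/// AgentTool is removed in both cases, so sub-agents cannot recurse.
fn sub_agent_tools(parent: &[&str], allowlist: Option<&[&str]>) -> Vec<String> {
    parent
        .iter()
        .filter(|name| **name != "AgentTool")
        .filter(|name| allowlist.map_or(true, |allow| allow.contains(*name)))
        .map(|name| name.to_string())
        .collect()
}

fn main() {
    let parent = ["Read", "Grep", "Bash", "AgentTool"];
    // "Nope" is unknown to the parent and is silently dropped.
    println!("{:?}", sub_agent_tools(&parent, Some(&["Read", "Grep", "Nope"])));
    println!("{:?}", sub_agent_tools(&parent, None));
}
```

Because the filter iterates over the parent's tools instead of the allowlist, unknown names never need explicit handling.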
Budgets
max_turns = 20 (hard). Sub-agent inherits the parent's max_tokens.
No per-call cost ceiling because we don't have a router yet; add
max_cost_usd later.
Transcript representation
Parent transcript gets the ToolUseBlock (name = "AgentTool", input
JSON) and a ToolResultBlock containing the sub-agent's final
assistant text (truncated to ~5000 chars). Intermediate sub-turns are
not persisted in the parent session — they live only in the
sub-agent's transient buffer. Debug logs capture the full trace.
Not a Task primitive
Claude Code's Task is async-with-lifecycle (spawn, poll, cancel,
retrieve). AgentTool::call is synchronous: the parent's turn loop
blocks on the sub-agent's loop completing. Async Task is v2.
Consequences
- Positive: unlocks the "parallel exploration without context bloat" pattern; reuses every existing primitive (Agent, Hooks, ToolRegistry, CancellationToken). Permissions apply to the sub-agent's tools just like the parent's, because the sub-agent's Agent is built with the same hooks chain.
- Negative: synchronous-only. If a sub-agent loop takes minutes, the parent appears stuck. Mitigation: sub-agent stream events bubble to the TUI via the parent's stream so the operator still sees progress. Token accounting at the parent level shows sub-agent usage as a single line (the ToolUseBlock); cost attribution to specific sub-turns lives only in the debug log.
- Revisit if: users routinely want to dispatch many sub-agents in parallel. At that point we promote AgentTool from synchronous to the v2 Task primitive and add lifecycle management.
ADR 0022 · Model routing architecture
- Status: accepted
- Date: 2026-05-23
Context
The agent makes provider calls for several distinct purposes — the main
conversational loop, summarization for compaction, embeddings for memory,
fast classification for routing decisions, sub-agent loops, etc. Today
those all run through the single Arc<dyn Provider> handed to the
Agent. Operators who want to use Sonnet for the main loop, Haiku for
summarization, and a local Ollama model for fast classification have no
clean way to express that.
Claude Code solves this with hardcoded getMainLoopModel /
getSmallFastModel helpers. That's fine for a single-vendor harness;
it's wrong for caliban, which is provider-agnostic by design. Operators
should be able to compose any model from any provider for any purpose
without recompiling.
A model router also turns out to be the natural home for several already-deferred concerns: per-route fallback chains, hedged requests, circuit breakers, cost/usage aggregation, and unification of the divergent prompt-cache surfaces across Anthropic, OpenAI, and Gemini.
This is signature differentiation for caliban; it deserves its own layer.
Decision
- Add a new Layer-3 crate caliban-model-router. It sits between caliban-agent-core and the four caliban-provider-* adapter crates. No agent-core code changes shape; the agent continues to take a single Arc<dyn Provider>.
- The router IS a Provider. It implements the same trait the adapters implement, so the agent sees one provider (the router), and the router internally dispatches each complete/stream call to the right downstream Provider + model based on the request's purpose, the operator's policy, and the capabilities the request needs.
- Routes are matched by RequestMetadata.purpose. A new field on the existing RequestMetadata struct: purpose: Option<RequestPurpose> with variants MainLoop | Summarization | Embedding | FastClassifier | SubAgent | Custom(String). Callers that don't set a purpose route through a default configured by the operator (likely MainLoop).
- Routing policy is operator-defined. A TOML config file plus a builder API. No auto-learning, no automatic cost optimization, no hidden behavior. The operator owns the cost / latency / capability trade-offs explicitly. This is a deliberate differentiator from Claude Code's hardcoded paths.
- Capability filtering is mandatory. Each route declares its provider + model; the router consults Provider::capabilities(model) before dispatch and skips a route whose capabilities don't satisfy the request (e.g. request needs ToolUseCapability::ParallelCalls but the route's model only supports Basic).
- Per-route fallback is opt-in and ordered. When the same purpose appears in multiple [[route]] entries, the entries form a fallback chain in declaration order. The router tries them in sequence on a retryable failure of the previous entry (rate-limit, model unavailable, transient network error). Implementation is deferred to v2; this ADR commits to the design.
- Cost / usage aggregation is a router responsibility. The router sees every call and every Usage. It maintains a per-(provider, model) accumulator and exposes a RouterStats snapshot for the TUI's existing /usage overlay (ADR 0013) to render.
- Hedging and circuit-breakers are router responsibilities. Both are sketched in the design spec but deferred to v2.
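Purpose matching, capability filtering, and declaration-order fallback can be sketched together as a function that returns the candidate chain for a request. The types are simplified stand-ins: ToolUse is a hypothetical two-level capability ladder, Route is a hypothetical flattened [[route]] entry, and defaulting an unset purpose to MainLoop follows the "likely MainLoop" remark above as an assumption.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RequestPurpose {
    MainLoop,
    Summarization,
    Embedding,
    FastClassifier,
    SubAgent,
    Custom(String),
}

/// Hypothetical simplified capability ladder: Basic < ParallelCalls.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum ToolUse {
    Basic,
    ParallelCalls,
}

/// Hypothetical flattened [[route]] entry.
#[derive(Debug)]
struct Route {
    purpose: RequestPurpose,
    provider: &'static str,
    model: &'static str,
    tool_use: ToolUse,
}

/// Routes for the purpose, in declaration order, minus those whose model
/// cannot satisfy the request. The first entry is the primary; the rest
/// form the fallback chain.
fn candidates<'a>(
    routes: &'a [Route],
    purpose: Option<&RequestPurpose>,
    needs: ToolUse,
) -> Vec<&'a Route> {
    let wanted = purpose.cloned().unwrap_or(RequestPurpose::MainLoop);
    routes
        .iter()
        .filter(|r| r.purpose == wanted && r.tool_use >= needs)
        .collect()
}

fn main() {
    let routes = [
        Route { purpose: RequestPurpose::MainLoop, provider: "anthropic", model: "sonnet", tool_use: ToolUse::ParallelCalls },
        Route { purpose: RequestPurpose::MainLoop, provider: "ollama", model: "local-small", tool_use: ToolUse::Basic },
        Route { purpose: RequestPurpose::MainLoop, provider: "openai", model: "gpt-4o", tool_use: ToolUse::ParallelCalls },
        Route { purpose: RequestPurpose::Summarization, provider: "anthropic", model: "haiku", tool_use: ToolUse::Basic },
    ];
    for r in candidates(&routes, None, ToolUse::ParallelCalls) {
        println!("{} / {}", r.provider, r.model);
    }
}
```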
Consequences
- Agent constructor unchanged. AgentBuilder::provider(...) takes the router as its Arc<dyn Provider> exactly like any adapter. No code in caliban-agent-core knows the router exists.
- Adapters stay simple. Per-adapter retry policy (existing RetryPolicy for transient errors) remains in the adapter. The router handles route-level fallback. The two layers compose: adapter retries within a route; router moves to the next route only if the adapter exhausts its retries with a fatal-for-this-route error.
- Prompt-cache unification lands here. Anthropic's cache_control markers, OpenAI's cache_read_input_tokens, and Gemini's context-caching all surface as the same Usage.cache_read_input_tokens / cache_creation_input_tokens values once they reach the router; the router is the natural place to normalize the bookkeeping.
- before_turn hook needs a way to see the resolved route. The agent's TurnCtx currently exposes config.model, which is the caller's request, not the route's actual choice. A new optional field (or a router-supplied hook surface) is required so the TUI status line can display "Sonnet via Anthropic, fallback gpt-4o" instead of just the requested logical name. Detailed in the spec.
- Sessions become route-history-aware. If a session was started on route A and resumes on route B (because the config changed, or the primary route is unavailable), prompt-cache markers from the prior provider are inert. The router documents this and falls back to no-cache for the transition turn.
- Forward links: hedged requests, circuit breakers, and adaptive retry budgets were listed as non-goals in 2026-05-23-perf-baseline-design.md. This ADR pulls them under the router's umbrella for v2.
- Revisit if: the operator-defined policy turns out to be a meaningful UX burden in practice (consider a "balanced" default policy), or if hedged requests prove valuable enough to promote from v2 to v1.
References
- Design spec: docs/superpowers/specs/2026-05-23-model-router-design.md
- Provider trait: crates/caliban-provider/src/lib.rs
- Capabilities: crates/caliban-provider/src/capabilities.rs
- Per-adapter retry: ADR 0009 (RetryPolicy)
- Usage overlay: ADR 0013 (TUI overlays)
- Perf-baseline non-goals: docs/superpowers/specs/2026-05-23-perf-baseline-design.md
ADR 0023 · MCP v2 — transports, OAuth, elicitation, resources
- Status: accepted
- Date: 2026-05-24
- Spec: docs/superpowers/specs/2026-05-24-mcp-v2-design.md
- Supersedes scope of: ADR 0017 deferred items
Context
ADR 0017 shipped caliban's MCP client as a config-only scaffold:
McpClientManager::start is a no-op, McpTool::invoke is unwritten,
and the only working pieces are TOML parsing and server-name
validation. Closing the gap to Claude Code requires (a) actually
wiring rmcp so stdio servers spawn and discover tools, and (b)
adding HTTP/SSE transports + OAuth + elicitation + resources.
Decision
Phased delivery — three sub-PRs
v2 ships in three independently-mergeable phases:
- Phase A: stdio wiring. Implement Conn::start for stdio and McpTool::invoke. In-tree test server. Closes the deferred "rmcp wiring" follow-up from ADR 0017.
- Phase B: HTTP + SSE transports. Adds Transport::Http and Transport::Sse over the corresponding rmcp transport modules. oauth = "off" only at this phase, for self-hosted endpoints behind a fixed bearer or no auth.
- Phase C: OAuth + elicitation + resources. McpOAuthFlow (PKCE + loopback callback + keyring token storage), ElicitationBridge (TUI modal + non-interactive auto-decline), McpResource (@server:resource autocomplete and inline read).
Each phase ticks rows in docs/parity-gap-matrix.md from 🔴 → ✅ in the
PR that lands it.
Transport selection is a config field, not separate crates
ServerConfig.transport: "stdio" | "http" | "sse" (default "stdio")
selects which rmcp transport constructor to call. The manager is
otherwise transport-agnostic — Conn exposes the same
rmcp::client::RunningService<…> regardless of transport. This keeps
the agent-side code path uniform: Hooks, dispatch, cancellation,
and serialization see no MCP-transport details.
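A minimal sketch of how the transport field could be resolved, assuming the three string values above and a stdio default. The Transport enum and resolve_transport helper are illustrative names, not the actual caliban types.

```rust
use std::str::FromStr;

/// Transport kinds named in the ADR; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Transport {
    #[default]
    Stdio,
    Http,
    Sse,
}

impl FromStr for Transport {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "stdio" => Ok(Transport::Stdio),
            "http" => Ok(Transport::Http),
            "sse" => Ok(Transport::Sse),
            other => Err(format!("unknown MCP transport: {other:?}")),
        }
    }
}

/// Resolve the optional `transport` field; an absent field means stdio.
pub fn resolve_transport(field: Option<&str>) -> Result<Transport, String> {
    match field {
        None => Ok(Transport::default()),
        Some(s) => s.parse(),
    }
}
```

Rejecting unknown values instead of silently falling back to stdio is a choice made for this sketch; the spec may handle them differently.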
OAuth uses PKCE + a loopback callback on a random port
Hosted MCP servers behind OAuth use the authorization-code flow with
PKCE (S256). caliban spawns a short-lived axum server on
127.0.0.1:0, prints the auth URL, captures the callback, and
exchanges the code for tokens. Tokens persist in the OS keyring
(keyring crate); fallback to $XDG_DATA_HOME/caliban/mcp-tokens.json
mode 0600 on systems without keychain support. --mcp-oauth-port and
CALIBAN_MCP_OAUTH_PORT override the random port for firewalled
machines.
We pick PKCE + loopback over device-code or out-of-band paste because it's what Claude Code uses and what RFC 8252 recommends for native clients. A v2.1 follow-up may add a paste-back fallback if real demand emerges from operators on hardened networks.
Elicitation is a side-channel, not a tool
ElicitationBridge is a separate caliban-side type with its own mpsc
queue; it does not extend the Tool trait. The TUI subscribes;
non-interactive callers (--print, CI) get a default auto-Decline
handler. Elicitation requests are gated by the existing permission
rule grammar via a new pattern: Elicit(<server>).
Resources are pulled lazily
Resources are not eagerly listed at startup. The first time the user
types @<server>:, caliban calls resources/list for that server and
caches the result; resources/list_changed notifications invalidate
the cache. Resource templates like
github://repos/{owner}/{repo}/issues/{id} are expanded positionally
from arguments typed after the resource name.
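Positional template expansion can be sketched as follows. This assumes placeholders are plain {name} segments filled left to right and that no URL-encoding is applied; expand_template is a hypothetical helper, and the real expansion rules live in the spec.

```rust
/// Expand `{name}` placeholders left to right using positional arguments.
/// Returns an error if the argument count does not match the placeholder
/// count or a brace is left unclosed.
pub fn expand_template(template: &str, args: &[&str]) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    let mut next_arg = args.iter();

    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        let close = after_open
            .find('}')
            .ok_or_else(|| format!("unclosed placeholder in {template:?}"))?;
        let name = &after_open[..close];
        let value = next_arg
            .next()
            .ok_or_else(|| format!("missing argument for {{{name}}}"))?;
        out.push_str(value);
        rest = &after_open[close + 1..];
    }
    out.push_str(rest);

    if next_arg.next().is_some() {
        return Err("too many arguments for template".to_string());
    }
    Ok(out)
}
```

With the template from the text and the arguments acme, widgets, 42, the sketch yields github://repos/acme/widgets/issues/42.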
Per-server permission scoping lifted into our rule grammar
Claude Code's allowedMcpServers / deniedMcpServers settings become
inline [server.X.permissions] blocks in mcp.toml. They merge with
global permissions in a documented order:
global deny → server deny → server ask → server allow → global ask → global allow → default(Ask). The /mcp overlay shows the effective
rule for a focused tool.
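The merge order reads as a first-match walk, which the sketch below models. The RuleSet type, the pattern matching (exact names or a trailing * prefix), and the tool names in the usage note are all simplifications invented for illustration; the real rule grammar is richer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Ask,
    Deny,
}

/// One scope's rules. Patterns here are exact names or a trailing-`*`
/// prefix, which is a simplification of the real rule grammar.
#[derive(Default)]
pub struct RuleSet {
    pub deny: Vec<String>,
    pub ask: Vec<String>,
    pub allow: Vec<String>,
}

fn matches(pattern: &str, tool: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => tool.starts_with(prefix),
        None => pattern == tool,
    }
}

fn any_match(patterns: &[String], tool: &str) -> bool {
    patterns.iter().any(|p| matches(p, tool))
}

/// Walk the documented order and return the first matching verdict:
/// global deny, server deny, server ask, server allow, global ask,
/// global allow, then the Ask default.
pub fn effective_decision(global: &RuleSet, server: &RuleSet, tool: &str) -> Decision {
    let order = [
        (global.deny.as_slice(), Decision::Deny),
        (server.deny.as_slice(), Decision::Deny),
        (server.ask.as_slice(), Decision::Ask),
        (server.allow.as_slice(), Decision::Allow),
        (global.ask.as_slice(), Decision::Ask),
        (global.allow.as_slice(), Decision::Allow),
    ];
    order
        .iter()
        .find(|(patterns, _)| any_match(patterns, tool))
        .map(|(_, decision)| *decision)
        .unwrap_or(Decision::Ask)
}
```

Under this ordering a server-level allow cannot override a global deny, while a server-level ask takes effect before a broader global allow.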
Env-var contract — CALIBAN_* primary, MCP_* fallback
caliban reads CALIBAN_MCP_TIMEOUT, CALIBAN_MCP_TOOL_TIMEOUT,
CALIBAN_MAX_MCP_OUTPUT_TOKENS. If those are unset and the
Claude-Code-style MCP_TIMEOUT / MCP_TOOL_TIMEOUT are set, we honor
them for compat. We do not read MAX_MCP_OUTPUT_TOKENS without
the CALIBAN_ prefix because servers may set it themselves.
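The lookup order can be sketched with an injected lookup function so it is testable without touching the process environment. The helper names are hypothetical, and treating the values as milliseconds is an assumption of this sketch.

```rust
use std::time::Duration;

/// Look up a timeout: `CALIBAN_<name>` first, then the Claude-Code-style
/// `<name>` as a compatibility fallback. Values are assumed to be
/// milliseconds. If the winning variable does not parse, the result is None.
pub fn timeout_from_env<F>(name: &str, lookup: F) -> Option<Duration>
where
    F: Fn(&str) -> Option<String>,
{
    lookup(&format!("CALIBAN_{name}"))
        .or_else(|| lookup(name))
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
}

/// The output-token cap is read only with the `CALIBAN_` prefix; the bare
/// `MAX_MCP_OUTPUT_TOKENS` is deliberately never consulted.
pub fn max_output_tokens_from_env<F>(lookup: F) -> Option<u64>
where
    F: Fn(&str) -> Option<String>,
{
    lookup("CALIBAN_MAX_MCP_OUTPUT_TOKENS").and_then(|raw| raw.trim().parse().ok())
}
```

In a real binary the lookup would typically be `|k| std::env::var(k).ok()`, for example `timeout_from_env("MCP_TIMEOUT", |k| std::env::var(k).ok())`.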
Consequences
- Positive: Closes nine 🔴 rows in the parity matrix in one multi-PR initiative. Transport plurality makes hosted-MCP ecosystems reachable; OAuth unblocks every commercial server that uses it. Elicitation is a meaningful UX upgrade (servers can ask before destructive ops without baking confirmation into every tool). Resources turn MCP from "tools only" into "tools + data references", which closes the @server:resource parity gap.
- Negative: Dependency footprint grows by ~5 crates (rmcp HTTP/SSE features, oauth2, axum, keyring). Loopback OAuth assumes the user can open a browser; hardened workstations may need oauth = "manual". Token storage adds a per-OS contract surface to test. Elicitation introduces a new modal flow the TUI must handle alongside the Ask modal.
- Revisit if: Hosted MCP ecosystem standardizes on a different auth flow; if rmcp evolves a higher-level OAuth helper, our bespoke flow can shrink. If resource discovery latency becomes a problem (large resources/list responses), promote to eager fetch with a background refresh task.
ADR 0024 · Hook event taxonomy + external handler types
- Status: accepted
- Date: 2026-05-24
- Spec:
docs/superpowers/specs/2026-05-24-hooks-expansion-design.md
Context
caliban's Hooks trait today exposes four events
(before_turn/after_turn/before_tool/after_tool) and is only
addressable from in-process Rust code: there's no way to drop a shell
script into ~/.config/caliban/ and have it run on SessionStart, no
HTTP callback for audit servers, no MCP-tool-as-policy-gate, no
LLM-classifier for UserPromptSubmit. Claude Code's documented hook
surface covers ~25 event names and five handler types; closing that
gap is Tier-1 foundation work because plugins, observability, and
automation all build on it. The full spec is in
docs/superpowers/specs/2026-05-24-hooks-expansion-design.md; this
ADR records the architectural commitments only.
Decision
Event names mirror Claude Code's PascalCase taxonomy
Add 15+ event methods to the Hooks trait, all with default no-op
implementations so existing Hooks impls keep compiling unchanged.
First-class events: SessionStart, SessionEnd, UserPromptSubmit,
PreCompact, PostCompact, ConfigChange, CwdChanged,
FileChanged, SubagentStart, SubagentStop, TaskCreated,
TaskCompleted, PermissionRequest, PermissionDenied,
Notification, Stop, StopFailure, PostToolUseFailure. Reserved
but not-yet-fired in v1: Setup, UserPromptExpansion,
PostToolBatch, InstructionsLoaded, WorktreeCreate,
WorktreeRemove, Elicitation, ElicitationResult, TeammateIdle.
Five external handler types — command/http/mcp/prompt/agent
A new HookRouter consumes hooks.toml (or the hooks table inside
the unified settings.json once ADR 0026 lands) and dispatches events
to externally-configured handlers. The router itself implements
Hooks, so it composes into AgentBuilder like any other in-process
hook stack — behind PermissionsHook in the chain.
- command: spawn a child; stdin is event JSON; stdout JSON (or exit code) determines the decision.
- http: POST event JSON; response JSON is the decision.
- mcp: invoke a configured MCP server's tool with the event JSON.
- prompt: call the model router (default FastClassifier purpose) with the prompt + event JSON; schema enables structured-output.
- agent: delegate to a subagent (async-only).
Decision protocol — stdout JSON or exit codes
Shell-command handlers signal their decision via stdout JSON
(hookSpecificOutput.permissionDecision ∈ allow|deny|ask,
permissionDecisionReason, optional updatedInput) or via exit
codes (0 = Allow, 2 = Deny with stderr as reason, anything else =
Allow + warning). HTTP and MCP handlers use the same response shape.
We extend HookDecision with UpdatedInput(Value) so hooks can
rewrite a tool's input before dispatch. The rewritten input is
validated against the tool's input_schema(); validation failure is
a hard deny.
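The exit-code half of the protocol is small enough to sketch directly. The HookDecision here is a pared-down stand-in (the real type also carries Ask and UpdatedInput), and Interpreted and interpret_exit are illustrative names.

```rust
/// Simplified decision type; only the outcomes reachable via exit codes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HookDecision {
    Allow,
    Deny { reason: String },
}

/// Outcome of interpreting a command hook's exit status.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Interpreted {
    pub decision: HookDecision,
    /// Set when the exit code was neither 0 nor 2.
    pub warning: Option<String>,
}

/// 0 = Allow, 2 = Deny with stderr as the reason, anything else = Allow
/// plus a warning.
pub fn interpret_exit(code: i32, stderr: &str) -> Interpreted {
    match code {
        0 => Interpreted {
            decision: HookDecision::Allow,
            warning: None,
        },
        2 => Interpreted {
            decision: HookDecision::Deny {
                reason: stderr.trim().to_string(),
            },
            warning: None,
        },
        other => Interpreted {
            decision: HookDecision::Allow,
            warning: Some(format!("hook exited with unexpected code {other}; allowing")),
        },
    }
}
```

Note the consequence of the "anything else" rule: a hook script that crashes with exit code 1 does not block the tool call, so policy scripts that must fail closed need to exit 2 explicitly.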
Stdin payload uses snake_case + camelCase mix, deliberately
The envelope's hook-protocol fields (hookEventName,
hookSpecificOutput) match Claude Code so existing CC hook scripts
work with a one-line wrapper. Caliban-specific fields
(session_id, tool.useId, turn_index) keep snake_case for
parity with our internal JSON. The diff is documented in the README.
URL allowlist for HTTP hooks; env-var allowlist for ${VAR} expansion
HTTP handlers fail closed: the operator must list each allowed URL
glob in allowed_http_hook_urls (default empty). Header and URL
${VAR} expansion is gated by http_hook_allowed_env_vars. This
prevents a project-scope hooks.toml from exfiltrating user-scope
secrets via an attacker-controlled callback URL.
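A fail-closed allowlist check might look like the sketch below. The glob semantics (a * matching any run of characters, everything else literal) are an assumption made for illustration; the actual pattern grammar for allowed_http_hook_urls is defined in the spec.

```rust
/// Match `text` against `pattern`, where `*` matches any run of characters
/// (including none). Other characters match literally.
fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    // (pattern index just after the last '*', text index it currently covers)
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last '*' absorb one more byte and retry.
            pi = star_pi;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    // Only trailing '*' may remain in the pattern.
    p[pi..].iter().all(|&b| b == b'*')
}

/// An empty allowlist matches nothing, so HTTP hooks fail closed by default.
pub fn http_hook_allowed(url: &str, allowed_globs: &[&str]) -> bool {
    allowed_globs.iter().any(|g| glob_match(g, url))
}
```

With a matcher this permissive, a pattern such as https://audit.example.com* would also match https://audit.example.com.evil.net, so patterns should anchor the host with a trailing slash before any wildcard.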
Async handlers detach onto a bounded task pool; their decisions are ignored
async = true handlers are fire-and-forget: useful for audit, metrics,
and code-review subagents that observe but don't gate. A
Semaphore-bounded pool (default 16) caps the parallel async-handler
count. Agent-type handlers are async-only by definition (synchronous
subagent calls from a hook would risk turn-budget blowup and
recursion).
Parallel tool dispatch ordering caveat is preserved
Under parallel tool dispatch (ADR 0016), PostToolUse fires in
completion order, not assistant-message order. We document this on
the trait and surface tool_use_id in ToolCtx so hook authors can
correlate. The router serializes hook handlers per-tool-call but lets
distinct tool_use_ids run concurrently.
Kill switch and managed-only mode are first-class
disable_all_hooks = true blocks all external handlers but leaves
in-process Hooks impls running (PermissionsHook, audit, anything
the binary wires up). allow_managed_hooks_only = true further
restricts execution to handlers loaded from the managed settings
scope (ADR 0026). Both flags are visible in the /hooks overlay.
Consequences
- Positive: Closes nine 🔴 rows under "B. Hooks & extensibility" in docs/parity-gap-matrix.md in one PR (only "Plugin packages" and "Hook inheritance for subagents" remain, both gated on other initiatives). Establishes the substrate plugins and observability build on. Shell-command hooks let operators glue caliban into existing audit / CI / policy stacks without touching Rust.
- Negative: Hook handlers run with caliban's privileges; shell hooks are arbitrary code execution by design. Until an OS sandbox lands, a hostile project-scope hooks.toml is a real risk: mitigated by the URL/env allowlists and managed-only mode, but fundamentally a "trust your repos" model. The Hooks trait grows from 4 to ~18 methods; default no-ops keep call-sites compatible but the trait's IDE-completion surface bloats.
- Revisit if: Plugin system (ADR 0030) lands and needs richer package-level hook registration. If hook latency becomes a bottleneck under heavy parallel dispatch, promote sync-handler invocation off the dispatcher's hot path. If UpdatedInput proves too error-prone, narrow it to specific tools or remove it. If Claude Code stabilizes additional event names (Elicitation / Setup / etc.) we promote them from reserved-but-stubbed to actually-fired.
ADR 0025 · Headless -p mode + JSON output protocol
- Status: accepted
- Date: 2026-05-24
- Spec:
docs/superpowers/specs/2026-05-24-headless-mode-design.md
Context
caliban today only runs as an interactive ratatui TUI. Every potential
CI/scripting/devcontainer/GitHub-Actions consumer is blocked on a
non-interactive entry point. Claude Code's -p mode with
--output-format text|json|stream-json is the documented contract
those consumers use; mirroring it engine-to-engine is Tier-1 foundation
work. Full spec at
docs/superpowers/specs/2026-05-24-headless-mode-design.md; this ADR
records the architectural commitments only.
Decision
Headless is a sibling driver, not a fork of the TUI
caliban -p enters a HeadlessDriver that consumes the same
AgentBuilder + Stream<Event> surface from caliban-agent-core.
The TUI driver is unchanged. Both drivers compose the same hook
chain, permission rules, tool registry, and model router — the only
difference is the encoder that turns Events into bytes.
Auto-headless when stdin is non-TTY or stdout is piped, unless
--no-auto-print is explicit. Explicit --print always wins.
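The driver-selection rule can be written as a pure function, sketched below. DriverFlags and the function names are illustrative; "stdout is piped" is approximated here as "stdout is not a terminal", which also covers redirection to a file.

```rust
use std::io::IsTerminal;

/// Flags relevant to driver selection; field names are illustrative.
#[derive(Debug, Clone, Copy, Default)]
pub struct DriverFlags {
    pub print: bool,         // -p / --print
    pub no_auto_print: bool, // --no-auto-print
}

/// Pure decision so it can be unit-tested without a real terminal.
pub fn use_headless(flags: DriverFlags, stdin_is_tty: bool, stdout_is_tty: bool) -> bool {
    if flags.print {
        return true; // explicit --print always wins
    }
    if flags.no_auto_print {
        return false; // auto-detection is switched off
    }
    !stdin_is_tty || !stdout_is_tty
}

/// Wire the decision to the real process streams.
pub fn use_headless_for_process(flags: DriverFlags) -> bool {
    use_headless(
        flags,
        std::io::stdin().is_terminal(),
        std::io::stdout().is_terminal(),
    )
}
```

Keeping the TTY probes out of the decision function means the precedence between --print and --no-auto-print can be tested exhaustively.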
Three output formats, with stream-json as the contract surface
- text: the assistant's final message body to stdout. The minimum shape. Default.
- json: a single JSON object identical to the final type: result frame of stream-json. Suitable for jq-driven scripts that only care about the answer + cost.
- stream-json: NDJSON. First frame is system/init (model, tools, MCP servers, plugins, settings sources); per-turn frames are tool_use, tool_result, content_block_delta (when --include-partial-messages), system/api_retry, user (when --replay-user-messages), hook_event (when --include-hook-events); last frame is type: result.
Stream-json closely follows Claude Code's documented shape so downstream consumers can drop in. Divergences (provider-specific token fields, etc.) are documented in the README; we do not commit to byte-identical compatibility because caliban is provider-agnostic.
Tool calls appear in two frames; the message frame is authoritative
Each successful tool call surfaces in the stream-json output as two frames, by design:
{"type":"tool_use","id":"toolu_01ABC","name":"Glob","input":{"pattern":"**/*.toml"}}
{"type":"tool_result","tool_use_id":"toolu_01ABC","is_error":false,"content":[...]}
{"type":"message","role":"assistant","content":[
{"type":"text","text":"Searching for TOML files…"},
{"type":"tool_use","id":"toolu_01ABC","name":"Glob","input":{"pattern":"**/*.toml"}}
]}
- A top-level short tool_use frame emitted at the moment the model finishes streaming the tool's input JSON (paired with a tool_result frame once the tool completes). This is a progress indicator, useful for live UIs that want to show "Glob is running" before the assistant's final message is assembled.
- The same tool_use block embedded inside the subsequent message frame (full assistant message, content-block array) emitted at TurnEnd. This is the authoritative record: the serialized assistant turn as the agent would replay it from a session log.
Operators reconstructing the transcript from the stream should read
the message frame and treat the short tool_use/tool_result
frames as progress signal. Tools that count tool_use blocks must
not double-count (one short frame + one inside message = one tool
call, not two).
This mirrors Claude Code, where the assistant message event is the
authoritative full content and per-block progress frames are advisory.
The duplication is intentional; the encoder must keep emitting both frames rather than deduplicating them.
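One way a consumer can avoid double-counting is to key on the tool-use id, as in the sketch below. The Frame enum is a pared-down, hypothetical view of already-decoded frames (JSON parsing is left out to keep the example dependency-free).

```rust
use std::collections::HashSet;

/// A pared-down view of decoded stream-json frames; only the fields needed
/// for counting are modeled.
pub enum Frame {
    /// Short top-level progress frame.
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
    /// Full assistant message; holds the ids of its embedded tool_use blocks.
    Message { tool_use_ids: Vec<String> },
}

/// Count distinct tool calls by id, so a call seen in both a short frame
/// and a message frame is counted once.
pub fn count_tool_calls(frames: &[Frame]) -> usize {
    let mut seen: HashSet<&str> = HashSet::new();
    for frame in frames {
        match frame {
            Frame::ToolUse { id } => {
                seen.insert(id.as_str());
            }
            Frame::Message { tool_use_ids } => {
                seen.extend(tool_use_ids.iter().map(String::as_str));
            }
            Frame::ToolResult { .. } => {}
        }
    }
    seen.len()
}
```

An equally valid approach is to ignore the short frames entirely and count only the blocks inside message frames, since those are the authoritative record.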
Structured input is also NDJSON
--input-format stream-json makes stdin a chat transcript: each line is
either a user message or a control/interrupt frame. The driver
feeds the agent one message per turn. EOF gracefully drains.
This makes caliban scriptable from any language that can emit JSON lines, without juggling pseudo-TTYs.
Input frame schema (canonical)
The simple, caliban-canonical shape:
{"type":"user","content":"hello"}
{"type":"user","content":[{"type":"text","text":"hello"}]}
{"type":"control","subtype":"interrupt"}
user.content accepts either a JSON string or an array of content
blocks (each {"type":"text","text":"…"}). Both flatten to the same
text on the way into the agent.
Unknown type values, malformed JSON, or extra unrecognized fields
on user/control frames are hard parse errors (exit 64,
EX_USAGE). The driver flushes any in-flight assistant frames first,
emits one final result frame with subtype: "error", and only then
returns. This is to avoid the failure mode where an operator sends a
Claude-Code-shaped envelope ({"type":"user","message":{"role":"user", "content":[...]}}) and the driver silently runs the agent with a
blank prompt because serde accepted the unknown message field.
--input-format stream-json requires stdin
When --input-format stream-json is in effect, an explicit prompt is
incompatible with the stream-json input path. The binary rejects
the combination at clap-parse time with EX_USAGE (exit 64) so
operators can't accidentally bypass the frame parser via a positional
prompt or --prompt …. The allowed entry points are:
- No prompt args at all (stdin is read as the NDJSON stream); or
- -p - / --print - / --prompt - (the - sentinel explicitly delegates to stdin and is treated as a no-op alongside --input-format stream-json).
--bare is opt-in, not the CI default
--bare disables hooks, skills, plugins, MCP, auto-memory, and
CLAUDE.md auto-discovery. It's the documented "deterministic CI"
mode. Unlike Claude Code's stated direction of making it the default,
caliban's headless default keeps inheriting user/project settings —
operators must opt out explicitly. Rationale: caliban's first
deployments are mostly local-shell automation where inherited settings
are useful; CI runners are well-trained to add flags.
Exit codes follow sysexits.h plus two budget signals
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | generic runtime error |
| 2 | tool/assistant error |
| 64 | EX_USAGE (bad flags) / malformed stream-json input |
| 66 | EX_NOINPUT (--resume <missing>, empty stream-json stdin) |
| 75 | EX_TEMPFAIL — --max-turns exceeded (F12 follow-up: was 130, which collided with 128 + SIGINT) |
| 78 | EX_CONFIG (stdin > 10 MB; settings parse failure) |
| 124 | cancelled (SIGTERM / Ctrl-C from the agent loop) |
| 130 | reserved for real SIGINT reaching the harness (128 + 2); the signal handler in caliban/src/main.rs exits with this on a second Ctrl-C |
| 137 | --max-budget-usd exceeded |
CI tooling can distinguish "budget exhausted" from "real failure"
without parsing stdout. Update 2026-05-27 (F12): --max-turns
exhaustion previously exited 130, which is 128 + SIGINT in the
UNIX convention — CI scripts reading $? reasonably concluded the
operator had Ctrl-C'd. It now exits 75 (EX_TEMPFAIL), distinct
from any signal-derived code. Consumers wanting the structured signal
should read the matching result frame's subtype: "max_turns".
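The table translates naturally into a single mapping function, sketched below. The Terminal enum and its variant names are invented for illustration; only the numeric codes come from the table.

```rust
/// Terminal conditions from the exit-code table; variant names are
/// illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Terminal {
    Success,
    RuntimeError,
    ToolError,
    Usage,
    NoInput,
    MaxTurns,
    Config,
    Cancelled,
    Sigint,
    BudgetExceeded,
}

pub fn exit_code(t: Terminal) -> u8 {
    match t {
        Terminal::Success => 0,
        Terminal::RuntimeError => 1,
        Terminal::ToolError => 2,
        Terminal::Usage => 64,
        Terminal::NoInput => 66,
        Terminal::MaxTurns => 75,
        Terminal::Config => 78,
        Terminal::Cancelled => 124,
        Terminal::Sigint => 130,
        Terminal::BudgetExceeded => 137,
    }
}

/// What a CI wrapper might do with `$?`: separate budget stops from
/// failures without parsing stdout.
pub fn is_budget_stop(code: u8) -> bool {
    matches!(code, 75 | 137)
}
```

Returning u8 keeps the function compatible with std::process::ExitCode::from, which a main function could use to report the result.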
Result-frame shape — structured fields for non-success runs
The final result frame's body depends on subtype:
- subtype: "success". The assistant's reply lives in the result string field. Token/cost/turn totals are always present. Structured-output payloads are surfaced under structured_output when --json-schema succeeded. This is the load-bearing contract for downstream jq scripts and is not changed by the F7 follow-up below.
- All non-success subtypes (error, max_turns, budget_exceeded, cancelled). The result field is omitted; consumers must read the structured fields instead:
  - last_assistant_text: the most recent non-empty assistant text body the agent produced. null (field absent) when the run terminated before any assistant text landed. Distinct from the prior protocol, which set result to the concatenation of every streamed assistant fragment across the truncated run, a value that ranged from a stale plan preamble to literally "" and couldn't be distinguished from a clean answer.
  - tool_calls_seen: running count of ToolCallEnd events observed across the entire run. Lets consumers tell an empty-but-active run (tool loop) from an empty-and-idle one.
  - error: populated for subtype: "error" only; carries the StopCondition::ProviderError / HookDenied / CompactionFailed / Refusal / ContentFilter / schema-validation message verbatim.
Pairs with the exit-code table above: the result frame's subtype
and the process exit code agree on what the terminal condition was,
so consumers can pick either signal.
Cost accumulator lives in caliban-agent-core::headless
A CostAccumulator (per-(provider, model)) wraps each provider call
and accumulates USD against a static pricing table at
caliban-agent-core/src/headless/pricing.json. Pricing misses log a
WARN and treat cost as zero rather than failing — staleness is real,
and we'd rather emit "best-effort, cost may be undercount" than refuse
to run. Pricing table refreshes are by-hand PRs against the provider
websites; the as_of date surfaces in the system/init frame.
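The accumulate-or-warn behavior can be sketched as below. This is a simplified stand-in, not the actual CostAccumulator: it prices only input and output tokens, uses a per-million-token unit chosen for the example, and any prices fed to it in examples are placeholders rather than real provider rates.

```rust
use std::collections::HashMap;

/// USD per million tokens. Values used with this type in examples are
/// placeholders, not real provider prices.
#[derive(Debug, Clone, Copy)]
pub struct Price {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

#[derive(Default)]
pub struct CostAccumulator {
    pricing: HashMap<(String, String), Price>,
    spent_usd: HashMap<(String, String), f64>,
    pricing_misses: u64,
}

impl CostAccumulator {
    pub fn new(pricing: HashMap<(String, String), Price>) -> Self {
        Self { pricing, ..Self::default() }
    }

    /// Record one provider call. Unknown (provider, model) pairs cost zero
    /// and bump a miss counter instead of failing.
    pub fn record(&mut self, provider: &str, model: &str, input_tokens: u64, output_tokens: u64) {
        let key = (provider.to_string(), model.to_string());
        let cost = match self.pricing.get(&key) {
            Some(p) => {
                (input_tokens as f64 * p.input_per_mtok
                    + output_tokens as f64 * p.output_per_mtok)
                    / 1_000_000.0
            }
            None => {
                self.pricing_misses += 1;
                eprintln!("WARN: no pricing for {provider}/{model}; counting cost as 0");
                0.0
            }
        };
        *self.spent_usd.entry(key).or_insert(0.0) += cost;
    }

    pub fn total_usd(&self) -> f64 {
        self.spent_usd.values().sum()
    }

    pub fn pricing_misses(&self) -> u64 {
        self.pricing_misses
    }
}
```

Exposing the miss count alongside the total lets a caller flag the reported cost as a lower bound whenever any call went unpriced.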
Structured output via --json-schema uses provider-native first, falls back to validate-and-retry
For Anthropic / OpenAI native structured-output: the model router
issues the final reply with json_schema semantics, returns the parsed
object as structured_output. For providers without native support
(Ollama, some Google endpoints): prompt + validate + up-to-2 retries
with a "this didn't validate; retry, here's the error" follow-up. After
the retry budget, the result frame's subtype is error.
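The fallback path for providers without native structured output amounts to a bounded validate-and-retry loop. The sketch below abstracts the provider call and the schema check as closures; validate_and_retry and its signature are illustrative, not the router's API.

```rust
/// Ask for a reply, validate it, and re-ask with the validation error up to
/// `max_retries` more times. `call` receives the previous error, if any,
/// so it can build the "this didn't validate" follow-up.
pub fn validate_and_retry<C, V>(
    mut call: C,
    validate: V,
    max_retries: usize,
) -> Result<String, String>
where
    C: FnMut(Option<&str>) -> String,
    V: Fn(&str) -> Result<(), String>,
{
    let mut last_error: Option<String> = None;
    // One initial attempt plus `max_retries` follow-ups.
    for _ in 0..=max_retries {
        let reply = call(last_error.as_deref());
        match validate(&reply) {
            Ok(()) => return Ok(reply),
            Err(e) => last_error = Some(e),
        }
    }
    Err(last_error.unwrap_or_else(|| "validation failed".to_string()))
}
```

With max_retries set to 2 the worst case is three provider calls in total, after which the Err value would map to a result frame with subtype error.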
Hook events are observable in headless mode
--include-hook-events attaches an in-process HookSink at the
outermost position in the hook chain. Each fired event becomes a
hook_event frame, including the router's decision and the
permissions layer's verdict separately. Async handlers emit two frames
(dispatch + completion) so observability isn't lost behind
fire-and-forget. This is the only headless flag that produces zero-cost
visibility into the new hook taxonomy (ADR 0024).
Consequences
- Positive: Closes nearly all rows under "J. Headless / CI" in docs/parity-gap-matrix.md in one PR. Unblocks GitHub Actions and devcontainer integrations (each a separate sub-project, but neither is reachable without this). Makes caliban scriptable from any language. Cost accumulator gives operators (and the eventual /usage slash) a single source of truth for $ spent. Stream-json is the contract surface for everything downstream; once it's stable, we can iterate the TUI without breaking automation consumers.
- Negative: Pricing table is a maintenance hazard; staleness leads to silent undercounts. Stream-json diverges from Claude Code in per-provider token shapes, so exact byte-for-byte parity isn't achievable while remaining provider-agnostic. Bare mode adds another axis of "what was actually configured during this run" that operators must reason about (mitigated by system/init surfacing the source chain). Structured-output fallback retry loop is bounded but adds two extra provider calls in the worst case.
- Revisit if: Downstream consumers demand byte-for-byte Claude-Code stream-json parity; we'd add a compat translator rather than rework the encoder. If pricing maintenance becomes untenable, host the table behind a hosted JSON file refreshed on a schedule. If --bare semantics need to expand (skipping --system-prompt-file, etc.), promote it to a typed BareModeFlags struct rather than a single bool.
ADR 0026 · Layered settings.json + /config editor
- Status: accepted
- Date: 2026-05-24
- Spec:
docs/superpowers/specs/2026-05-24-settings-hierarchy-design.md
Context
caliban today has three ad-hoc TOML files (permissions.toml,
mcp.toml, the upcoming hooks.toml per ADR 0024) each loaded by its
own crate with no shared scope hierarchy, no schema, no merge rules,
no live reload, no interactive editor, and no dynamic auth surface.
Claude Code consolidates all of this into one layered settings.json
with documented managed > user > project > local merge semantics, a
JSON Schema at https://json.schemastore.org/claude-code-settings.json,
a tabbed /config editor, and apiKeyHelper for dynamic API-key
refresh. Closing that gap is Tier-1 foundation work because plugins
(eventual ADR 0030), observability, headless mode, and downstream
tooling all want a single configuration story. Full spec at
docs/superpowers/specs/2026-05-24-settings-hierarchy-design.md; this
ADR records the architectural commitments only.
Decision
JSON is the primary format; TOML is honored at the same path
settings.json is the canonical filename at each scope. The same path
with a .toml extension is parsed identically (settings.toml,
settings.local.toml). Rationale for JSON-primary: parity with Claude
Code's documented schema URL, JSON-Schema editor support out-of-the-box,
and serde supports both with no extra work. If both exist in the same
scope, JSON wins with a WARN logged.
Four scopes with a documented merge order
In priority order, CLI > Local > Project > User >
Managed (default). Managed sits at the bottom by default so
operators can augment org defaults, but moves to the top when the
managed setting sets parentSettingsBehavior: "block" — mirrors
Claude Code's escape hatch. --settings <FILE|JSON> injects a virtual
scope above local; --setting-sources <CSV> restricts which scopes
are read (e.g. user,project for known-good CI base).
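The scope ordering, including the managed-scope escape hatch, can be sketched as follows. The Scope enum and both helpers are illustrative. The accepted names in the CSV filter beyond user and project, and whether the CLI scope is filterable at all, are assumptions of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    Cli,
    Local,
    Project,
    User,
    Managed,
}

/// Highest-priority scope first. `managed_blocks` models the managed scope
/// setting `parentSettingsBehavior: "block"`.
pub fn precedence(managed_blocks: bool) -> Vec<Scope> {
    if managed_blocks {
        vec![Scope::Managed, Scope::Cli, Scope::Local, Scope::Project, Scope::User]
    } else {
        vec![Scope::Cli, Scope::Local, Scope::Project, Scope::User, Scope::Managed]
    }
}

/// Apply a `--setting-sources`-style CSV filter (e.g. "user,project").
/// Name spellings are assumptions; unknown names are reported as errors.
pub fn restrict(order: &[Scope], csv: &str) -> Result<Vec<Scope>, String> {
    let mut wanted = Vec::new();
    for name in csv.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        wanted.push(match name {
            "cli" => Scope::Cli,
            "local" => Scope::Local,
            "project" => Scope::Project,
            "user" => Scope::User,
            "managed" => Scope::Managed,
            other => return Err(format!("unknown settings scope: {other}")),
        });
    }
    Ok(order.iter().copied().filter(|s| wanted.contains(s)).collect())
}
```

The filter preserves the precedence order rather than the order the names were typed in, so user,project and project,user behave identically.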
Merge rules: scalars highest-wins, arrays mostly concatenate
Per-key rules are documented in the spec. The headline:
- Permission arrays (allow/ask/deny), hook arrays (hooks.<Event>), MCP allow/deny lists, available_models, additional_directories, claude_md_excludes all concatenate in priority order with dedup where meaningful.
- mcp.servers.<name> and env deep-merge.
- Every other scalar is highest-wins.
The /config Effective tab annotates each value with the scope it
came from.
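The two headline merge behaviors can be sketched with plain collections, assuming each input slice is already ordered from highest to lowest priority. merge_scalar and merge_array are hypothetical helpers; the deep-merge case for mcp.servers.<name> and env is not covered here.

```rust
/// Scalars: the first scope (highest priority) that sets a value wins.
pub fn merge_scalar<T: Clone>(by_priority: &[Option<T>]) -> Option<T> {
    by_priority.iter().flatten().next().cloned()
}

/// Arrays: concatenate in priority order, keeping the first occurrence of
/// each item so duplicates across scopes collapse.
pub fn merge_array(by_priority: &[Vec<String>]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for item in by_priority.iter().flatten() {
        if !out.contains(item) {
            out.push(item.clone());
        }
    }
    out
}
```

For example, a local scope that leaves model unset and a project scope that sets it resolve to the project value, while allow rules from both scopes are all retained.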
Strongly-typed Settings struct with deny_unknown_fields
Settings is a serde-derived struct in caliban-core::settings. Top-
level keys are typed; unknown top-level keys fail loudly. A
#[serde(flatten)] extra: BTreeMap<String, Value> escape hatch
captures forward-compat keys without forcing a release for every new
Claude Code field. JSON Schema is generated from schemars derives at
build time and published at https://caliban.dev/schemas/settings.json.
Per-feature TOML files remain a compat fallback for one deprecation window
permissions.toml / mcp.toml / hooks.toml continue to load only
when the unified settings.json does not define the matching top-level
key. caliban config migrate round-trips them into a single
settings.json. After one minor release the compat path logs
DEPRECATED; after two it errors.
Live reload via notify + arc-swap + ConfigChange hook
A SettingsWatcher watches each scope's path, debounces 250 ms,
re-loads + re-merges, and atomically swaps the Arc<Settings> via
arc-swap. A ConfigChange hook event (ADR 0024) fires with the diff
so external observers and in-process subscribers can react. Live-
reloadable keys are documented (permissions.*, hooks.*,
api_key_helper.*, UI keys, env, etc.). Restart-required keys
(model, mcp.servers.*, auto_memory_*) log WARN on change and
take effect on next launch; /config shows a "restart required" badge.
apiKeyHelper is shell-out with caching + per-provider routing
A configurable script that emits the provider API key on stdout. Two shapes:
- Single helper with provider: "*" as fallback for all providers.
- Array of helpers keyed by provider.
Keys are cached for refreshIntervalMs (default 5 min) or until a provider returns
401, whichever comes first. Refresh runs inline, and a warning is logged when the
helper takes longer than slowHelperWarningMs (default 10 s); env var
CALIBAN_API_KEY_HELPER_TTL_MS mirrors Claude Code's contract. The
helper is execv'd without a shell to avoid argv injection.
Auth precedence chain (per provider): per-provider helper → wildcard helper → env var → keyring → anonymous (local providers).
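The precedence chain is a first-available lookup, sketched below. For simplicity the candidates are resolved eagerly by the caller; a real implementation would presumably run helpers lazily and apply the cache described above. All type and function names are illustrative.

```rust
/// Where a key came from; mirrors the chain in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum KeySource {
    ProviderHelper,
    WildcardHelper,
    EnvVar,
    Keyring,
    Anonymous,
}

/// Candidate credentials for one provider, already looked up by the caller.
#[derive(Default)]
pub struct Candidates {
    pub provider_helper: Option<String>,
    pub wildcard_helper: Option<String>,
    pub env_var: Option<String>,
    pub keyring: Option<String>,
    /// True for local providers that can run without a key.
    pub allows_anonymous: bool,
}

/// Return the first available credential and its source, or `None` when a
/// key is required but nothing supplied one.
pub fn resolve_key(c: Candidates) -> Option<(KeySource, Option<String>)> {
    let chain = [
        (KeySource::ProviderHelper, c.provider_helper),
        (KeySource::WildcardHelper, c.wildcard_helper),
        (KeySource::EnvVar, c.env_var),
        (KeySource::Keyring, c.keyring),
    ];
    for (source, key) in chain {
        if key.is_some() {
            return Some((source, key));
        }
    }
    c.allows_anonymous.then_some((KeySource::Anonymous, None))
}
```

Returning the source alongside the key makes it straightforward to show which link of the chain supplied the credential.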
/config is a tabbed TUI overlay; edits write to project scope by default
Tabs per top-level key group (Model, Permissions, Hooks, MCP, Memory,
UI, Auth, Effective). Each row carries a [scope] chip showing which
scope contributed the effective value. s cycles the write-scope;
w flushes pending edits via atomic temp-file + rename. The Effective
tab is read-only and mirrors caliban config print.
The file-watcher picks up /config's own writes automatically, so the
running process refreshes via the same code path external edits hit —
no extra plumbing.
Consequences
- Positive: Closes all five rows under "D. Configuration / settings" in docs/parity-gap-matrix.md plus the /config row in section M. Establishes the single configuration story plugins, hooks, MCP, model router, and headless mode all consume. apiKeyHelper unlocks short-lived-credential workflows (AWS STS, GCP IAM, internal vault systems) caliban can't currently participate in. Live reload makes the hook/permission iteration cycle fast.
- Negative: Settings struct gains ~30 top-level keys, a real surface area to keep typed and tested. Merge rules are intricate (8-row table); operator confusion is real, mitigated by the Effective tab. Live reload introduces "settings changed mid-turn" semantics that subtle bugs can hide in (e.g. a permission allowed at turn start gets revoked mid-turn; we honor the rule at dispatch time, but documenting and testing that boundary takes care). One-release compat window for legacy TOMLs adds short-term parser surface. apiKeyHelper is shell-out; a managed-scope malicious script would be a privesc vector (mitigated by managed paths being root-owned by convention).
- Revisit if: Settings struct grows beyond ~50 top-level keys (refactor into named sub-modules per group). If live-reload semantics prove too surprising for operators, move to a reload-on-/config-w model. If managed delivery channels (Windows registry, macOS plist) become a real ask, add a ScopeLoader backend per channel. If apiKeyHelper's 5-minute cache proves wrong for short-TTL credentials, expose refreshIntervalMs per provider.
ADR 0027 · TUI ergonomics pack
- Status: accepted
- Date: 2026-05-24
- Spec:
docs/superpowers/specs/2026-05-24-tui-ergonomics-design.md
Context
caliban's TUI ships the basics (slash menu, @-attach, mouse-wheel
scroll, plan-mode chip, spinner) but six 🔴/🟡 rows under E. TUI
ergonomics in docs/parity-gap-matrix.md block day-to-day parity
with Claude Code: no shell escape, no external editor handoff, no
permission Ask modal (deferred from PR #8), no transcript viewer, no
reverse history search, and the @file suggestion path is hard-coded
with no operator override.
Each is small in isolation; together they push on the same input-bar
state machine and overlay rendering infrastructure. Shipping in one
batch lets us refactor InputMode once instead of three times.
The Ask modal in particular has knock-on effects: it's the only piece of UI that blocks the agent loop on user input, so it sets the contract for both auto-mode (ADR 0029) and MCP elicitation (ADR 0023). Landing it here gives both specs a stable target.
Decision
One input bar, many modes
Promote InputMode from {Idle, SlashMenu, AtMenu} to a richer enum
that adds ShellEscape, ReverseHistory, ExternalEditor,
AskModal, and TranscriptViewer. The first two keep the prompt
visible; the last three are modal and short-circuit the main key
dispatch. All input-area key handling moves under a single
handle_input_key function.
!cmd is a synthesized Bash invocation
A leading ! at column 0 routes the rest of the line into the
existing Bash tool via the existing permission hook. That gives us
the rule grammar (Bash:git *, Bash:rm *, …) for free and keeps the
audit trail consistent. The synthesized call is not added to the
conversation history — it's a user action. Plan mode still gates.
External editor is a tempfile roundtrip
Ctrl+G writes the input buffer to a tempfile, leaves the alternate
screen, execs $VISUAL/$EDITOR/vi with the path as argv, reads
the result back on exit, re-enters the alt-screen. The editor value
is whitespace-split verbatim (no shell parsing); EDITOR='code --wait' works.
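The editor resolution and the whitespace split can be sketched with the standard library alone. The function names are illustrative, and treating an empty $VISUAL or $EDITOR as unset is an assumption of this sketch.

```rust
use std::path::Path;
use std::process::Command;

/// Split an editor setting on whitespace with no shell parsing, so quotes
/// and escapes are not interpreted. Returns `None` for an empty value.
pub fn split_editor(value: &str) -> Option<(String, Vec<String>)> {
    let mut parts = value.split_whitespace().map(str::to_string);
    let program = parts.next()?;
    Some((program, parts.collect()))
}

/// Pick the first usable of `$VISUAL`, `$EDITOR`, then fall back to `vi`.
pub fn choose_editor(visual: Option<&str>, editor: Option<&str>) -> (String, Vec<String>) {
    visual
        .and_then(split_editor)
        .or_else(|| editor.and_then(split_editor))
        .unwrap_or_else(|| ("vi".to_string(), Vec::new()))
}

/// Build the command; the tempfile path is appended as the last argument.
pub fn editor_command(program: &str, args: &[String], tempfile: &Path) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(args).arg(tempfile);
    cmd
}
```

Because there is no shell parsing, EDITOR='code --wait' splits into the program code and the argument --wait, while a value containing a quoted path with spaces would be split at those spaces.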
The Ask modal lives in a new caliban-tui-ask crate
Adding a thin caliban-tui-ask crate keeps caliban-agent-core
UI-agnostic. It implements the existing AskHandler trait with an
mpsc/oneshot bridge to a ratatui modal supporting four actions —
Allow once, Allow + persist project, Allow + persist user, Deny —
with in-process re-load of the appended rule.
Transcript viewer renders Message directly
Ctrl+O walks App.messages and renders every ContentBlock
variant (text, thinking, tool_use, tool_result, image, redacted) — the
model-eye view, distinct from the streaming-friendly TranscriptLine
view. [ dumps viewport to scrollback via leave/re-enter alt-screen;
v opens the full transcript in $VISUAL.
Reverse history search is scope-cycled
Ctrl+R opens at session scope; Ctrl+S cycles through project and
all-projects scopes. Wider scopes lazily memoize from SessionStore
in spawn_blocking with a 2s budget.
File suggestion source becomes a trait
FileSuggestionSource with two impls: IgnoreWalkerSource (default,
gitignore-aware) and CommandSource (spawns an operator-configured
program). Walker stays on the existing ignore crate — no new deps.
Consequences
- Positive. Six 🔴/🟡 rows move to ✅ in one initiative. The Ask modal unblocks ADR 0029 (auto-mode) and reuses the same overlay primitives that ADR 0023 needs for MCP elicitation. Operators get the keyboard surface expected of any modern agent CLI.
- Negative. InputMode becomes a fatter enum; handle_event needs careful refactoring to keep existing tests green. One new crate (caliban-tui-ask). Persisting Ask-modal decisions adds a write path into permissions.toml, which we previously only read from; parse-error and race-with-manual-edit cases need defensive handling.
- Revisit if: vim mode lands and the InputMode enum needs reshape into (BarMode, EditorMode). The transcript viewer is a natural anchor for /recap and /btw later.
- Out of scope, enabled by this work: background bash (Ctrl+B), vim mode, image input, voice dictation.
References
- Spec: docs/superpowers/specs/2026-05-24-tui-ergonomics-design.md
- Permissions trait: crates/caliban-agent-core/src/permissions.rs
- Overlay primitives: caliban/src/tui.rs::centered_rect
- Attach scaffold: caliban/src/tui/attach.rs
- Companion ADRs: 0028 (Checkpointing: consumes Esc-Esc), 0029 (Auto-mode: consumes the Ask modal), 0023 (MCP v2: reuses overlay primitives).
Revised 2026-05-26
The original Decision committed the Ask modal to a new caliban-tui-ask
crate. In practice the modal shipped at caliban/src/tui/ask.rs (~202
LOC) inside the binary.
Why this is the correct outcome. The modal is binary-coupled (it
consumes the binary's App state, dispatches via the binary's Action
enum, and renders into the binary's overlay system). Extracting it would
require either threading App/Action/overlay traits through a public
surface or duplicating them — both costs without payoff. The "extract
when sharable" trigger from the original Decision never fired.
Revisit if another consumer needs the modal (e.g., a hypothetical
standalone caliban-tui library separated from the binary), or LOC
grows past ~500.
ADR 0028 · Checkpointing + /rewind
- Status: accepted
- Date: 2026-05-24
- Spec: docs/superpowers/specs/2026-05-24-checkpointing-design.md
Context
Claude Code's checkpoint + /rewind feature lets operators try an
aggressive multi-tool prompt knowing they can undo the file changes
without losing the conversation that followed. caliban has neither
piece: no per-prompt snapshot, no rewind menu, no Esc-Esc shortcut.
The C. Memory & checkpointing section of
docs/parity-gap-matrix.md flags this as 🔴 in three rows; M.
Slash command coverage flags /rewind as 🔴.
The natural place to wire snapshots is the Hooks trait — it already
fires before_tool/after_tool where we need to read pre-images.
The natural place to wire restore is the session store — truncating
message history is the same shape as a session edit.
Key trade-offs: scope (file-tool edits only, mirroring Claude Code's
Bash exclusion — capturing arbitrary subprocess side-effects is
intractable); storage layout (mirror Claude Code's
~/.claude/projects/<project_dir_hash>/checkpoints/<session>/ so
operators with both tools recognize the shape; override via
CALIBAN_CHECKPOINT_ROOT); manifest + content-addressed pre-images
over whole-tree cp -a (cheaper, inspectable with ls+cat,
cross-prompt dedup is a future hard-link sweep, not v1).
Decision
Two new lifecycle hook events
Hooks gains before_run and after_run with default no-op impls
— existing consumers compile unchanged. These are the minimum events
checkpointing needs; broader hook-surface parity (Tier 1) will expand
the trait further but stays compatible with this addition.
A new Layer-2 crate caliban-checkpoints
Recorder, store, hook impl, and restore logic live in a new crate
depending on caliban-agent-core, caliban-provider,
caliban-sessions. Keeps agent-core's compile time and dep surface
unchanged.
Manifest-based, content-addressed pre-image store
For each touched file, record the pre-image once (keyed by sha256)
with metadata in prompt-N/manifest.json and blobs under
prompt-N/objects/<sha256>. Newly-created files record with
exists_pre: false — restore deletes them. Blob storage (not git,
not a database) because operators already trust the filesystem,
it's trivially inspectable, and cross-prompt dedup can be added
later as a background hard-link sweep without a schema change.
Only Write/Edit/NotebookEdit/(future)MultiEdit trigger
recording. Bash, WebFetch, MCP, and external writes are documented
out of scope; the rewind menu surfaces this in its footer.
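The restore side of this layout can be sketched as a pure planning step. Only exists_pre and the objects/<sha256> path come from the text; the Entry and RestoreStep types and their other field names are assumptions for illustration.

```rust
/// One manifest entry. Only `exists_pre` is named in the text; the other
/// field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    path: String,
    sha256: Option<String>, // None when the file did not exist before the prompt
    exists_pre: bool,
}

#[derive(Debug, PartialEq)]
enum RestoreStep {
    /// File was created during the prompt: restoring removes it.
    Delete { path: String },
    /// File existed: copy the pre-image blob back over it.
    CopyBlob { blob: String, path: String },
}

/// Turns manifest entries into an ordered list of filesystem actions.
fn restore_plan(entries: &[Entry]) -> Vec<RestoreStep> {
    entries
        .iter()
        .filter_map(|e| match (&e.sha256, e.exists_pre) {
            (_, false) => Some(RestoreStep::Delete { path: e.path.clone() }),
            (Some(hash), true) => Some(RestoreStep::CopyBlob {
                blob: format!("objects/{}", hash),
                path: e.path.clone(),
            }),
            // Inconsistent entry (existed but no blob recorded): skip it.
            (None, true) => None,
        })
        .collect()
}
```

Separating the plan from its execution keeps the restore logic testable without touching the disk.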
Plan-mode prompts emit empty manifests
Plan mode rejects mutating tools, so manifests come out empty. We
still emit prompt-N/manifest.json with kind: "plan" and entries: [] so the prompt is selectable for conversation rewind, keeping
cursor positioning sensible across plan/non-plan prompts.
Five restore variants
/rewind menu offers: restore code, restore conversation, restore
both (Enter default), summarize from here, summarize up to here. The
summarize variants drive the existing SummarizingCompactor on a
slice of session.messages — no new summarizer.
Esc-Esc trigger, precedence owned by ADR 0027
When InputMode::Idle and buffer.is_empty(), two Esc presses
within 400ms open the rewind menu. Single Esc continues to close
modes / cancel turns. The interaction precedence is owned by ADR 0027.
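A minimal sketch of the double-press detection, assuming a hypothetical EscTracker type. Whether the 400ms bound is inclusive, and whether a non-idle press resets the timer, are assumptions.

```rust
use std::time::{Duration, Instant};

const DOUBLE_ESC_WINDOW: Duration = Duration::from_millis(400);

/// Hypothetical tracker: remembers the previous Esc press, if any.
#[derive(Default)]
struct EscTracker {
    last: Option<Instant>,
}

impl EscTracker {
    /// Returns true when this press completes a double-Esc.
    fn press(&mut self, now: Instant, idle: bool, buffer_empty: bool) -> bool {
        // The shortcut only applies when idle with an empty input buffer.
        if !(idle && buffer_empty) {
            self.last = None;
            return false;
        }
        match self.last.take() {
            // Second press inside the window: fire, and start over afterwards.
            Some(prev) if now.duration_since(prev) <= DOUBLE_ESC_WINDOW => true,
            // First press, or too slow: this press becomes the new "first".
            _ => {
                self.last = Some(now);
                false
            }
        }
    }
}
```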
Pruning is tied to session pruning
A checkpoint directory is removed only when cleanupPeriodDays
(default 30) has elapsed since its last update and the
corresponding session is being pruned by SessionStore::prune. The
two operations are coupled so we never orphan checkpoints while the
session is still resumable.
CALIBAN_CHECKPOINT_MAX_BYTES (default 5 GiB per project) caps total
blob size; on overflow, oldest prompt blobs drop first.
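The overflow rule (oldest prompt blobs drop first) can be sketched like this. The function name and the (prompt index, bytes) representation are hypothetical, and the sketch ignores any sharing of blobs between prompts.

```rust
/// Given (prompt index, blob bytes) pairs and a byte cap, returns the prompt
/// indices whose blobs would be dropped, oldest (lowest index) first.
fn prompts_to_drop(mut prompts: Vec<(u32, u64)>, cap_bytes: u64) -> Vec<u32> {
    prompts.sort_by_key(|&(idx, _)| idx);
    let mut total: u64 = prompts.iter().map(|&(_, bytes)| bytes).sum();
    let mut dropped = Vec::new();
    for (idx, bytes) in prompts {
        // Stop as soon as the remaining blobs fit under the cap.
        if total <= cap_bytes {
            break;
        }
        total -= bytes;
        dropped.push(idx);
    }
    dropped
}
```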
Consequences
- Positive. Three 🔴 rows move to ✅ in one initiative. The two new hook events are reusable: any future hook-surface work (Tier 1) inherits the contract. The content-addressed blob layout is small enough to ship in one PR and expressive enough to grow into cross-prompt dedup later. Claude Code parity on the storage path makes a future "migrate Claude Code checkpoints into caliban" tool a one-evening project.
- Negative. Per-tool disk I/O on the hot path (pre-image read for every Write/Edit). The 16 MiB cap keeps it bounded but at the cost of unrestorable large files. One more workspace crate. Bash mutations remain unobservable: documented, but still a footgun.
- Revisit if: operators demand Bash tracking (could overlay a filesystem-watcher-based recorder, significant complexity); or if storage I/O becomes a bottleneck (could move pre-image reads into a tokio::spawn shadowing the agent loop).
- Out of scope, enabled here: /fork (branch from checkpoint), cross-machine sync, per-tool-call (not per-prompt) granularity.
References
- Spec: docs/superpowers/specs/2026-05-24-checkpointing-design.md
- Hook trait: crates/caliban-agent-core/src/hooks.rs
- Summarizer: crates/caliban-agent-core/src/compact.rs::SummarizingCompactor
- Session store: crates/caliban-sessions/src/store.rs
- Companion ADRs: 0027 (TUI ergonomics: owns Esc-Esc precedence and overlay primitives), 0021 (Sub-agents: will carry /fork later).
- Parity reference: docs/claude-code-capability-inventory.md §11.
ADR 0029 · Permission modes + auto-mode classifier
- Status: accepted
- Date: 2026-05-24
- Spec: docs/superpowers/specs/2026-05-24-permission-modes-design.md
Context
caliban's permission model is a static rule grammar
(permissions.toml) layered on a single plan flag. Claude Code
ships six permission modes cycled with Shift+Tab
(default/acceptEdits/plan/auto/dontAsk/bypassPermissions),
each composing differently with the rule grammar. The marquee piece
is auto mode, where a fast classifier model labels each tool call
as allow/soft_deny/hard_deny based on workspace/file/network
sensitivity rules.
A. Permissions & safety in docs/parity-gap-matrix.md flags this
as the headline 🟡/🔴 gap once the OS sandbox is set aside as a
separate Tier-4 investment. ADR 0020 (static rule grammar) and ADR
0022 (model router with RequestPurpose::FastClassifier) already
shipped. The infrastructure is in place; this ADR connects the pieces.
The classifier model lives in the router, not the permission system — the classifier is just another routed call by purpose. The permission system holds only the orchestration (when to call it, how to cache, how to compose with static rules).
Decision
Permission modes layer over the rule grammar, not under it
The existing PermissionsHook continues to produce Allow/Deny/Ask
from static rules. A new ModeFilter wraps that hook and overrides
the verdict according to the active mode. Composition order:
ModeFilter(BypassPermissions latched) ─ short-circuit Allow
│ otherwise
▼
PermissionsHook → Allow / Deny / Ask
│
▼
ModeFilter post-pass may override Ask only
Static Allow/Deny always win — operators trust their TOML. Only
Ask is mode-overridable, except bypassPermissions which
short-circuits everything (including static Deny) and requires an
explicit confirmation flag.
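The composition order can be sketched as a single function. The types here are simplified stand-ins for the real ModeFilter and PermissionsHook. Only the bypass short-circuit, the pass-through of static Allow/Deny, and dontAsk turning Ask into Allow are taken from this guide; the other modes are left as Ask as a placeholder.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny,
    Ask,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PermissionMode {
    Default,
    AcceptEdits,
    Plan,
    Auto,
    DontAsk,
    BypassPermissions,
}

/// Simplified composition: `static_verdict` is what the rule grammar produced.
fn compose(mode: PermissionMode, bypass_latched: bool, static_verdict: Verdict) -> Verdict {
    // 1. Latched bypass short-circuits everything, including static Deny.
    if mode == PermissionMode::BypassPermissions && bypass_latched {
        return Verdict::Allow;
    }
    match static_verdict {
        // 2. Static Allow/Deny always win.
        Verdict::Allow | Verdict::Deny => static_verdict,
        // 3. Only Ask is mode-overridable.
        Verdict::Ask => match mode {
            PermissionMode::DontAsk => Verdict::Allow,
            // Placeholder: acceptEdits, plan, and auto handling is not modeled.
            _ => Verdict::Ask,
        },
    }
}
```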
bypassPermissions requires --allow-dangerously-skip-permissions
The only mode that can override static Deny. To enter it, the
operator must pass the flag at startup (sets a session-wide latch).
Cycling via Shift+Tab into bypass without the latch fires a warning
toast and reverts to default. Starting with defaultMode = "bypassPermissions" without the flag aborts startup.
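A sketch of the two latch checks, using plain strings for mode names. Both function names and the error text are hypothetical.

```rust
/// Startup check: a bypassPermissions default without the flag aborts startup.
fn check_startup(default_mode: &str, flag_passed: bool) -> Result<(), String> {
    if default_mode == "bypassPermissions" && !flag_passed {
        return Err(
            "bypassPermissions requires --allow-dangerously-skip-permissions".to_string(),
        );
    }
    Ok(())
}

/// Shift+Tab resolution: returns the mode actually entered and whether a
/// warning toast should fire.
fn cycle_into(target: &str, latched: bool) -> (&str, bool) {
    if target == "bypassPermissions" && !latched {
        ("default", true)
    } else {
        (target, false)
    }
}
```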
Auto-mode is a classifier consult, cached by input shape
auto only runs the classifier when the rule verdict is Ask.
Allow/Deny pass through. A 256-entry LRU keyed on (tool_name, sha256(canonicalized_input)) caches verdicts for the session. The
classifier dispatches via RequestPurpose::FastClassifier on the
existing router — operators wire Haiku, GPT-4o-mini, a local Ollama
model, whatever.
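A small LRU along these lines could look like the sketch below. The VerdictCache type is hypothetical. Because the standard library has no SHA-256, the key uses DefaultHasher as a non-cryptographic stand-in for sha256(canonicalized_input).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Allow,
    SoftDeny,
    HardDeny,
}

type Key = (String, u64);

/// Stand-in for (tool_name, sha256(canonicalized_input)). NOT cryptographic.
fn key_for(tool: &str, canonical_input: &str) -> Key {
    let mut hasher = DefaultHasher::new();
    canonical_input.hash(&mut hasher);
    (tool.to_string(), hasher.finish())
}

struct VerdictCache {
    cap: usize,
    map: HashMap<Key, Verdict>,
    order: VecDeque<Key>, // front = least recently used
}

impl VerdictCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &Key) -> Option<Verdict> {
        let verdict = *self.map.get(key)?;
        self.touch(key);
        Some(verdict)
    }

    fn put(&mut self, key: Key, verdict: Verdict) {
        if self.map.insert(key.clone(), verdict).is_some() {
            self.touch(&key); // existing entry: refresh recency only
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
    }

    /// Moves `key` to the most-recently-used end (O(n), fine at 256 entries).
    fn touch(&mut self, key: &Key) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }
}
```

A session would construct the cache with VerdictCache::new(256) to match the size in the text.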
Static rule pre-pass in auto-mode.toml
Before the model call, auto-mode.toml's
hard_deny/soft_deny/allow arrays are walked in that order;
first match short-circuits with source: StaticRule. The model is
the expensive fallback, not the first stop. $defaults.<list>
expands to a curated, version-pinned default (sudo, recursive
deletion, piped curl, secret-bearing paths, plain-http).
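The walk order can be sketched as below. The AutoModeRules struct mirrors the three arrays named in the text, but the pattern matcher (a trailing * meaning prefix match) is a simplification invented for this example, not caliban's rule grammar.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    HardDeny,
    SoftDeny,
    Allow,
}

struct AutoModeRules {
    hard_deny: Vec<String>,
    soft_deny: Vec<String>,
    allow: Vec<String>,
}

/// Hypothetical matcher: a trailing `*` means prefix match, otherwise exact.
fn matches(pattern: &str, call: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => call.starts_with(prefix),
        None => pattern == call,
    }
}

/// Returns the first static match, or None when the model must be consulted.
fn pre_pass(rules: &AutoModeRules, call: &str) -> Option<Verdict> {
    // Order matters: hard_deny, then soft_deny, then allow.
    let lists = [
        (&rules.hard_deny, Verdict::HardDeny),
        (&rules.soft_deny, Verdict::SoftDeny),
        (&rules.allow, Verdict::Allow),
    ];
    for (list, verdict) in lists {
        if list.iter().any(|pattern| matches(pattern, call)) {
            return Some(verdict);
        }
    }
    None
}
```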
soft_deny falls through to the Ask modal
When the classifier returns soft_deny, the verdict becomes a
synthesized Ask request flowing into the same TuiAskHandler (ADR
0027) the static Ask rules use. The classifier's reason string is
rendered in the modal. This relies on ADR 0027 being merged first.
A new Layer-3 crate caliban-auto-mode
Classifier, config loader, and curated defaults live in a new crate
between caliban-agent-core and the router. The core's permissions
module gains only PermissionMode, SharedPermissionMode, and
ModeFilter — provider-call-free types.
Sub-agents inherit parent mode by SharedPermissionMode clone (ADR
0021); per-subagent override is v2 follow-up.
disableAutoMode = true (or CALIBAN_DISABLE_AUTO_MODE=1) is a hard
kill switch — classify always returns SoftDeny { source: DisabledFallback }.
Consequences
- Positive. Closes two of three remaining 🔴/🟡 rows under Permissions & safety (OS sandbox is deliberately separate). Auto-mode is signature differentiation — caliban's operator-defined classifier model (any provider) is meaningfully more flexible than Claude Code's bundled Haiku. Composition with static rules is auditable and testable in isolation.
- Negative. One more crate. Hot path gets a network call per Ask (mitigated by cache + static pre-pass). bypassPermissions adds a footgun surface needing UX work (red chip, confirmation toast). The mode enum overlaps with the existing SharedPlanMode flag; we keep both for back-compat at the cost of a small synchronization burden.
- Revisit if: classifier p95 latency becomes a UX problem (could pre-compute verdicts for likely next-tool shapes); or if curated default lists need more maintenance than the Rust release cadence supports (could pull from a versioned upstream JSON).
- Out of scope, enabled here: per-subagent permission modes (ADR 0021 v2), /permissions interactive editor, classifier audit log, mode-aware hook events (PermissionRequest/PermissionDenied) once the broader hook surface lands.
References
- Spec: docs/superpowers/specs/2026-05-24-permission-modes-design.md
- Static rule layer: crates/caliban-agent-core/src/permissions.rs
- AskHandler trait: same file (AskHandler, NonInteractiveAskHandler)
- FastClassifier purpose: ADR 0022
- Companion ADRs: 0027 (TUI ergonomics: ships Ask modal, must merge first), 0028 (Checkpointing: parallel hook-surface work), 0021 (Sub-agents: v2 refines per-subagent override).
- Parity reference: docs/claude-code-capability-inventory.md §6, §3.
Revised 2026-05-26
The original Decision committed caliban-auto-mode to be a new Layer-3
crate. In practice the implementation lives inside caliban-agent-core
across auto_mode.rs, mode_filter.rs, and permission_mode.rs
(~1,750 LOC combined).
Why this is the correct outcome. Auto-mode dispatch is tightly
coupled to the permission pipeline (PermissionsHook,
SharedPermissionMode, the soft-deny → Ask handshake) which already
lives in agent-core. Extracting auto-mode would either pull most of the
permission pipeline out with it or introduce a circular dep. The static
rule pre-pass, the classifier dispatch, and the LRU cache all live next
to the data they need.
Revisit if auto-mode grows a second consumer (e.g., a non-agent
classifier client), or if the dispatch path becomes a measurable
compile-time burden on caliban-agent-core.
Headless -p defaults — what actually runs
When caliban -p is invoked without --permission-mode,
--no-permissions, or any explicit allow/deny/ask flag, the resolved
mode is PermissionMode::Default (per resolve_startup_mode in
permission_mode.rs). Static rule evaluation still runs: the built-in
default-rules tail (default_rules() in permissions.rs) Allows
read-only tools (Read, Grep, Glob, TodoWrite,
EnterPlanMode/ExitPlanMode), Asks for mutating ones (Write,
Edit, Bash, WebFetch), and falls through to a catch-all Ask for any other tool.
In headless mode, there is no TTY to prompt, so Ask verdicts are
routed to NonInteractiveAskHandler (in agent-core's
permissions.rs). Its behavior:
- auto_allow: false (the default): Ask becomes a hard deny. The tool call fails with a permission error.
- auto_allow: true (set via --auto-allow / CALIBAN_AUTO_ALLOW): Ask becomes Allow. Equivalent to running in dontAsk mode for the duration of the run.
The net effect: a tool-using prompt that touches only read-only tools
(Read, Glob, Grep) runs to completion silently because each tool
hits an explicit Allow. A prompt that needs Write/Edit/Bash
without --auto-allow or an explicit --allow/--permission-mode
flag will fail on the first such tool call. The lmstudio 2026-05-27
probe (Finding 15) observed the read-only case and reported it as
"auto-dispatch without prompting" — that's accurate, but only because
Read is on the default Allow list.
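The headless resolution described above can be sketched in a few lines. The function names are hypothetical, and the tool lists are copied from this section rather than from the source code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny,
    Ask,
}

/// Sketch of the built-in default-rules tail described in this section.
fn default_rule(tool: &str) -> Verdict {
    match tool {
        "Read" | "Grep" | "Glob" | "TodoWrite" | "EnterPlanMode" | "ExitPlanMode" => {
            Verdict::Allow
        }
        // Mutating tools (Write, Edit, Bash, WebFetch) and anything else: Ask.
        _ => Verdict::Ask,
    }
}

/// Headless run: there is no TTY, so Ask is resolved by `auto_allow`.
fn headless_verdict(tool: &str, auto_allow: bool) -> Verdict {
    match default_rule(tool) {
        Verdict::Ask if auto_allow => Verdict::Allow,
        Verdict::Ask => Verdict::Deny,
        other => other,
    }
}
```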
--no-permissions is the only way to skip the static rule layer
entirely; the resolved mode surfaces in the system/init frame's
permission_mode field as the literal string "disabled" to make
this state observable (lmstudio Finding 15). All other modes surface
under their camelCase name (default, acceptEdits, plan, auto,
dontAsk, bypassPermissions).
ADR 0030 · Plugin packaging
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-plugin-system-design.md
- Depends on: ADR for hooks-expansion (forthcoming alongside specs/2026-05-24-hooks-expansion-design.md)
Context
Skills (ADR 0019), MCP servers (ADRs 0017 / 0023), sub-agents (ADR 0021), and the forthcoming hooks-expansion + output-styles work each ship as their own discovery surface. Operators who want to share a package of related customizations — Claude Code's "plugin" model — currently have to drop files into half a dozen directories by hand.
Claude Code unifies all five surfaces under a single plugin directory
with one plugin.json manifest; settings expose enabledPlugins,
marketplace allowlists, and strictPluginOnlyCustomization; the
/plugins slash command and claude plugin CLI manage install /
enable / disable / remove. This ADR records caliban's commitment to the
same shape.
Decision
A plugin is a directory with a plugin.json manifest
A plugin is <plugin-name>/plugin.json plus optional subdirectories
skills/, hooks/, agents/, output-styles/, mcp/, commands/.
The manifest declares name (matches directory), version,
description, author, license, optional caliban.min_version and
caliban.platforms, and a components object pointing at the bundled
files. JSON (not TOML) so the surface stays uniform with hooks and MCP
configs (also JSON in their canonical forms). Unknown manifest keys are
preserved through serde to leave room for forward-compat fields.
Three discovery roots, project > user > managed
- Project: <workspace>/.caliban/plugins/<name>/
- User: $XDG_DATA_HOME/caliban/plugins/<name>/
- Managed: /etc/caliban/plugins/<name>/ (Linux), platform analogues elsewhere. Managed plugins ignore plugins.enabled (policy-enforced).
A plugin with the same name in an earlier root replaces the later one
— no manifest merging.
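A sketch of the replace-not-merge rule, assuming a hypothetical resolve function that receives every (root, plugin name) pair found on disk.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Root {
    Project,
    User,
    Managed,
}

/// `discovered` lists (root, plugin name) pairs in any order. The result maps
/// each plugin name to the single root it is loaded from.
fn resolve(discovered: &[(Root, &str)]) -> BTreeMap<String, Root> {
    let mut winners = BTreeMap::new();
    // Walk roots in precedence order; the first root to claim a name keeps it.
    for root in [Root::Project, Root::User, Root::Managed] {
        for (found_in, name) in discovered {
            if *found_in == root {
                winners.entry(name.to_string()).or_insert(root);
            }
        }
    }
    winners
}
```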
Items are namespaced: <plugin>:<item>
Skills, agents, and output styles loaded from a plugin carry the
<plugin>:<item> prefix. They cannot collide with bare-named items at
the user level (project-level bare items still shadow them). Hooks
merge additively across plugins. MCP servers are exposed under
<plugin>:<server> to avoid colliding with user-configured servers.
Collision priority is project > plugin > user. Strict project-only
operators get strict_plugin_only_customization = true, which ignores
bare-file customizations under ~/.caliban/skills/* etc. entirely.
${CALIBAN_PLUGIN_ROOT} expansion at the plugin boundary
Plugin-bundled MCP configs and hook commands need to reach binaries
inside the plugin without hardcoding install paths.
caliban-plugins expands ${CALIBAN_PLUGIN_ROOT} to the plugin's
absolute root directory before passing config downstream.
${CLAUDE_PLUGIN_ROOT} is an honored alias so existing Claude Code
plugins port verbatim. Any other ${VAR} is passed through to the
downstream consumer's own expansion (MCP client, hooks loader).
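The expansion itself is a plain string substitution. A minimal sketch, with a hypothetical function name:

```rust
/// Expands the two plugin-root placeholders and leaves every other `${VAR}`
/// untouched for the downstream consumer to handle.
fn expand_plugin_root(input: &str, plugin_root: &str) -> String {
    input
        .replace("${CALIBAN_PLUGIN_ROOT}", plugin_root)
        // Honored alias so Claude Code plugins port verbatim.
        .replace("${CLAUDE_PLUGIN_ROOT}", plugin_root)
}
```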
Marketplaces are public JSON indices fetched on demand
A marketplace is one HTTP(S) URL serving a JSON index of plugins +
versions + tarball URLs + sha256 hashes. caliban plugin install <name>@<marketplace> fetches the index, verifies the marketplace is
in plugins.marketplaces.strict_known and not in blocked, downloads
and extracts the tarball, and writes a trust record.
Signature verification is out of scope for v1. Trust is by source URL + manifest hash, surfaced in the install prompt. v2 may add cosign / minisign.
Trust gating on first install
Sideloads aren't gated (the operator already had filesystem access).
Marketplace installs prompt with plugins.trust_message, the manifest
contents, the manifest sha256, and the install URL. Acknowledged
installs are recorded in $XDG_DATA_HOME/caliban/trust/plugins.json;
re-installs of identical manifest hashes skip the prompt; version bumps
re-prompt.
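The prompt decision reduces to a hash comparison. In this sketch the trust store is modeled as a map from plugin name to acknowledged manifest sha256; the real layout of trust/plugins.json is not described here, so that shape is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Sideload,
    Marketplace,
}

/// Returns true when the install should show the trust prompt.
fn needs_trust_prompt(
    source: Source,
    trust: &HashMap<String, String>, // assumed shape: plugin name -> manifest sha256
    plugin: &str,
    manifest_sha256: &str,
) -> bool {
    match source {
        // Sideloads are never gated.
        Source::Sideload => false,
        // A version bump changes the manifest, hence its hash, hence re-prompts.
        Source::Marketplace => trust.get(plugin).map(String::as_str) != Some(manifest_sha256),
    }
}
```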
New crate: caliban-plugins
A thin orchestrator: it parses manifests, resolves namespaces, expands
${CALIBAN_PLUGIN_ROOT}, and hands paths + configs to the existing
loaders (skills, hooks, MCP, agents, output-styles). It does not
duplicate any per-surface logic. The caliban binary constructs one
PluginManager at startup and wires its outputs into the existing
loaders.
Consequences
- Positive: Closes Matrix row B "Plugin packages" and the /plugins slash row in one initiative. Existing Claude Code plugins port with at most a directory rename (${CLAUDE_PLUGIN_ROOT} alias). Each downstream loader stays single-purpose; plugins are a composition concern, not a per-loader concern. Trust gating gives operators a real "I have read this" moment without locking sideloads behind ceremony.
- Negative: Adds a new crate and a new settings surface (plugins.*, plugins.marketplaces.*). Marketplace install adds three new dependencies (tar, flate2, sha2) and an HTTP fetch path separate from MCP's. Trust records create a small migration burden if we ever move the on-disk format (mitigated by versioning the file). The unified hooks taxonomy must land first; this ADR's hooks-merging behavior is a no-op until then.
- Revisit if: Operators demand signed plugins (move to v2 cosign / minisign verification). The bare-vs-namespaced collision rules surprise users in practice (consider an explicit per-plugin "alias to bare name" affordance). Hot-reload of plugin contents becomes a real need (today it requires restart).
ADR 0031 · Output styles
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-output-styles-design.md
- Depends on: ADR 0018 (memory tier model: splice pattern reused), ADR 0019 (skills: frontmatter parser pattern reused), ADR 0030 (plugin packaging: plugin-supplied styles).
Context
Claude Code exposes four built-in output styles — Default,
Proactive, Explanatory, Learning — plus a custom-style file
format with frontmatter (name, description, keep-coding-instructions,
force-for-plugin). Styles modify the system prompt only; they're
orthogonal to permission mode, tools, and hooks. Operators activate
via /config → Output style or the outputStyle setting. Caliban
currently has none of this surface (matrix row L is 🔴 across the
board).
Decision
Output styles are markdown files with frontmatter, like skills
A custom style is a single .md file with a YAML frontmatter block
declaring name, description, keep_coding_instructions (bool,
default true), and force_for_plugin (bool, default false). The
body is the prompt block. The parser reuses serde_yaml (already in
the workspace for skills) and mirrors caliban-skills's frontmatter
shape.
We use snake_case (keep_coding_instructions, force_for_plugin)
internally; the loader accepts kebab-case (keep-coding-instructions,
force-for-plugin) as aliases for Claude-Code-format compatibility.
A new crate caliban-output-styles holds it
Modeled on caliban-skills: loader + struct + tool-adjacent pieces.
It owns OutputStyle, OutputStylePrefix, default_roots,
load_styles, select_active, and the Learning post-processor.
Built-in style bodies live as include_str!'d markdown files under
crates/caliban-output-styles/src/builtins/.
Discovery roots and shadowing
Same shape as skills: project > user > plugin > built-in. Project
styles at <workspace>/.caliban/output-styles/<name>.md shadow user
styles at $XDG_CONFIG_HOME/caliban/output-styles/<name>.md, which
shadow plugin-supplied styles (which are namespaced
<plugin>:<name>), which shadow the four built-ins.
The splice pattern is reused from MemoryPrefix
OutputStylePrefix::splice_into(base) wraps the active style's body in
<output-style name="...">…</output-style> and prepends to base. It
composes with MemoryPrefix::splice_into: memory tiers go first, then
the output-style block, then the base body. The Default style is the
no-op — it emits no block at all, so switching to Default produces
the exact same prompt as having no style configured. This minimizes
prompt-cache invalidation for operators who never customize.
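The splice order can be sketched with two free functions. The tag format follows the text; the memory wrapper, the blank-line separators, and both function names are assumptions.

```rust
/// Assumed stand-in for MemoryPrefix::splice_into: memory goes in front.
fn splice_memory(memory: &str, base: &str) -> String {
    format!("{}\n\n{}", memory, base)
}

/// `style` is (name, body). None and the Default style both emit no block,
/// so the returned prompt is byte-identical to `base`.
fn splice_output_style(style: Option<(&str, &str)>, base: &str) -> String {
    match style {
        None | Some(("Default", _)) => base.to_string(),
        Some((name, body)) => format!(
            "<output-style name=\"{}\">\n{}\n</output-style>\n\n{}",
            name, body, base
        ),
    }
}
```

Applying splice_output_style first and splice_memory second yields the documented order: memory tiers, then the output-style block, then the base body.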
Style activation requires /clear or restart
System prompts are cached by every major provider. Live-swapping the
style mid-session would invalidate caches without warning and produce
inconsistent assistant behavior. The /config → Output style overlay
surfaces an "applies after /clear or restart" hint; the in-memory
selection updates, but the system prompt that the provider sees does
not change until the next session.
The Learning style is the only style that touches assistant text
Learning instructs the model to emit TODO(human): <prompt> markers
on non-trivial decisions; a post-processor (the new
AssistantPostProcessor trait in caliban-agent-core) tags those
markers in the assistant's output so the TUI can highlight them.
Default, Proactive, and Explanatory install an identity
post-processor. Tools, hooks, and message contents are unaffected.
force_for_plugin: true lets a plugin pin its style
A plugin-supplied style with force_for_plugin: true overrides the
operator's output_style setting while the plugin is enabled. The
/config picker shows a "locked by plugin: X" badge. Disabling the
plugin releases the lock and the operator's selection returns. Bare
(non-plugin) styles with force_for_plugin: true are ignored — only
plugin-sourced styles honor the flag.
Consequences
- Positive: Closes matrix row L (both rows) with a single small-footprint crate that reuses two existing patterns (memory splice + skills frontmatter parse). Plugin-supplied styles fit naturally into the namespacing already proposed in ADR 0030. The keep_coding_instructions: false knob unlocks documentation-/writing-only modes without a separate "agent mode" feature.
- Negative: Adds a new crate. Frontmatter parsing duplicated between caliban-skills and caliban-output-styles (deferred: factor out a frontmatter helper in caliban-core once a third consumer appears). Prompt-cache invalidation is the operator's responsibility on style switch; it is surfaced via a hint, but still a papercut. The Learning post-processor adds a small per-turn cost even when the marker scan finds nothing.
- Revisit if: Operators want streaming-time style mutation (today the post-processor runs after streaming completes). Style composition becomes a real ask (today only one style is active). A community style library justifies bundling a marketplace pointer in defaults.
ADR 0032 · OS-level sandbox
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-os-sandbox-design.md
- Depends on: existing caliban-tools-builtin::BashTool (crates/caliban-tools-builtin/src/shell/bash.rs).
Context
Permission rules (ADR 0020) gate which commands the agent asks about
before running. They do nothing once a command is approved. An agent
that's been told Bash(*) is allow can rewrite the home directory
or exfiltrate via curl with no friction. Claude Code mitigates this
with an OS-level sandbox — Seatbelt on macOS, bubblewrap on Linux —
that restricts the child process itself. With the sandbox enabled,
operators can drop the per-command Ask entirely (autoAllowBashIfSandboxed)
because the sandbox is the protection.
Matrix row A "OS-level sandbox" is 🔴 and flagged as a big lift / security-critical. This ADR records the decision to ship it as a shim layer over the existing Bash plumbing.
Decision
Two backends, one config surface
- macOS: sandbox-exec with a generated .sb (TinyScheme dialect) profile written to $XDG_RUNTIME_DIR/caliban/sandbox/<sessid>.sb. Profile is computed once per session from settings.
- Linux (and WSL): bwrap with --bind/--ro-bind/--tmpfs flags plus optional --unshare-net and --unshare-user. Argv is computed once per session.
- Windows native: not supported in v1. Refuses to enable; documents Job Objects + AppContainer as the v2 path.
A single [sandbox] settings block drives both backends. Operators
configure intent (allow-write paths, allowed domains, etc.); the
backend translates intent into its native policy language.
A new crate caliban-sandbox provides a shim, not a rewrite
caliban-sandbox exposes SandboxedShim::wrap_command(cmd, command_str) which either returns cmd unchanged (sandbox disabled,
or command on the unsandboxed allow-list) or wraps it in a new
tokio::process::Command whose program is sandbox-exec / bwrap
and whose tail is the original command. BashTool::invoke calls
wrap_command after building its base Command; everything else —
stdout/stderr capture, PID-group cleanup, cancellation, timeouts —
stays identical.
This keeps the change tightly scoped: the sandbox is a layer, not a fork of Bash.
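The shim idea can be illustrated on plain argv vectors instead of tokio::process::Command. The Backend enum and wrap_argv are hypothetical, and the bwrap flag set shown is a bare illustration of the wrapping shape, not a complete or secure policy.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Backend {
    Disabled,
    Bwrap { workspace: String, unshare_net: bool },
    SandboxExec { profile: String },
}

/// Returns the argv to spawn: the original when the sandbox is disabled or the
/// command is on the unsandboxed allow-list, otherwise the wrapper's argv with
/// the original command as its tail.
fn wrap_argv(backend: &Backend, on_unsandboxed_list: bool, original: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = Vec::new();
    if !on_unsandboxed_list {
        match backend {
            Backend::Disabled => {}
            Backend::Bwrap { workspace, unshare_net } => {
                // Read-only root, writable workspace (illustrative only).
                argv.extend(["bwrap", "--ro-bind", "/", "/", "--bind"].iter().map(|s| s.to_string()));
                argv.push(workspace.clone());
                argv.push(workspace.clone());
                if *unshare_net {
                    argv.push("--unshare-net".to_string());
                }
            }
            Backend::SandboxExec { profile } => {
                argv.extend(["sandbox-exec", "-f"].iter().map(|s| s.to_string()));
                argv.push(profile.clone());
            }
        }
    }
    // The original command is always the tail, unchanged.
    argv.extend(original.iter().map(|s| s.to_string()));
    argv
}
```

Because the original command is only ever appended, the caller's output capture, cancellation, and timeout handling do not need to know whether a wrapper is present.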
auto_allow_bash_if_sandboxed short-circuits the Ask modal
Setting sandbox.enabled: true and
sandbox.auto_allow_bash_if_sandboxed: true makes the permission
classifier short-circuit Bash(*) to allow before the Ask modal
would fire. Rule grammar isn't modified — the short-circuit sits
alongside plan-mode-bypass in the permission pipeline.
allow_unsandboxed_commands entries (commands that genuinely need
unrestricted access) are not auto-allowed; they keep going through
the normal rules because they're running unsandboxed.
The auto-allow knob defaults to false; both settings must be set
deliberately.
Network egress is sandbox + proxy, not sandbox alone
Neither Seatbelt nor bwrap enforces per-hostname egress reliably on
its own. The supported patterns are:
- allowed_domains = []: deny all egress (--unshare-net / Seatbelt no network-outbound).
- http_proxy_port = N: deny all egress except 127.0.0.1:N; the operator runs a domain-aware HTTP proxy at that port.
- Both unset, allowed_domains non-empty on Linux: a warning is logged; the sandbox is less restrictive than the operator probably intended. A v1.1 follow-up ships an in-tree minimal proxy that consumes allowed_domains natively.
macOS Seatbelt supports literal (remote tcp "host:port") allow
rules and is correspondingly stricter.
Filesystem ACLs are explicit allow + deny + masks
Bubblewrap masks denied paths with --tmpfs (an empty in-memory
directory shadows the real one). Seatbelt uses
(deny file-write* (subpath …)). Globs aren't supported in the ACL —
operators add explicit roots. ${WORKSPACE}, ${HOME}, and the XDG
vars are expanded at session start.
Detection runs at startup; fail_if_unavailable is the gate
SandboxedShim::new detects the backend, version-checks bwrap
(>= 0.5), and verifies path. When fail_if_unavailable: true and the
backend is missing or too old, caliban refuses to start instead of
running unsandboxed.
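The version gate can be sketched as below. It assumes the version output has the form "bubblewrap 0.8.0" and that a missing backend with fail_if_unavailable unset means running unsandboxed; the Startup enum and function names are hypothetical.

```rust
/// Parses (major, minor) from output such as "bubblewrap 0.8.0".
fn parse_bwrap_version(output: &str) -> Option<(u32, u32)> {
    let version = output.split_whitespace().last()?;
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

#[derive(Debug, PartialEq)]
enum Startup {
    Sandboxed,
    Unsandboxed,
    Refuse,
}

/// `version_output` is None when the bwrap binary could not be run at all.
fn startup(version_output: Option<&str>, fail_if_unavailable: bool) -> Startup {
    // Tuples compare lexicographically, so (1, 0) >= (0, 5) holds as expected.
    let usable = version_output
        .and_then(parse_bwrap_version)
        .map_or(false, |v| v >= (0, 5));
    match (usable, fail_if_unavailable) {
        (true, _) => Startup::Sandboxed,
        (false, true) => Startup::Refuse,
        (false, false) => Startup::Unsandboxed,
    }
}
```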
enable_weaker_nested_sandbox: true is the escape hatch for
dev containers (already inside a user namespace; --unshare-user
would fail). It drops the offending flags on Linux and is a no-op on
macOS.
Consequences
- Positive: Closes matrix row A "OS-level sandbox" with a minimally-invasive shim. Reuses the existing PID-group cleanup logic (the wrapper inherits the child's group). Unlocks the auto_allow_bash_if_sandboxed UX: Bash becomes a one-keystroke tool when the sandbox is properly configured. Two backends and one config surface means operators move between macOS dev and Linux CI without rewriting policy.
- Negative: Seatbelt is deprecated by Apple (no replacement ship-date). Bubblewrap requires an external binary (bwrap >= 0.5) that isn't installed by default on every distro. Per-hostname network rules need a proxy to enforce reliably; we don't ship one in v1. Windows isn't supported (deferred). The policy languages are fiddly and undocumented (Seatbelt) or terse (bwrap argv), so debugging operator misconfiguration takes care.
- Revisit if: Apple removes sandbox-exec (move to Endpoint Security Framework backend). A standard hostname-aware sandbox layer emerges (e.g. systemd-resolved per-process filtering). Demand appears for a Windows backend (Job Objects + AppContainer is the v2 path). Container-based sandboxing becomes the prevailing pattern on Linux (revisit with a Podman / Firejail backend option).
ADR 0033 · OpenTelemetry export + cost tracking
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-otel-and-cost-design.md
Context
caliban already has tracing instrumentation under
caliban::tools, caliban::cache, caliban::memory, caliban::mcp,
caliban::skills, and caliban::timing. What it lacks: (a) a way to
ship those signals to an OTLP backend, (b) any concept of dollar cost
on completions, (c) operator-visible context-window utilization. Claude
Code ships all three and operators depend on them for billing, capacity
planning, and right-sizing model choices. We need parity.
The Claude Code env-var contract (CLAUDE_CODE_ENABLE_TELEMETRY,
OTEL_*) is well-known and supported by every OTLP backend Anthropic
customers run; rather than invent our own knobs we adopt it verbatim
with CALIBAN_ substitutions only where required.
Decision
One new crate, caliban-telemetry, owns OTLP + cost + context
It pulls opentelemetry, opentelemetry-otlp, tracing-opentelemetry,
serde_yaml, and rust_decimal. caliban-core (agent loop) and
caliban (binary / TUI) depend on it. Other crates do not — they emit
via the existing tracing macros and tracing-opentelemetry bridges
those into OTLP automatically.
Master switch is CALIBAN_ENABLE_TELEMETRY=1
Defaults to 0. When 0, Telemetry::init_from_env returns a no-op
shim in ~10 µs and no exporter is constructed. DISABLE_TELEMETRY=1
and DO_NOT_TRACK=1 both force-disable even when
CALIBAN_ENABLE_TELEMETRY=1 (privacy belt-and-braces).
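The precedence between the master switch and the two kill switches can be sketched as below. This is a minimal illustration, assuming that only the exact value 1 counts as set; telemetry_enabled and its lookup parameter are hypothetical names, not caliban's actual API.

```rust
/// Decide whether OTLP export should be turned on.
/// `lookup` is a hypothetical stand-in for `std::env::var(name).ok()`,
/// passed in so the precedence logic can be exercised without touching the
/// real process environment.
pub fn telemetry_enabled(lookup: impl Fn(&str) -> Option<String>) -> bool {
    // Assumption: only the exact string "1" counts as "set".
    let is_one = |name: &str| lookup(name).as_deref() == Some("1");

    // Either kill switch wins, even when the master switch is on.
    if is_one("DISABLE_TELEMETRY") || is_one("DO_NOT_TRACK") {
        return false;
    }
    // Unset behaves like 0: telemetry stays off.
    is_one("CALIBAN_ENABLE_TELEMETRY")
}
```

In a real binary the closure would presumably be something like |name| std::env::var(name).ok().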
OTEL_* env vars adopted verbatim from Claude Code
Endpoint, protocol, headers, exporters, intervals, cardinality knobs,
content-control toggles, and mTLS paths — all standard OTel SDK env
names. We do not invent caliban-specific names for things OTel
already standardizes. The only caliban-prefixed extras are
CALIBAN_ENABLE_TELEMETRY (master switch) and CALIBAN_RATES_YAML
(rate-card override path).
Cost is observed, not enforced
CostAccumulator records token usage from every provider response,
multiplies by RateCard-resolved per-1M-token prices, and exposes
totals to /usage plus the caliban.cost.usage metric. Hard caps
(--max-budget-usd) live in headless mode, not here. This ADR is
purely about visibility; budget enforcement is a downstream concern
that consumes the same CostAccumulator.
Rate cards are vendored YAML, updated in lockstep with releases
crates/caliban-telemetry/rates.yaml ships with known rates for
Anthropic, OpenAI, Google, Bedrock, Vertex, and Ollama (the last being
a $0.00 row for completeness). Unknown (provider, model) pairs
match no entry, cost $0.00, and emit a single debounced warning per
session. Operators can override via CALIBAN_RATES_YAML=/path. We do
not fetch rate cards from any third-party API at runtime — the
dependency is one PR-with-a-cron-reminder, not a network call.
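A rough sketch of the lookup-and-multiply step follows. It is std-only, so it uses integer micro-USD as a stand-in for rust_decimal, and it assumes the warning fires once per unknown (provider, model) pair. The Rate and RateCard shapes and the method names here are illustrative assumptions, not the crate's real types.

```rust
use std::collections::{HashMap, HashSet};

/// Per-1M-token prices in micro-USD (1 USD = 1_000_000 micro-USD).
/// Integer stand-in for the Decimal values the ADR describes.
#[derive(Clone, Copy)]
pub struct Rate {
    pub input_per_mtok: u64,
    pub output_per_mtok: u64,
}

/// Hypothetical in-memory rate card keyed by (provider, model).
#[derive(Default)]
pub struct RateCard {
    rates: HashMap<(String, String), Rate>,
    warned: HashSet<(String, String)>,
}

impl RateCard {
    pub fn insert(&mut self, provider: &str, model: &str, rate: Rate) {
        self.rates.insert((provider.to_string(), model.to_string()), rate);
    }

    /// Returns (cost in micro-USD, whether a warning should be logged now).
    /// Unknown pairs cost 0 and ask for a warning only the first time.
    pub fn cost_micro_usd(
        &mut self,
        provider: &str,
        model: &str,
        input_tokens: u64,
        output_tokens: u64,
    ) -> (u64, bool) {
        let key = (provider.to_string(), model.to_string());
        match self.rates.get(&key) {
            Some(rate) => {
                // (micro-USD per 1M tokens) * tokens / 1M = micro-USD, rounded down.
                let total = rate.input_per_mtok * input_tokens
                    + rate.output_per_mtok * output_tokens;
                (total / 1_000_000, false)
            }
            None => (0, self.warned.insert(key)),
        }
    }
}
```

With made-up prices of 3 USD and 15 USD per million input and output tokens, 1000 input tokens plus 500 output tokens come to 10500 micro-USD, about one cent.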
USD math uses rust_decimal, never f64
Financial accumulation drifts under f64. We compute in Decimal and
convert to f64 only at the OTLP emit boundary (the OTel SDK insists).
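The drift is easy to reproduce with std alone. The sketch below contrasts repeated f64 addition with an exact integer accumulator; the integer version stands in for Decimal here only to keep the example dependency-free, and both function names are made up for illustration.

```rust
/// Add the same dollar amount `n` times using binary floating point.
pub fn sum_f64(price_usd: f64, n: u32) -> f64 {
    (0..n).map(|_| price_usd).sum()
}

/// Add the same amount `n` times using exact integer cents.
pub fn sum_cents(price_cents: i64, n: u32) -> i64 {
    (0..n).map(|_| price_cents).sum()
}
```

Ten additions of 0.1 in f64 do not compare equal to 1.0, while ten additions of 10 cents are exactly 100 cents.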
Context window is independent of telemetry
ContextWindow is part of caliban-telemetry for code-locality
reasons but does not require OTel enabled to work. /usage,
/context, and the status-bar percent indicator function for every
caliban user regardless of CALIBAN_ENABLE_TELEMETRY. Only OTLP
emission is gated.
/compact reuses existing summarization, just adds a slash + metric
RequestPurpose::Summarization already wires through
caliban-model-router to a summary-tuned model. The slash command
enqueues that purpose at the head of the loop and emits a compact.event
log. No new model routing logic is introduced by this ADR.
otel_headers_helper is a per-startup helper script + refresh
Settings field [telemetry].otel_headers_helper points at a path;
caliban spawns it at startup and on a configurable interval
(telemetry.otel_headers_refresh, default 5m), parses stdout as
k=v\n…, merges with OTEL_EXPORTER_OTLP_HEADERS (helper wins on
collision). This is how operators put short-lived bearer tokens in
front of their collector without checking secrets into env files.
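The merge rule (helper output wins on collision) might look like the following sketch. It assumes OTEL_EXPORTER_OTLP_HEADERS holds comma-separated k=v pairs and that the helper prints one k=v per line; parse_pairs and merge_headers are hypothetical helper names.

```rust
use std::collections::BTreeMap;

/// Parse `k=v` items separated by `sep`; items without '=' or with an empty
/// key are skipped. Only the first '=' splits, so values may contain '='.
fn parse_pairs(text: &str, sep: char) -> BTreeMap<String, String> {
    text.split(sep)
        .filter_map(|item| {
            let (key, value) = item.split_once('=')?;
            let key = key.trim();
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Merge env-provided headers with helper stdout; the helper wins on collision
/// because its entries are inserted last.
pub fn merge_headers(env_headers: &str, helper_stdout: &str) -> BTreeMap<String, String> {
    let mut merged = parse_pairs(env_headers, ',');
    merged.extend(parse_pairs(helper_stdout, '\n'));
    merged
}
```

Spawning the helper and re-running it on the refresh interval is left out; only the parse-and-merge step is shown.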
Consequences
- Positive: Closes six 🔴 rows in the parity matrix under K. Observability / cost (/context, /usage, /compact, Cost tracking, OTLP export, Metric set) in one initiative. Reuses the industry-standard OTEL_* env contract so any existing OTLP backend (Honeycomb, Grafana, Datadog, Tempo, Loki) works out-of-the-box. Decoupling cost/context from OTel emission means the operator-visible features (/usage, status-bar percent) work for everyone, including the airgapped offline case.
- Negative: Adds ~5 transitive deps via opentelemetry-otlp (tonic, h2, prost, etc.). Vendored rate cards need monthly refresh discipline. rust_decimal is yet another money library; we'll need a brief style note on when to use it. Content-logging knobs are a privacy footgun if operators misconfigure their collector; README must call this out prominently.
- Revisit if: OTel SDK ships a stable currency / cost convention (currently absent), in which case we align metric attribute names. If rust_decimal proves overkill for the precision we need, swap to fixed-point i64 cents. If operators clamor for runtime rate-card fetching (e.g. integration with their FinOps platform), add a RateCardSource::Url variant.
ADR 0034 · Bedrock + Vertex providers
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-bedrock-vertex-providers-design.md
Context
caliban-provider-anthropic already contains feature-gated
BedrockTransport and VertexTransport implementations (bedrock and
vertex Cargo features), plus the workspace already declares
aws-config, aws-sdk-bedrockruntime, aws-smithy-types, and
gcp_auth as dependencies in anticipation of this work. What's
missing is the top-level Provider-implementing crates that expose
these transports as first-class providers with their own name(),
their own list_models (which require control-plane APIs the
Anthropic crate has no business knowing about), and their own auth
refresh policy. Parity with Claude Code's --bedrock / --vertex
flags requires both crates.
Decision
Two new crates, both thin wrappers around the existing transports
caliban-provider-bedrock and caliban-provider-vertex each contain
~300 lines of glue:
- A Provider-implementing struct wrapping AnthropicProvider<BedrockTransport> or AnthropicProvider<VertexTransport>.
- A *Config struct + from_env / from_config constructors.
- An AuthRefresh background task.
- A list_models that hits the relevant control-plane API (bedrock:ListInferenceProfiles / publishers/anthropic/models), caches the result for the session, and falls back to a vendored list on failure.
- A name() returning "bedrock" / "vertex" so the model router and telemetry attribute these correctly.
We do not extend caliban-provider-anthropic to expose Bedrock /
Vertex as alternate constructors because (a) it would force the
Anthropic crate to depend on aws-sdk-bedrock (control plane) and
gain its own non-trivial auth code, and (b) operators have a real
mental-model expectation that provider = "bedrock" and
provider = "anthropic" are separate provider entries.
Auth refresh is a per-provider tokio task with a 5-minute default
Both crates spawn one background task on construction that calls
provider.get_token() (via aws-config's ProvideCredentials or
gcp_auth's TokenProvider) on a configurable interval. Settings
fields aws_auth_refresh and gcp_auth_refresh (and env
CALIBAN_AWS_AUTH_REFRESH / CALIBAN_GCP_AUTH_REFRESH) control the
interval; default 5m; 0 disables proactive refresh and relies on
inline 401 recovery only. Refresh failures back off exponentially up
to the configured interval and surface as tracing::warn! until they
succeed; the cached token continues to be served until it expires.
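The retry schedule after a failed refresh could be computed along these lines. The doubling factor and the base delay are assumptions for illustration (the text only says the back-off is exponential and capped at the configured interval), and refresh_backoff is a hypothetical name.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based) after a failed refresh.
/// Doubles from `base` on each attempt and never exceeds `interval`,
/// the configured refresh interval.
pub fn refresh_backoff(base: Duration, interval: Duration, attempt: u32) -> Duration {
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(interval)
}
```

With a 5-second base and the default 5-minute interval, the delays run 5s, 10s, 20s, 40s, and so on until they flatten at 300s. An interval of 0 disables proactive refresh entirely, so a caller would skip scheduling in that case rather than call this function.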
Model-id canonicalization stays in caliban-provider-anthropic
Transport::wire_model_id already lives in the Anthropic crate. The
new provider crates expose a small per-base-model release-date table
(e.g. ("claude-opus-4-7", "20260423")) consumed by the transport's
wire_model_id. The caliban canonical model name (claude-opus-4-7)
remains the same across Anthropic / Bedrock / Vertex — only the wire
form differs.
Capabilities mirror direct Anthropic per base model
The hyperscalers serve the same Anthropic models with the same context
windows, vision support, and tool-use semantics. Until a real
discrepancy emerges (e.g. some regions lacking prompt caching), both
crates' capabilities() strip the platform suffix and delegate to
caliban_provider_anthropic::models::capabilities_for. Any future
regional / platform restriction is added as a small subtraction layer
on top — not by forking the capabilities table.
list_models is on-demand + per-session-cached, with fallback
We resist the temptation to call list_inference_profiles at provider
startup because (a) startup latency is precious and (b) operators with
read-restricted IAM principals shouldn't fail startup just because
they can't introspect. Both crates call the control-plane API the
first time list_models is invoked, cache the result in a
tokio::sync::OnceCell, and fall back to a vendored list of
well-known models if the API call fails.
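A synchronous sketch of the lazy, cached lookup with fallback is shown below. It substitutes std::sync::OnceLock for tokio::sync::OnceCell so it runs without an async runtime, and it assumes the vendored fallback is cached too (a failed call is not retried within the session). ModelLister and the fetch closure are illustrative names.

```rust
use std::sync::OnceLock;

/// Hypothetical holder for the per-session model list.
pub struct ModelLister {
    cache: OnceLock<Vec<String>>,
    vendored: Vec<String>,
}

impl ModelLister {
    pub fn new(vendored: Vec<String>) -> Self {
        Self { cache: OnceLock::new(), vendored }
    }

    /// `fetch` stands in for the control-plane call. It runs only on the
    /// first invocation; later calls return the cached list untouched.
    pub fn list_models(
        &self,
        fetch: impl FnOnce() -> Result<Vec<String>, String>,
    ) -> &[String] {
        self.cache
            .get_or_init(|| fetch().unwrap_or_else(|_| self.vendored.clone()))
    }
}
```

Nothing runs at construction time, so a principal that cannot call the control-plane API still starts cleanly and only sees the vendored list when models are first listed.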
Request metadata flows through unchanged
RequestMetadata.purpose, user_id, and any future fields pass
through both crates untouched into the transport into the wire body.
The provider crates own auth + endpoint + list_models — not request
shape.
Consequences
- Positive: Closes two 🔴 rows under I. Model router & providers (Bedrock, Vertex). Enables operators in regulated industries (financial services, healthcare, gov) to use caliban with their contractual cloud provider. Composes cleanly with caliban-model-router so the same operator can route Sonnet via Bedrock for compliance and Haiku via direct Anthropic for cost. Reuses the Anthropic IR adapter so the message-shape correctness surface stays single-sourced.
- Negative: Adds two new crates to the workspace; the aws-* dependency tree is heavy (~30 transitive crates, mostly hyper/tower stack). Bedrock model-id rotation (Anthropic occasionally re-dates Bedrock models without changing direct-API names) requires per-base-model date-table maintenance. Two new mock-based test surfaces to maintain.
- Revisit if: AWS or GCP changes the canonical wire format significantly (e.g. Bedrock unifies under inference-profile ARNs exclusively), in which case the canonical→wire mapping simplifies. If caliban-provider-anthropic's embedded bedrock / vertex features turn out to be confusing duplicate paths, deprecate those feature flags in favor of the new crates and route all hyperscaler-served Anthropic through here.
ADR 0035 · Auto-memory (model-written notes)
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-auto-memory-design.md
Context
caliban-memory's third tier (the auto tier, XML tag auto-memory-index)
currently bootstraps an empty MEMORY.md with a conventions block and
splices it into the prompt — but no machinery exists for writing
memory back. Claude Code's auto-memory feature is operator-visible
gold: the model accumulates per-project user/feedback/project/reference
facts across sessions, and re-loads them as part of the system prompt
each turn. Closing this gap is one of the highest-leverage rows in the
parity matrix because every other feature (skills, slash commands,
hook handlers) gets compounded by long-running memory.
The on-disk layout the user already maintains under
~/.claude/projects/<sanitized-cwd>/memory/ is well-defined: an index
MEMORY.md + one markdown file per topic, each with YAML frontmatter
declaring name / description / metadata.type. We adopt it
verbatim under ~/.caliban/projects/<sanitized-cwd>/memory/.
Decision
Three artifacts together implement auto-memory
- Loader extension in caliban-memory: reads MEMORY.md (first 200 lines / 25 KB), strips HTML comments, splices it into the prompt under <auto-memory path="…" topic_count="N">…</auto-memory>.
- TopicLoader in caliban-memory::auto: lists / reads / writes / deletes topic files (sibling .md of MEMORY.md); does atomic write + index-line update in a single call so the model can't half-commit.
- Built-in auto-memory skill bundled in caliban-skills: its body is the protocol manual (when to read, when to write, the four types, anti-examples). With disable_model_invocation: false, the skill is always available + always loaded into the system prompt.
Two new built-in tools, ReadMemoryTopic and WriteMemoryTopic
We do not reuse Read/Write for memory access because (a) memory
paths are sandboxed to the memory dir (path-traversal guard) and (b)
writes need to atomically update both the topic file and the index
line — that's a single tool call, not two Writes. Both tools live in
caliban-tools-builtin under a new memory.* permission category
(allowed by default).
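One way a path-traversal guard of this kind can work is to validate the slug before it ever becomes a path. The sketch below assumes slugs are limited to lowercase ASCII letters, digits, hyphens, and underscores; that character set and the topic_path name are assumptions, not documented behavior.

```rust
use std::path::{Path, PathBuf};

/// Map a topic slug to a file inside `memory_dir`, or refuse it.
/// Rejecting separators and dots outright means "../x" or "a/b" can never
/// escape the memory directory.
pub fn topic_path(memory_dir: &Path, slug: &str) -> Option<PathBuf> {
    let valid = !slug.is_empty()
        && slug
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-' || c == '_');
    // Only build the path once the slug is known to be a plain file stem.
    valid.then(|| memory_dir.join(format!("{slug}.md")))
}
```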
MEMORY.md is splice-only; topic files are on-demand
The index is small enough to splice every turn (200 lines / 25 KB cap).
Topic files can be hundreds of KB collectively; they're pulled by slug
on demand via ReadMemoryTopic. [[slug]] cross-references between
topics are informational breadcrumbs — the loader does not
auto-follow them.
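The 200-line / 25 KB cap can be sketched as a prefix cut on whole lines. This assumes 25 KB means 25 * 1024 bytes and that truncation never splits a line; cap_index and the two constants are hypothetical names.

```rust
pub const MAX_LINES: usize = 200;
pub const MAX_BYTES: usize = 25 * 1024;

/// Return the longest prefix of `text` made of whole lines that stays within
/// both limits, plus a flag saying whether anything was cut off.
pub fn cap_index(text: &str) -> (&str, bool) {
    let mut end = 0;
    for (i, line) in text.split_inclusive('\n').enumerate() {
        // Stop at whichever limit is hit first.
        if i >= MAX_LINES || end + line.len() > MAX_BYTES {
            break;
        }
        end += line.len();
    }
    (&text[..end], end < text.len())
}
```

The boolean is what a loader would use to decide whether to show a truncation warning.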
HTML-comment stripping is done at splice time
<!-- --> blocks in MEMORY.md are stripped from the spliced prompt
(but stay on disk). This lets us keep the auto-injected
CONVENTIONS_BLOCK HTML-comment-fenced so it doesn't fight with
operator-authored content. The strip is greedy (regex), which means a
fenced code block containing <!-- --> will lose the comment in the
spliced view — documented limitation, low-impact.
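The stripping step can be illustrated with a small hand-rolled scanner. Note that this is a std-only substitute for the regex the text mentions: it removes each comment from its opener to the nearest closer and, like the described behavior, pays no attention to fenced code blocks. The function name is made up.

```rust
/// Remove every `<!-- ... -->` span from `input`.
/// An opener with no closer is left in place unchanged.
pub fn strip_html_comments(input: &str) -> String {
    const OPEN: &str = "<!--";
    const CLOSE: &str = "-->";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            // Skip past the closing marker and keep scanning.
            Some(end) => rest = &after_open[end + CLOSE.len()..],
            // Unterminated comment: keep the remainder verbatim.
            None => {
                out.push_str(&rest[start..]);
                return out;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the scanner ignores fences, a comment inside a code block disappears from the spliced view as well, which is the limitation noted above.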
Four memory types, model decides at write time
user / feedback / project / reference. The skill body
documents heuristics + anti-examples; the model classifies inline.
We deliberately avoid a typed classifier service — the model is in
the best position to judge what to save, and we don't want a hidden
ML layer between the user's intent and the on-disk artifact.
No automatic pruning
Memories persist until manually removed. /memory rm <slug> and
/memory rebuild-index cover the manual-curation path. Automatic
forgetting is a research problem that we explicitly punt on.
CALIBAN_DISABLE_AUTO_MEMORY=1 is both a privacy kill switch and a determinism switch for CI
When set, no <auto-memory> block is spliced and the auto-memory
skill is dropped from the system prompt. This guarantees that headless
runs and CI workflows produce identical prompts regardless of
on-disk memory state.
The on-disk format is the source of truth
We do not invent a database or sqlite layer. Markdown + YAML
frontmatter is human-readable, git-friendly, and aligns with how
operators already mentally model CLAUDE.md. The trade-off — file
locking concurrency, parsing overhead — is acceptable at the scales
auto-memory actually sees (tens of topic files, kilobytes each).
Atomic writes via tempfile + rename
WriteMemoryTopic writes to <slug>.md.tmp then renames; index-line
update is part of the same operation. Failure mid-write leaves the
prior content intact. Failure between topic-write and index-update
leaves an orphan topic file — rebuild-index repairs it.
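The tempfile-then-rename pattern for the topic file looks roughly like this with std::fs. The sketch covers only the topic write, not the index-line update, and write_topic_atomic is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `<slug>.md` by first writing `<slug>.md.tmp` and then renaming it.
/// If the write fails, the previous `<slug>.md` (if any) is untouched;
/// the rename swaps the new content in as a single step.
pub fn write_topic_atomic(dir: &Path, slug: &str, body: &str) -> io::Result<()> {
    let tmp = dir.join(format!("{slug}.md.tmp"));
    let dest = dir.join(format!("{slug}.md"));
    fs::write(&tmp, body)?;
    fs::rename(&tmp, &dest)
}
```

Both files live in the same directory, so the rename stays on one filesystem, which is what makes the swap atomic on typical platforms.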
Consequences
- Positive: Closes a tier-5-priority row in the parity matrix that compounds the value of every long-running session. Operators get Claude Code's "wow it remembered" UX out of the box. The on-disk format means operators can manually curate memory with their favorite text editor. Composes with skills (the protocol is a skill) so the system documents itself.
- Negative: Two new built-in tools to maintain + a new permission category. The auto-memory skill body is a maintenance surface (15 CI test asserts it doesn't drift). HTML-comment stripping is a hidden behavior that may surprise operators. No automatic pruning means MEMORY.md grows unbounded on long-running projects — operator hygiene is required.
- Revisit if: The 200-line / 25 KB cap turns out to be too small in practice (operators routinely brush against the truncation warning); a richer indexer that summarizes topic files into the splice may be needed. If concurrent writes from background subagents prove racy, add file locks (fs2::FileExt::try_lock_exclusive). If the markdown+frontmatter parsing overhead shows up in startup profiles, add a per-topic cache keyed by mtime.
ADR 0036 · CLAUDE.md ancestor walk + @-imports
- Status: accepted
- Date: 2026-05-24
- Author: john.ford2002@gmail.com
- Spec: docs/superpowers/specs/2026-05-24-claudemd-ancestry-design.md
Context
caliban-memory's project tier currently loads exactly one file —
<workspace_root>/CLAUDE.md. Claude Code instead walks from cwd
upward, concatenating every CLAUDE.md (and AGENTS.md and
.caliban.md) it finds, supports @path/to/file imports inside any
of them (bounded recursion + approval for external paths), loads
nested children on demand as the model reads into subdirectories, and
honors .claude/rules/<topic>.md files with optional paths: glob
frontmatter for scoped activation. The matrix marks this row 🟡
because the single-file loader exists but lacks every other behavior.
We need parity to make caliban usable in monorepos, in deeply-nested project layouts, and in any workflow where contributors share CLAUDE.md fragments via imports.
Decision
Five behaviors, one orchestrator
The new project tier in caliban-memory orchestrates five distinct
concerns:
- Ancestor walk: start at cwd, walk up to git root (or fs root, configurable via WalkStop), concatenate every CLAUDE.md / AGENTS.md / .caliban.md in broad → narrow order.
- @-imports: recursion-bounded (depth ≤5), cycle-detected by canonical path, with an approval dialog for first-time external imports persisted to ~/.caliban/imports-allowlist.json.
- Nested-on-demand: Read / Edit / Glob success notifies an AncestryAddendum which appends any newly-touched directory's CLAUDE.md to the system prompt for the rest of the session.
- .caliban/rules/<topic>.md: path-glob-scoped rules with a RulesActivator that lights them up on first matching path touch.
- claude_md_excludes: gitignore-style patterns scoped to the workspace root, evaluated during walk.
All five share the existing MemoryPrefix machinery; the project slot
becomes a richer ProjectTier struct containing four Vec<TierFile>
collections (base / imports / rules / nested) instead of one
TierFile.
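The bounded, cycle-detected import expansion can be sketched over an in-memory graph. This assumes the root file sits at depth 0 so imports may nest five levels below it, and it uses plain strings where the real loader would use canonical paths; resolve_imports and the graph shape are illustrative.

```rust
use std::collections::{HashMap, HashSet};

pub const MAX_DEPTH: usize = 5;

/// Depth-first load order for `root` and everything it imports.
/// `graph` maps a file to the files it @-imports (already canonicalized).
pub fn resolve_imports(root: &str, graph: &HashMap<String, Vec<String>>) -> Vec<String> {
    fn visit(
        path: &str,
        depth: usize,
        graph: &HashMap<String, Vec<String>>,
        seen: &mut HashSet<String>,
        out: &mut Vec<String>,
    ) {
        // Stop when too deep, or when this path was already loaded (cycle).
        if depth > MAX_DEPTH || !seen.insert(path.to_string()) {
            return;
        }
        out.push(path.to_string());
        if let Some(children) = graph.get(path) {
            for child in children {
                visit(child, depth + 1, graph, seen, out);
            }
        }
    }

    let mut seen = HashSet::new();
    let mut out = Vec::new();
    visit(root, 0, graph, &mut seen, &mut out);
    out
}
```

A two-file cycle loads each file once, and a long import chain is cut off after the fifth nested level.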
Three filenames, no precedence battles
CLAUDE.md, AGENTS.md, and .caliban.md are all loaded when present
in the same directory. Within a directory we load
.caliban.md → CLAUDE.md → AGENTS.md (most-specific → most-general).
We do not surface "which file overrode which" because they don't
override — they concatenate. Operators who need exclusion use
claude_md_excludes.
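Put together, the walk order and the per-directory filename order produce a flat candidate list like the one built below. This sketch only computes paths and does not check that files exist; candidate_files is a hypothetical name and the stop directory stands in for the git root.

```rust
use std::path::{Path, PathBuf};

/// Per-directory load order, most-specific first, as described above.
pub const FILENAMES: [&str; 3] = [".caliban.md", "CLAUDE.md", "AGENTS.md"];

/// Candidate memory files from `stop` down to `cwd` (broad to narrow).
/// If `stop` is not an ancestor of `cwd`, the walk runs to the fs root.
pub fn candidate_files(cwd: &Path, stop: &Path) -> Vec<PathBuf> {
    // Collect narrow-to-broad first, since ancestors() walks upward.
    let mut dirs: Vec<&Path> = Vec::new();
    for dir in cwd.ancestors() {
        dirs.push(dir);
        if dir == stop {
            break;
        }
    }
    // Reverse so the broadest directory is emitted first.
    let mut files = Vec::new();
    for dir in dirs.iter().rev() {
        for name in FILENAMES.iter() {
            files.push(dir.join(name));
        }
    }
    files
}
```

Because the files concatenate rather than override, a loader would simply read whichever of these exist, in this order.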
@-import semantics align with Claude Code, minus HTTP
Local paths only. @./foo.md, @~/notes/x.md, @/abs/path.md all
work; @http(s)://… is rejected outright. This keeps imports
auditable (a static set of filesystem paths) and avoids embedding an
HTTP fetcher inside the prompt-assembly path.
External imports (those outside the workspace root and outside
~/.config/caliban/) require approval. The dialog persists decisions;
non-interactive callers (--print, CI, --bare) deny by default but
respect CALIBAN_APPROVE_IMPORTS=1 for unattended runs.
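The accepted and rejected @-import forms can be expressed as a small classifier. The enum and function names are assumptions for illustration; the external-path approval step is not modeled here.

```rust
/// Hypothetical classification of an `@...` token found in a memory file.
#[derive(Debug, PartialEq)]
pub enum ImportTarget {
    /// `@./foo.md` or any other path relative to the importing file.
    Relative(String),
    /// `@~/notes/x.md`, stored without the leading `~/`.
    Home(String),
    /// `@/abs/path.md`.
    Absolute(String),
    /// `@http://...` or `@https://...`: rejected outright.
    Rejected,
}

/// Returns None when the token is not an import at all.
pub fn classify_import(token: &str) -> Option<ImportTarget> {
    let path = token.strip_prefix('@')?;
    if path.is_empty() {
        return None;
    }
    let lower = path.to_ascii_lowercase();
    if lower.starts_with("http://") || lower.starts_with("https://") {
        return Some(ImportTarget::Rejected);
    }
    Some(if let Some(rest) = path.strip_prefix("~/") {
        ImportTarget::Home(rest.to_string())
    } else if path.starts_with('/') {
        ImportTarget::Absolute(path.to_string())
    } else {
        ImportTarget::Relative(path.to_string())
    })
}
```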
Nested-on-demand is one-shot per (path, session)
Once the model Reads a file and we load that directory's CLAUDE.md,
we keep it for the rest of the session. We do not detect file changes
and reload, and we do not unload when the model leaves the subtree. This
keeps the system prompt monotone (only grows), which matches how
operators reason about it.
Rules use globset, the workspace's existing glob crate
globset is already a workspace dep. Rules build a single
GlobSet at startup; path-touch hooks ask "does this path match any
unactivated rule?" — O(1). Rules without a paths: frontmatter are
always-active (loaded at startup, before any path touch).
claude_md_excludes is gitignore-style with explicit semantics
We adopt the gitignore matching semantics (! negation, last-match
wins for a given path). Patterns are evaluated relative to the
workspace root, not to the absolute filesystem path — operators
write node_modules/**, not /Users/foo/proj/node_modules/**. The
workspace root is the start of the ancestor walk (the cwd at startup).
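The negation and last-match-wins rules can be shown with a deliberately tiny matcher. This is a toy: it understands only exact paths and a trailing /** and is not the gitignore grammar; is_excluded and matches are hypothetical names.

```rust
/// Toy matcher: `prefix/**` matches anything under `prefix/`,
/// any other pattern must equal the path exactly.
fn matches(pattern: &str, path: &str) -> bool {
    match pattern.strip_suffix("/**") {
        Some(prefix) => path
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/')),
        None => pattern == path,
    }
}

/// Evaluate workspace-relative `patterns` in order; the last pattern that
/// matches decides, and a leading `!` turns an exclusion back into an include.
pub fn is_excluded(patterns: &[&str], path: &str) -> bool {
    let mut excluded = false;
    for raw in patterns {
        let (negated, pattern) = match raw.strip_prefix('!') {
            Some(p) => (true, p),
            None => (false, *raw),
        };
        if matches(pattern, path) {
            excluded = !negated;
        }
    }
    excluded
}
```

Order matters: placing the negated pattern before the broad exclusion makes the exclusion win again, which is the usual way these lists go wrong.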
--add-dir paths contribute CLAUDE.md only opt-in
Adding a directory to the agent's accessible-paths set should not
silently inject another CLAUDE.md into the prompt. Operators who want
that behavior set CALIBAN_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1. Each
--add-dir then performs its own ancestor walk, concatenated after
the cwd walk in declaration order.
Regression escape: CALIBAN_DISABLE_CLAUDE_MD_WALK=1
If the new loader misbehaves in a real-world repo we don't have CI coverage for, operators set this env to fall back to the legacy single-file project tier. This is a maintenance lifeline; we expect it to be unused in steady state.
Consequences
- Positive: Closes three 🟡 / 🔴 rows under C. Memory & checkpointing in one PR. Caliban becomes deployable in monorepos without prompt-injection workarounds. @-imports unlock content sharing between repos (a single ~/notes/api-conventions.md can be imported from every project's CLAUDE.md). Rules let language/framework-specific guidance be scoped to where it applies instead of polluting the top-level CLAUDE.md.
- Negative: Project-tier complexity goes up materially: five concerns sharing one orchestrator. The approval-dialog UX adds a new modal flow the TUI must handle. The system prompt grows monotonically during a session, which interacts with the existing memory budget enforcement (truncation logic now runs against a larger surface). Operator authoring of claude_md_excludes gitignore patterns is a known footgun (test #18 covers the common case).
- Revisit if: A real-world repo demands HTTP imports, in which case we'd revisit the security model (signed manifests? lockfile?). If the approval dialog frequency proves annoying in practice, add [memory] auto_approve_under = ["~/dev/personal/**"]. If the monotone-prompt-growth interacts badly with long sessions, add a rule-level "deactivate after N turns since last match" knob.
ADR 0037 · Sub-agent worktree isolation + background fleet
- Status: accepted
- Date: 2026-05-24
- Spec: docs/superpowers/specs/2026-05-24-subagent-worktree-and-fleet-design.md
- Builds on: ADR 0021 (sub-agent primitive), ADR 0024 (hook taxonomy)
- Author: john.ford2002@gmail.com
Context
ADR 0021 shipped AgentTool as an in-process, foreground,
recursion-guarded primitive. That covers the simple "spawn a read-only Grep/Read
subagent and inline its summary" use case Claude Code uses for parallel
research. It does not cover:
- Filesystem isolation: a sub-agent that writes files shares the parent's working tree, so Edit/Write side-effects mix into the parent's diff and there is no clean way to discard them.
- Long-running detached work: the parent's turn budget is the sub-agent's wall-clock budget; nothing survives the parent run ending. Claude Code's --bg, claude agents list / attach / respawn / rm surface a fleet of detachable sub-agents we have no equivalent for.
- Hook inheritance: deferred from PR #9 / ADR 0024. Child sub-agents currently get a brand-new hook stack; flow-scoped hooks the parent set up are silently dropped.
These three concerns share state (the spawn site, the lifecycle ownership, the working-directory model) and want to be solved together. This ADR records the architectural commitments; mechanics live in the design spec.
Decision
Isolation is opt-in per sub-agent, via frontmatter or call-site
Two modes only — none (today's behavior, default) and worktree. A
worktree sub-agent runs in a dedicated git worktree materialized under
.caliban/worktrees/<name> with a configurable base_ref (fresh /
head / named ref), optional sparse_paths, and optional
symlink_directories (so heavy build outputs like target/ and
node_modules/ are shared by symlink instead of duplicated).
We pick git-worktree over copy-on-write filesystems or chroots because it works everywhere git works, it is a primitive the user already understands, and it composes with the rest of git (a sub-agent's diff is a real branch tip the user can inspect). Containers and OS sandboxes are orthogonal layers that can wrap a worktree later.
Background sub-agents are owned by a new caliban-supervisor daemon
bg = true (frontmatter or runtime override) detaches the sub-agent from
its caller. The detached agent's lifecycle is managed by a per-repo
daemon (caliband) auto-spawned on first need. The daemon owns a control
Unix socket (list/attach/kill/respawn/rm/spawn/status) and exposes a
per-agent socket each sub-agent writes its TurnEvent stream to.
We pick a separate daemon process — not a tokio task inside the main CLI — because (a) the parent CLI process should be free to exit and let background sub-agents keep running, and (b) it cleanly separates short-lived foreground concerns from long-lived fleet concerns. We pick a Unix domain socket over TCP because the fleet is local-only by design; TCP exposure waits for a remote-orchestration ADR.
Per-agent on-disk store is caliban-sessions-compatible
A background sub-agent's <base>/agents/<id>/session.json is a regular
caliban session file. caliban agents attach <id> is sugar for
caliban resume <id> over the agent's socket. Reusing the format means
session tooling (compaction, replay, audit) works on background
sub-agents for free.
Ctrl+B is a runtime transition, not a new spawn
A foreground sub-agent can be backgrounded mid-run by snapshotting its
state and transferring ownership to the supervisor. The parent's
in-flight AgentTool::invoke future is cancelled with a
ToolError::Backgrounded(id) and the assistant transcript records the
handoff. The sub-agent itself sees no state change — it continues from
the next event. This is the operator's escape hatch for "this is taking
longer than I thought; let me get my main loop back."
Hook inheritance defaults to true, with an explicit opt-out
Closes the deferred follow-up from ADR 0024 PR #9. Children inherit
the parent's Hooks chain by default; inherit_hooks: false in
frontmatter resets to the binary's default chain. For background
sub-agents, only the serializable portion of the parent chain
(HookRouter config + identified in-process hooks) crosses the process
boundary; opaque closures are stripped with a loud warning. This trades
some correctness for a tractable contract — operators who want full
inheritance keep their background sub-agents foreground until their
hooks are config-expressible.
Worktree cleanup defaults to true
Foreground worktrees are removed when the sub-agent's WorktreeHandle
drops. CALIBAN_KEEP_WORKTREES=1 (and per-call keep_on_exit: true)
disable removal for debugging. Background worktrees are owned by the
supervisor and removed on caliban agents rm <id> (and on daemon
startup, for orphans, when configured). This is deliberately aggressive:
worktrees are cheap to recreate and expensive to leak.
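Cleanup-on-drop for a foreground worktree can be modeled with an ordinary Drop guard. The sketch below just deletes a directory; a real implementation would presumably go through git to unregister the worktree. The field layout of WorktreeHandle shown here is an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical guard for a foreground worktree directory.
pub struct WorktreeHandle {
    pub path: PathBuf,
    /// Mirrors the per-call keep_on_exit flag (or CALIBAN_KEEP_WORKTREES=1).
    pub keep_on_exit: bool,
}

impl Drop for WorktreeHandle {
    fn drop(&mut self) {
        if !self.keep_on_exit {
            // Best effort: a failed cleanup must not panic during drop.
            let _ = std::fs::remove_dir_all(&self.path);
        }
    }
}
```

Tying removal to Drop means early returns and error paths clean up the same way as the normal path.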
Consequences
- Positive. Closes four 🔴 rows under matrix G (worktree isolation, background sub-agents, subagent-local memory dir, hook inheritance) and adds the supervisor daemon row as a new ✅. Unblocks the "long-running code-review subagent" and "parallel exploratory refactor" workflows that Claude Code uses heavily. Establishes the daemon substrate other features can borrow (notably a future caliban serve HTTP shim for headless use).
- Negative. Two new crates and a new binary (caliband). The per-repo daemon model means cross-repo agent management requires multiple daemons; we accept this for v1. Hook inheritance for background sub-agents is partial by design (closure hooks dropped). Disk usage grows with sparse + symlink-shared worktrees, but the default fresh-empty base_ref keeps the floor low. Windows symlink requirements (elevation / dev mode) make worktree isolation a best-effort feature there.
- Revisit if: Disk pressure from worktrees becomes a recurring operator complaint, in which case promote a "shared object store" layout (git worktree add --no-checkout + targeted materialization). If background-agent IPC outgrows length-prefixed bincode, swap to gRPC over the same socket. If the no-closure-hook-inheritance compromise for background mode bites real users, sketch a serializable-hook IR.
ADR 0038 · Model router v2 — fallback, hedging, breakers, capabilities, binary wiring
- Status: accepted
- Date: 2026-05-24
- Spec: docs/superpowers/specs/2026-05-24-model-router-v2-design.md
- Supersedes scope of: ADR 0022 deferred items
- Author: john.ford2002@gmail.com
Context
ADR 0022 + PR #12 shipped the model router as a config-driven dispatcher:
TOML schema, builder API, purpose-keyed routes, impl Provider,
per-route usage tracking. Five capabilities were deferred to v2 because the
v1 surface needed to settle before resilience landed on top:
- Fallback chains: try the next route on a fatal-for-route error.
- Hedged requests: race a second route after a delay; first wins.
- Circuit breakers: skip a failing route for a cool-off window.
- Capability-based pre-routing: auto-route requests that need vision/thinking/parallel-tools to capable models even if the operator put a non-capable route first.
- caliban.toml discovery + binary wiring: the CLI does not yet construct a ModelRouter from [router]; it falls back to single-provider construction.
Plus the smaller effort-level and per-route prompt-cache normalization
follow-ups. Closing all six in one ADR keeps the router's contract
coherent — fallback, hedging, and breakers all consume the same
candidate-vec from resolution, and capability filtering changes which
candidates appear in the first place.
Decision
Resolution and dispatch are separated, with a candidate vec as the seam
resolve_candidates(...) -> Vec<&RouteEntry> is the single funnel into
the dispatch driver. Filters apply in order: purpose → declared
requires → request-derived needs → breaker state → explicit fallback
re-ordering. Dispatch (fallback or hedging) consumes the vec
identically. This means the same diagnostic (/router debug) shows the
exact list every dispatch will see, and new filters (e.g. cost-budget)
slot in without touching dispatch.
Fallback is sequential by default; hedging is opt-in per route
Sequential fallback handles the cost-conscious common case: try the
primary, only spend on the secondary on real failure. Hedging is a
spend-for-latency knob the operator opts into per route via
hedge = { hedge_after_ms = N, max = K }. We pick this default because
hedging silently doubles the bill for the median request; making it
opt-in keeps the surprise floor low.
Fatal-for-route is a closed list
ModelUnavailable, RateLimit (post adapter-retry), ContextTooLong,
ServerError, NetworkTimeout → fall back. Everything else
(Auth, InvalidRequest, ContentPolicy, Cancelled) propagates.
The list lives in code (fallback.rs::is_fatal_for_route); tests pin
the membership.
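The closed list and the sequential fallback it drives might be sketched as follows. The RouteError enum and the dispatch helper are illustrative assumptions; only the variant names and the name is_fatal_for_route come from the text above. Returning ModelUnavailable for an empty route list is also an assumption.

```rust
/// Hypothetical error kinds, named after the variants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteError {
    ModelUnavailable,
    RateLimit,
    ContextTooLong,
    ServerError,
    NetworkTimeout,
    Auth,
    InvalidRequest,
    ContentPolicy,
    Cancelled,
}

/// True when the error should move dispatch on to the next candidate route.
/// The match has no wildcard arm, so a new variant forces an explicit choice.
pub fn is_fatal_for_route(err: RouteError) -> bool {
    use RouteError::*;
    match err {
        ModelUnavailable | RateLimit | ContextTooLong | ServerError | NetworkTimeout => true,
        Auth | InvalidRequest | ContentPolicy | Cancelled => false,
    }
}

/// Sequential fallback: try each route in order, stop on success or on an
/// error that is not fatal-for-route, and return the last error otherwise.
pub fn dispatch<T>(
    routes: &[&str],
    mut call: impl FnMut(&str) -> Result<T, RouteError>,
) -> Result<T, RouteError> {
    let mut last = RouteError::ModelUnavailable;
    for &route in routes {
        match call(route) {
            Ok(value) => return Ok(value),
            Err(e) if is_fatal_for_route(e) => last = e,
            Err(e) => return Err(e),
        }
    }
    Err(last)
}
```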
Circuit breaker is per-route id, not per (provider, model)
The breaker's state lives in BreakerRegistry: HashMap<RouteId, ArcSwap<BreakerState>>. We key on the route id (which defaults to
{provider}:{model}:{purpose}) so the operator can break a provider on
one purpose without disabling it on another. Closed → Tripped → HalfOpen → Closed/Tripped is the standard SRE breaker. Cancelled
outcomes do not count toward failure.
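The state machine can be written down compactly. The sketch below keeps one breaker per route and takes the clock as a plain seconds value so it stays deterministic; the failure threshold, the field names, and the choice to let every request through while HalfOpen are assumptions, and the ArcSwap registry is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BreakerState {
    Closed { failures: u32 },
    Tripped { until: u64 },
    HalfOpen,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    Failure,
    Cancelled,
}

/// Hypothetical single-route breaker with a fixed cool-off window.
pub struct Breaker {
    pub state: BreakerState,
    pub threshold: u32,
    pub cooldown_secs: u64,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown_secs: u64) -> Self {
        Self { state: BreakerState::Closed { failures: 0 }, threshold, cooldown_secs }
    }

    /// Should this route be offered as a candidate at time `now` (seconds)?
    pub fn allow(&mut self, now: u64) -> bool {
        if let BreakerState::Tripped { until } = self.state {
            if now < until {
                return false;
            }
            // Cool-off elapsed: let a probe request through.
            self.state = BreakerState::HalfOpen;
        }
        true
    }

    /// Record the outcome of a request that was allowed through.
    pub fn record(&mut self, outcome: Outcome, now: u64) {
        self.state = match (self.state, outcome) {
            // Cancellation says nothing about route health.
            (state, Outcome::Cancelled) => state,
            (_, Outcome::Success) => BreakerState::Closed { failures: 0 },
            (BreakerState::Closed { failures }, Outcome::Failure)
                if failures + 1 < self.threshold =>
            {
                BreakerState::Closed { failures: failures + 1 }
            }
            // Threshold reached, or a failed HalfOpen probe: trip again.
            (_, Outcome::Failure) => BreakerState::Tripped { until: now + self.cooldown_secs },
        };
    }
}
```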
Capability filtering is pre-routing, not post-failure
Today the router relies on requires blocks to drop incompatible
routes. v2 adds request-derived needs (image content → vision; thinking
budget → thinking capability) so the operator does not need to mark
every route explicitly. This costs one Provider::capabilities(model)
call per candidate (already a HashMap lookup in the adapters); we accept
the cost because the diagnostic value is large.
caliban.toml discovery uses the CLAUDE.md walk algorithm
Same ancestor-walk-up-to-git-root-or-$HOME as memory tier 0018, with a
different filename predicate. Both walks share a caliban-memory::walk_up
utility (already small, factored out for this ADR). Layering: CLI flag >
env var > caliban.toml > $HOME/.config/caliban/caliban.toml. Unknown
providers fail loudly at startup, not lazily on first call.
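The layering order reduces to picking the first source that supplies a value. A minimal sketch, assuming each layer has already been parsed into an Option; resolve_setting is a hypothetical helper.

```rust
/// Resolve one setting across the four layers, highest precedence first:
/// CLI flag, env var, project caliban.toml, user-level caliban.toml.
pub fn resolve_setting<T>(
    cli: Option<T>,
    env: Option<T>,
    project: Option<T>,
    user: Option<T>,
) -> Option<T> {
    cli.or(env).or(project).or(user)
}
```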
Effort levels live on RequestMetadata and map per-adapter
RequestMetadata.effort: Option<EffortLevel> is plumbed through to each
adapter. Each adapter owns the mapping to its native effort knob
(reasoning_effort / extended_thinking.budget / thinkingConfig).
Ollama's mapping is a no-op for now. Operators see the table via
caliban router debug --effort-table.
Prompt-cache markers are cleared on cross-route hops
When fallback or hedging moves to a different provider mid-session,
cache_control markers in the persisted messages are stripped before
the new adapter sees them. The cleared count is recorded in
router.cache.markers_cleared. This is the cheap, safe behavior;
markers are normalization-cost, not correctness-cost.
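Marker clearing on a cross-route hop amounts to one pass over the persisted content. In this sketch Block is a simplified stand-in for the real message type and clear_cache_markers is a made-up name; the returned count is what would feed router.cache.markers_cleared.

```rust
/// Simplified stand-in for a persisted content block.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub text: String,
    /// Provider-specific cache marker, if one was attached.
    pub cache_control: Option<String>,
}

/// Strip every cache marker in place and report how many were removed.
pub fn clear_cache_markers(blocks: &mut [Block]) -> usize {
    let mut cleared = 0;
    for block in blocks.iter_mut() {
        // take() leaves None behind and hands back the old value.
        if block.cache_control.take().is_some() {
            cleared += 1;
        }
    }
    cleared
}
```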
Metrics are tracing first, OTel-export later
We emit tracing events with structured fields (route_id, purpose,
kind, from, to); the OTel cost spec (out of scope for this ADR)
maps them to OTLP metric streams. Keeping the in-router emission
tracing-only avoids pulling opentelemetry into a Layer-3 crate.
Consequences
- Positive. Closes six 🔴 rows under matrix I in one PR: fallback, hedging, breakers, capability filtering, caliban.toml wiring, effort levels. The router now earns its keep as a resilience layer: a flaky primary auto-routes to a secondary, a tripped breaker prevents cascade failure, hedging gives operators an explicit spend/latency knob. The binary actually constructs a router from config, removing the awkward "config exists but unwired" state.
- Negative. Hedging spend can surprise operators who do not read the README. We mitigate with explicit opt-in and loud per-route hedge_loss metrics, but it remains a footgun. Breaker false positives are real and the cool-off window is fixed (no exponential back-off in v2). Capability auto-routing changes which route a request lands on without the operator's purpose knob; this can be debugged via /router debug but is a behavior change v1 users may not expect, so release notes call it out. Prompt-cache marker clearing means cross-route hops lose Anthropic cache savings; with hedging this happens silently on every hedge to a non-Anthropic fallback.
- Revisit if: Operator demand for adaptive hedge tuning (EWMA, p95 observation) materializes; a v3 sketch already lives in the spec's non-goals. If breaker false-positive complaints recur, add exponential cool-off (cooldown_secs * 2^trip_count up to a cap). If the candidate-vec seam ossifies and we need cost/budget routing, introduce a Budget filter stage before dispatch rather than rewriting dispatch.
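The exponential cool-off named in the revisit clause, cooldown_secs * 2^trip_count up to a cap, could look like the sketch below. The function name and the overflow handling are assumptions; v2 itself uses a fixed window.

```rust
/// Sketch of the proposed cool-off: cooldown_secs * 2^trip_count, clamped
/// to `cap_secs`. Saturating arithmetic keeps large trip counts from
/// overflowing before the clamp is applied.
pub fn cool_off_secs(cooldown_secs: u64, trip_count: u32, cap_secs: u64) -> u64 {
    // 2^trip_count, or u64::MAX when the shift would exceed 63 bits.
    let factor = 1u64.checked_shl(trip_count).unwrap_or(u64::MAX);
    cooldown_secs.saturating_mul(factor).min(cap_secs)
}
```

With a 30-second base and a 600-second cap, the window doubles on each trip (30, 60, 120, 240, 480) and then stays at the cap.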
ADR 0039 · Image + vision input
- Status: accepted
- Date: 2026-05-24
- Spec: docs/superpowers/specs/2026-05-24-image-input-design.md
- Author: john.ford2002@gmail.com
Context
Vision is table-stakes for any modern coding assistant: users want to
paste a screenshot of a stack trace, drop a Figma export, or @path a
generated chart and get it through to a vision-capable model. Caliban's
current ContentBlock IR is text-only; the TUI input layer has no paste
handler beyond text; the provider adapters serialize text content
exclusively. Closing this requires changes across five crates plus a
new ingest crate, but each change is small and the design carries no
provider lock-in. Capability filtering (model-router v2) already
contemplates a vision predicate — this ADR makes it real.
Decision
ContentBlock IR gains an Image variant; ImageBlock is provider-agnostic
The IR carries { source, mime, sha256, dims, cache_control }. source
is Base64 { data } or Url { url }. Provider adapters own the
serialization to their native shape (Anthropic image, OpenAI
image_url, Google inline_data). This keeps the IR free of provider-
specific knobs and lets us add a new provider's image shape by writing
exactly the adapter, with no IR churn.
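As a rough picture of the IR shape, the sketch below models the fields named above and the per-provider block names. Field types, the Provider enum, and the helper function are assumptions for illustration; the actual definitions live in the IR and adapter crates.

```rust
/// Where the image bytes come from (variants as described above).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ImageSource {
    Base64 { data: String },
    Url { url: String },
}

/// Provider-agnostic image block; the field types here are guesses.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ImageBlock {
    pub source: ImageSource,
    pub mime: String,
    pub sha256: String,
    /// (width, height) in pixels.
    pub dims: (u32, u32),
    pub cache_control: Option<String>,
}

/// Hypothetical provider tag used only by this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    OpenAi,
    Google,
}

/// Name of the native content-block shape each adapter serializes to.
pub fn native_block_name(provider: Provider) -> &'static str {
    match provider {
        Provider::Anthropic => "image",
        Provider::OpenAi => "image_url",
        Provider::Google => "inline_data",
    }
}
```

Adding a provider in this model means adding one match arm in the adapter layer; ImageBlock itself does not change.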
Ingest is a separate crate, caliban-images
Clipboard reads, DnD escape parsing, MIME sniffing, decode validation,
size cap enforcement, downscale, and SHA-256 fingerprinting all live in
one crate. The TUI and CLI both depend on it; the model-router pulls
nothing from it (router only sees the already-built ImageBlock IR).
Crate separation matters because the image decoder family is the
biggest CVE surface in the dependency tree — keeping it behind a single
boundary makes audits and feature-gating tractable.
MIME allowlist is closed: png, jpeg, gif, webp
We explicitly disable bmp/tiff/dds/tga and friends in the image crate
feature flags. AVIF/HEIC are tracked but not v1. The list mirrors what
all three vision providers (Anthropic, OpenAI, Google) support; expanding
later is a config-flag change, not an API change.
Default size cap: 5 MiB pre-base64, downscale to 1568 px on longest edge
5 MiB matches Anthropic's documented limit; 1568 px matches Anthropic's
recommended longest edge for cost-efficient inputs. Over-cap images are
downscaled (Lanczos3) with a WARN-level trace and a "[downscaled]"
badge on the TUI thumbnail. Operators can override via [images] in
caliban.toml.
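The allowlist check and the longest-edge rule reduce to a few lines of arithmetic. The sketch below covers only the dimension math and the MIME test; function names are hypothetical, the full MIME strings are assumed spellings of the four listed formats, and the actual resampling (Lanczos3) and the 5 MiB byte check are out of scope here.

```rust
/// Closed allowlist from the decision above: png, jpeg, gif, webp.
pub fn is_allowed_mime(mime: &str) -> bool {
    matches!(mime, "image/png" | "image/jpeg" | "image/gif" | "image/webp")
}

/// Compute target dimensions so the longest edge is at most `max_edge`,
/// preserving aspect ratio (rounded to nearest, never below 1 px).
pub fn downscale_dims(width: u32, height: u32, max_edge: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest == 0 || longest <= max_edge {
        return (width, height);
    }
    let scale = |side: u32| -> u32 {
        let scaled = (side as u64 * max_edge as u64 + longest as u64 / 2) / longest as u64;
        scaled.max(1) as u32
    };
    (scale(width), scale(height))
}
```

For a 3136 x 1568 screenshot and the default edge of 1568, the target comes out at 1568 x 784; images already within the limit are returned unchanged.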
Capability filtering is mandatory; CALIBAN_STRICT_ROUTING=false opts out
By default, an image-bearing request that has no vision-capable route
fails with RouterError::NoCandidate. Operators who want degraded
behavior (CI, headless) set CALIBAN_STRICT_ROUTING=false; the router
replaces image content with a documented text placeholder and continues.
We pick "strict by default" because silent vision drop is a worse
failure mode than a clear error pointing at the missing route.
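The strict-by-default behavior can be sketched as a filter followed by a fallback. Everything except the RouterError::NoCandidate name is an assumption here: the Route struct, the Selection enum, and the rule that the degraded path picks the first configured route are simplifications for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RouterError {
    NoCandidate,
}

/// Hypothetical route descriptor with a single capability bit.
pub struct Route {
    pub id: &'static str,
    pub vision: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Selection {
    /// The request is sent unchanged to this route.
    Route(&'static str),
    /// Images are replaced by a text placeholder before sending.
    DegradedToText(&'static str),
}

/// Pick the first route that satisfies the request's needs. With `strict`
/// set, an image request with no vision-capable route is an error.
pub fn select_route(
    routes: &[Route],
    has_image: bool,
    strict: bool,
) -> Result<Selection, RouterError> {
    if let Some(route) = routes.iter().find(|r| !has_image || r.vision) {
        return Ok(Selection::Route(route.id));
    }
    if !strict {
        if let Some(route) = routes.first() {
            return Ok(Selection::DegradedToText(route.id));
        }
    }
    Err(RouterError::NoCandidate)
}
```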
Sessions store images as blob refs, never as inline base64
session.json carries ImageSource::BlobRef { sha256 }; the actual
bytes live in <session>/blobs/<sha>.bin. BlobRef has #[serde(skip_…)]
guarding against accidental wire serialization. This keeps session
files small, makes git history of .caliban/sessions readable, and
sets up the future session gc command.
TUI graphics protocol is detected once per session, with a text fallback
We probe kitty/sixel/iTerm2 capability via short escape sequences with
a 100ms timeout, cache the result, and fall through to a
[image: WxH MIME filename] placeholder otherwise. Probe results are
overridable via CALIBAN_GRAPHICS=kitty|sixel|iterm|none. Probes hang on
some terminals; the timeout is the safety valve.
Cost accounting reads provider Usage, not local estimates, but estimates surface in /usage for diagnostics
Anthropic and OpenAI return token usage including image tokens. We bill
from what providers report. Locally we also compute a labelled
estimate (ceil(w * h / 750) for Anthropic-style billing) so the
/usage overlay can answer "what's this image roughly worth"
before the call returns.
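The local estimate is a one-line ceiling division. The function name below is hypothetical; the formula is the ceil(w * h / 750) given above, computed in integers to avoid floating-point rounding.

```rust
/// Labelled local estimate: ceil(width * height / 750).
/// Billing still comes from the provider-reported usage.
pub fn estimate_image_tokens(width: u32, height: u32) -> u64 {
    let pixels = width as u64 * height as u64;
    (pixels + 749) / 750
}
```

A 1568 x 1568 image has 2,458,624 pixels, so the estimate is 3,279 tokens.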
Consequences
- Positive. Closes matrix E "image / vision input" with one PR. The IR change is a small additive variant on ContentBlock; existing handlers are unaffected (default-match arm). Capability filtering in the router (already designed in v2) gets its first real consumer. Pasting a screenshot into caliban "just works" with the right route configured. The caliban-images crate establishes a pattern for future media types (PDF, audio).
- Negative. Five crates touched. The image crate dependency adds decoder CVE surface; we constrain it but cannot eliminate it. The TUI gains a graphics-protocol detection path that has been a recurring source of bugs in other tools; we mitigate with caching + override but accept some carry. Cost surprise is real for large screenshots; the 1568 px downscale default helps but does not eliminate it. Strict-by-default routing will trip operators who configured a non-vision route as their default; a clear error message + docs are the mitigation. Session blob storage adds a directory layout we must GC eventually.
- Revisit if: Output-side vision (image generation) becomes a real capability across providers; extend the IR with ImageGeneration / similar. If the image crate accumulates serious CVEs, sandbox the ingest path in a separate process. If users routinely hit the per-message count cap (20), expose it directly in the TUI rather than via caliban.toml.
ADR 0040 · Slash command registry
- Status: accepted
- Date: 2026-05-24
- Spec:
docs/superpowers/specs/2026-05-24-slash-command-coverage-design.md
Context
caliban currently has four hard-coded slash commands (/plan,
/memory, /skills, /quit) dispatched from a match in
Tui::handle_slash_command. Closing the parity gap with Claude Code
adds another ~24 commands at minimum, plus plugin-supplied commands.
Continuing the match arm pattern is untenable: it forces every command
into one file, prevents plugins from registering commands, and
duplicates the typeahead suggester data.
Decision
A SlashCommand trait + central SlashCommandRegistry
Each slash command becomes its own impl SlashCommand in
caliban/src/tui/slash/<group>.rs. The registry holds them by name in
a HashMap<&'static str, Arc<dyn SlashCommand>> and exposes
register, suggest, dispatch. The TUI's input bar consults the
suggester for typeahead; the dispatcher routes execution.
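A stripped-down sketch of the trait and registry follows. It keeps the HashMap<&'static str, Arc<dyn SlashCommand>> storage and the register / suggest / dispatch surface named above, but the method signatures, the string return type, and the Stub command are assumptions; the real commands take a SlashCtx rather than a bare argument string.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified command trait (hypothetical signatures).
pub trait SlashCommand: Send + Sync {
    fn name(&self) -> &'static str;
    fn run(&self, args: &str) -> String;
}

#[derive(Default)]
pub struct SlashCommandRegistry {
    commands: HashMap<&'static str, Arc<dyn SlashCommand>>,
}

impl SlashCommandRegistry {
    pub fn register(&mut self, command: Arc<dyn SlashCommand>) {
        self.commands.insert(command.name(), command);
    }

    /// Names starting with `prefix`, sorted for stable typeahead output.
    pub fn suggest(&self, prefix: &str) -> Vec<&'static str> {
        let mut names: Vec<&'static str> = self
            .commands
            .keys()
            .copied()
            .filter(|name| name.starts_with(prefix))
            .collect();
        names.sort_unstable();
        names
    }

    /// Parse "/name args" and run the matching command, if any.
    pub fn dispatch(&self, line: &str) -> Option<String> {
        let line = line.strip_prefix('/')?;
        let (name, args) = match line.split_once(' ') {
            Some((name, args)) => (name, args.trim()),
            None => (line, ""),
        };
        self.commands.get(name).map(|command| command.run(args))
    }
}

/// Example command that only reports a status message.
pub struct Stub {
    pub name: &'static str,
    pub message: &'static str,
}

impl SlashCommand for Stub {
    fn name(&self) -> &'static str {
        self.name
    }
    fn run(&self, args: &str) -> String {
        format!("{} (args: {})", self.message, args)
    }
}
```

Because the suggester reads the same map the dispatcher uses, a command registered once shows up in typeahead with no second list to maintain.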
A shared SlashCtx<'a> is passed to every command
Commands need mutable access to the running session and immutable
references to long-lived registries (providers, router, MCP manager,
skills, hooks, sub-agent fleet, settings). Threading each separately
into every command would mean re-plumbing nine call sites every time a
new shared resource is added. Instead, SlashCtx is a single
borrowing struct constructed per command dispatch. Commands take
&mut SlashCtx<'_> and reach in for what they need.
The risk is SlashCtx becoming a god-object. We accept that risk and
commit to splitting it if it grows past ~20 fields.
Slash commands are operator UI, not model tools — no permission gating
Slash commands run as the operator's direct action; they are not gated
by the permission rule grammar that protects model-initiated tool
calls. Commands that wrap destructive operations (/clear,
/rewind-restore, /logout) implement their own interactive
confirmation in their overlay. This keeps the rule grammar focused on
its actual job (constraining the model) and removes a layer of
ambiguity ("did /clear get rejected by a Bash rule?").
Hooks fire on slash submission
UserPromptSubmit (from ADR 0024 / Hooks expansion) fires before the
slash parser runs. Hook payload includes is_slash: bool, command: str, args: str. A hook can reject or modify the slash command —
useful for audit logging or per-operator policy.
Stubs are first-class
Several slash commands depend on machinery being designed in sibling
specs (settings, MCP v2, plugins, OTel/cost, checkpointing). Rather
than wait for everything to land, we register stubs that emit a
helpful status message ("cost tracking lands in PR #N — see
docs/superpowers/specs/2026-05-24-otel-and-cost-design.md"). The stub
files name the in-flight spec so the user can tell what's coming.
Consequences
- Positive: Clean extension point; adding a command is one file and one registry.register(...) line. Plugins (per ADR 0030) register commands the same way. Typeahead works automatically for every registered command. /help enumerates the live set, so documentation never drifts from reality.
- Negative: SlashCtx is wide. Stubs can confuse operators if the message isn't clear. Plugin-supplied commands shadowing built-ins need consistent semantics (plugin loses); logged at registration time. Adds ~150 LOC of trait/registry plumbing for ~24 small command impls.
- Revisit if: Commands begin to need session-specific command registration (e.g. a sub-agent's command appears only when that sub-agent is attached). Today's registry is process-global; a per-session overlay can be added without breaking the trait.
ADR 0041 · TUI redraw tick close-out
- Status: accepted
- Date: 2026-05-26
- Supersedes: portions of 0014 (the "If the underlying cause is something deeper" open question)
Context
ADR 0014 introduced a 50 ms redraw tick into the TUI event loop
(caliban/src/tui.rs:180) as a workaround for stalls observed during
streaming completions. The same ADR explicitly acknowledged the tick
"masks the symptom rather than addressing the root cause" and pointed
at a probable missing-waker bug in async_stream::try_stream! as the
likely culprit.
Two years on, no follow-up ADR had closed the question — this ADR does.
Decision
The 50 ms redraw tick stays.
The reasoning:
- No reported regressions in 18 months of regular use. The tick has been in place since the original ADR 0014 commit; no stall reports have surfaced since.
- Modern async-stream 0.3 has sound waker propagation. The original 2024 hypothesis (async_stream::try_stream! failing to register a waker) is unlikely with the current dep. A static read of the TurnEventStream construction (crates/caliban-agent-core/src/stream/mod.rs:263) found no obvious waker bugs.
- The tick's cost is negligible. A no-op wake every 50 ms is 20 wakes per second; at roughly 10 µs per wake that is about 200 µs of CPU per second, or 0.02 % of a single core. The ratatui frame-render path early-returns when state is unchanged (the toast-drop check above the draw call is the only state mutation per tick).
- Removing the tick would risk a silent regression for a marginal cleanup gain. The tick is a one-line defensive fallback that costs nothing observable.
Consequences
- The tick remains in caliban/src/tui.rs.
- ADR 0014's "If the underlying cause is something deeper" open question is now considered closed.
- The mention of the tick in ADR 0014 is left as-is for historical context; this ADR is the authoritative current decision.
Revisit if
- A contributor identifies a reproducible stall under specific conditions (a particular provider, model, or prompt shape).
- A future async-stream / ratatui / tokio upgrade reintroduces the symptom.
- A measurable battery-life or thermal regression is attributed to the redraw tick on long-running TUI sessions.
In any of those cases the appropriate response is to re-run the investigation with the debug log enabled (see ADR 0014 §"Debug log"), identify the root cause, and either land a real fix or write a new ADR with the updated reasoning.
References
- ADR 0014 (original tick decision; §"Stall fix").
- TUI event loop: caliban/src/tui.rs:180 (interval declaration), caliban/src/tui.rs:241 (tick arm of the select).
- TurnEventStream construction: crates/caliban-agent-core/src/stream/mod.rs:263 (try_stream! macro invocation).
- Workspace dep: async-stream = "0.3" (root Cargo.toml).
ADR 0042 · caliband sibling-binary placement
- Status: accepted
- Date: 2026-05-26
Context
The workspace declares two binaries:
- caliban: the primary user-facing TUI/CLI. Source at the workspace root (caliban/src/main.rs).
- caliband: the supervisor daemon (ADR 0037). Source nested under its owning crate at crates/caliban-supervisor/src/bin/caliband.rs, declared via the [[bin]] entry in crates/caliban-supervisor/Cargo.toml.
ADR 0005 ("Workspace layout") establishes the convention that
"primary" binaries live at the workspace root. caliband does not —
it lives nested under its owning crate. ADR 0037 introduces the
daemon obliquely (its name, its on-disk paths, and its protocol) but
does not document the placement choice. This ADR records it.
Decision
caliband stays nested under caliban-supervisor as a secondary
binary, with its [[bin]] declaration in the supervisor crate's
Cargo.toml.
Consequences
- Clean process boundary between the user-facing caliban CLI/TUI and the supervisor daemon. The two never share a main entry point; they communicate over a Unix socket per ADR 0037.
- Direct crate access. caliband consumes caliban-supervisor's modules directly without going through a public API surface, which is appropriate because they ship together.
- No accidental dispatch. Launching caliban never accidentally invokes caliband's main (or vice versa); they're distinct binaries from cargo and from the user's $PATH.
- cargo install requires --bin caliband explicitly. The supervisor crate's README documents this; the caliban agents subcommand spawns caliband from the same install prefix as caliban (per ADR 0037).
- Workspace-root parsimony. The root stays focused on the primary product (caliban); the daemon is appropriately filed under the crate that owns its implementation.
Why this differs from ADR 0005
ADR 0005's "binaries at root" rule was written assuming a single binary. With two, the rule needs nuance:
- A binary whose sole purpose is to expose a crate's library functionality as an executable belongs with that crate.
- A binary that integrates many crates into the product surface belongs at the workspace root.
caliban is the latter; caliband is the former. This ADR amends
ADR 0005's rule by adding that nuance.
Revisit if
- A third sibling binary appears (e.g., a caliban-mcp daemon for remote MCP servers). At that point the workspace should consider a binaries/ subdirectory rather than continuing the case-by-case pattern.
- caliband outgrows its current sole consumer (the caliban agents subcommand) and starts being launched standalone by other tooling; it might then belong at the root for discoverability.
References
- ADR 0005 (workspace layout — sets the "binaries at root" convention this ADR refines).
- ADR 0037 (subagent isolation + fleet; introduces caliband).
- Source: crates/caliban-supervisor/src/bin/caliband.rs.
- Declaration: crates/caliban-supervisor/Cargo.toml ([[bin]]).
ADR 0043 · arc-swap as the read-mostly shared-state primitive
- Status: accepted
- Date: 2026-05-26
Context
Several read-mostly shared-state surfaces in the workspace use
arc_swap::ArcSwap rather than tokio::sync::RwLock:
- caliban-agent-core::permission_mode::SharedPermissionMode: Arc<ArcSwap<PermissionMode>> for the active permission mode (read on every tool call; written when the user toggles via the TUI overlay or a slash command).
- caliban-model-router::breaker::CircuitBreaker: ArcSwap<BreakerState> for the per-provider breaker state (read on every routed request; written on rolling-window state transitions).
- caliban-settings::SettingsHandle: Arc<ArcSwap<Settings>> for the live settings snapshot (read by many subsystems; written when SettingsWatcher fires a reload).
The choice was made per-surface during the parity sweep but never documented at the workspace level until this ADR.
Decision
Prefer arc-swap for shared state when all three apply:
- Readers outnumber writers by ≥ 10×. The replacement cost on each write is justified only when reads dominate.
- Writers can tolerate full Arc replacement. arc-swap swaps a whole Arc; partial mutation requires a load-modify-store pattern (cheap but susceptible to lost updates without external coordination).
- Read latency is on the hot path. A tokio::sync::RwLock is already cheap, but arc_swap.load() is measurably cheaper: it's lock-free, allocation-free, and has no contention even with 100s of concurrent readers.
Use tokio::sync::RwLock for surfaces with frequent partial
mutation (e.g., long-lived per-key state where rebuilding and
reallocating the whole Arc on every write would churn the allocator),
or where writer fairness matters more than reader throughput.
Use plain std::sync::Mutex for short critical sections that don't
need to await across the lock.
Consequences
- Lock-free reads. Every load() returns an Arc<T> snapshot via a guard with no contention.
- No priority inversion under load: readers never block writers, writers never block readers.
- Slightly higher memory churn on writes: each store allocates a new Arc. Acceptable for the listed surfaces because writes are rare (mode toggle, breaker state transition, settings reload).
- No fairness guarantees between concurrent writers. Acceptable because writers are rare; if two writers race, the later store wins per the swap's release semantics.
- Snapshot semantics for readers. A reader sees a single consistent value; subsequent reads may observe a different swapped value. Callers that need a stable snapshot across multiple reads should hoist the load() to a local. (No subsystem in the workspace currently relies on inter-read consistency for arc-swap surfaces.)
- Cognitive load for new contributors unfamiliar with the semantics: load() returns a snapshot, not a live reference. The module-level comments on each ArcSwap field call this out.
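The snapshot semantics are easiest to see in a small model. The sketch below deliberately does not use the arc-swap crate: it is a std-only stand-in built on RwLock<Arc<T>>, so it is not lock-free and says nothing about arc-swap's performance. It only illustrates the two behaviors described above: store replaces the whole Arc, and a hoisted load() keeps seeing the value it captured.

```rust
use std::sync::{Arc, RwLock};

/// Std-only stand-in for a swap cell (NOT the arc-swap implementation).
pub struct Swappable<T> {
    inner: RwLock<Arc<T>>,
}

impl<T> Swappable<T> {
    pub fn new(value: T) -> Self {
        Self { inner: RwLock::new(Arc::new(value)) }
    }

    /// Return a snapshot: a cloned Arc pointing at the current value.
    pub fn load(&self) -> Arc<T> {
        let guard = self.inner.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Replace the whole value; existing snapshots are unaffected.
    pub fn store(&self, value: T) {
        *self.inner.write().unwrap() = Arc::new(value);
    }
}
```

A caller that reads the cell twice may see two different values; a caller that hoists one load() into a local sees one value for as long as it holds the snapshot.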
Revisit if
- A surface using arc-swap grows a need for partial mutation that the swap pattern can't model cleanly; switch to tokio::sync::RwLock at that surface only.
- The arc-swap crate's maintenance status changes materially (it's small and stable, but watch for unmaintained markers).
- The workspace adds a surface with writer fairness requirements; do not stretch arc-swap to cover it.
References
- arc-swap crate: https://crates.io/crates/arc-swap
- Surfaces:
  - crates/caliban-agent-core/src/permission_mode.rs:124-140
  - crates/caliban-model-router/src/breaker.rs:68-79
  - crates/caliban-settings/src/lib.rs:70-83
ADR 0044 · rmcp 1.7 version pin
- Status: accepted
- Date: 2026-05-26
Context
caliban-mcp-client depends on rmcp — the Model Context Protocol
Rust SDK. The workspace Cargo.toml pins it at 1.7.x:
rmcp = { version = "1.7", features = [...] }
This is a tighter pin than the typical Rust convention of "compatible
with the listed version" (^1.7 allows any 1.x.y where x ≥ 7). The
choice was made when adopting rmcp and never recorded as an ADR
until now.
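For readers comparing requirement styles: in Cargo, a bare version = "1.7" is shorthand for the caret requirement ^1.7, while a requirement limited to the 1.7.x line is written ~1.7. The sketch below models the two rules for a MAJOR.MINOR requirement with MAJOR ≥ 1; it is a hand-rolled illustration with hypothetical function names, ignores pre-release versions, and is not the resolver Cargo uses.

```rust
/// (major, minor, patch)
pub type Version = (u64, u64, u64);

/// Caret rule for `^MAJOR.MINOR` with MAJOR >= 1:
/// same major, minor at or above the requirement.
pub fn caret_matches(req_major: u64, req_minor: u64, version: Version) -> bool {
    version.0 == req_major && version.1 >= req_minor
}

/// Tilde rule for `~MAJOR.MINOR`: same major and same minor, any patch.
pub fn tilde_matches(req_major: u64, req_minor: u64, version: Version) -> bool {
    version.0 == req_major && version.1 == req_minor
}
```

Under these rules 1.8.0 satisfies the caret form of 1.7 but not the tilde form, which is the difference a minor-level pin relies on.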
Decision
Pin rmcp at the 1.7.x minor.
Bumps to a new minor (1.8, 1.9, etc.) are landed in a single dedicated PR after:
- Reading the upstream changelog for breaking changes affecting our MCP transport, OAuth, elicitation, or resource surface (ADRs 0017, 0023).
- Verifying our integration tests still pass against the bumped version.
- Spot-checking the canonical reference MCP servers (a stdio server, an HTTP+OAuth server) end-to-end.
Patch bumps within 1.7.x (1.7.0 → 1.7.1) are auto-resolved by Cargo
and do not require a dedicated PR.
Consequences
- Insulation from breaking changes in MCP transport or server APIs between rmcp minor releases. Our surface (crates/caliban-mcp-client/src/{client,transport,oauth,elicitation,resource}.rs) is large enough that an unexpected upstream minor could mean a multi-day debug session.
- Manual maintenance cost. Each minor bump requires changelog review + integration test pass + a dedicated PR. Estimate: 1-3 hours per bump.
- Predictable runtime behavior for users running pinned binaries against established MCP servers. The wire protocol is stable across the 1.x line by upstream convention, but rmcp's API surface has reshaped between minors in the past.
- Risk: lagging behind upstream means missing protocol-level enhancements (e.g., new transport modalities, new elicitation features) until we explicitly bump. Mitigation: a quarterly changelog check is on the project cadence.
- Risk: security updates in a future minor (e.g., a fix in OAuth validation) require an immediate bump rather than auto-pulling. Mitigation: subscribe to the rmcp release notes / RustSec advisories.
Revisit if
- rmcp reaches 2.0 — at which point the pin needs to move regardless, and the changelog review is mandatory.
- A security advisory affecting our usage of rmcp surfaces — bump immediately to the patched minor, write the dedicated PR retrospectively.
- The maintenance cost of staying current outweighs the insulation benefit (e.g., if upstream stabilizes such that minors stop reshaping the API).
References
- rmcp crate: https://crates.io/crates/rmcp
- ADR 0017 (MCP stdio v1) and ADR 0023 (MCP v2: transports, OAuth, elicitation, resources), the surfaces that consume rmcp.
- Workspace pin: root Cargo.toml (rmcp = { version = "1.7", ... }).
ADR 0045 · Permissions v2 — TOML-primary config + richer rule schema
- Status: accepted
- Date: 2026-05-31
- Supersedes (partial): ADR 0026 (settings layering) — refines write format and per-rule schema.
Context
caliban shipped v1 permissions (ADR 0020), permission modes
(ADR 0029), and layered settings (ADR 0026) with JSON as the
canonical write format. Operator feedback and a security/UX review
surfaced four classes of problems: (1) the TUI Ask modal's "always
allow / always deny" never persisted, breaking the ADR 0020 promise;
(2) the JSON permissions.{allow,ask,deny} form lost source order
and comments; (3) JSON is the wrong primary format for a Rust
project where operators expect TOML and want hand-edited config that
ports between machines; (4) there was no full management surface
(CLI or in-TUI editor) for rules.
Decision
- Restore TOML as caliban's canonical config write format at every scope; JSON is accepted on read as a legacy/import path (with a WARN). All caliban-owned writes (modal, /permissions editor, caliban perms CLI) emit TOML.
- Replace the three-bucket permissions.{allow,ask,deny} form with an ordered [[permissions.rules]] array of objects carrying pattern, action, optional comment, optional reason (deny-only, seen by the model), and reserved expires_at. First match wins. The three-bucket form still loads (legacy compat) but normalizes into the ordered array on load.
- Extend pattern grammar: globstar **, path normalization for file-edit tools, Bash:~glob anywhere-match, dotted-key MCP arg accessors.
- Modal writeback (P1): y / n opens a sub-prompt with narrow-default suggestions, a scope picker, and an optional comment/reason. Atomic flock-protected TOML append.
- Active management surface: /permissions overlay grows full editor capabilities; caliban perms CLI provides headless list / test / explain / add / remove / import / export / audit / lint.
- Hardening: permissions.enforce lockdown knob, append-only JSONL decision log under $XDG_STATE_HOME with size-based rotation, always-visible bypass-latch chip with ctrl+shift+b drop keybind.
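The "first match wins" rule over an ordered array can be sketched as a linear scan. The matcher below is a deliberately tiny stand-in (exact match, or prefix match when the pattern ends in *), and the pattern strings used with it are made up for the example; the real grammar with globstar, ~glob, and dotted-key accessors is richer.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Allow,
    Ask,
    Deny,
}

/// One entry of the ordered rules array (subset of the fields above).
pub struct Rule {
    pub pattern: String,
    pub action: Action,
    pub comment: Option<String>,
}

/// Toy matcher: trailing `*` means prefix match, otherwise exact match.
fn pattern_matches(pattern: &str, call: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => call.starts_with(prefix),
        None => pattern == call,
    }
}

/// Scan rules in source order and return the first matching action.
/// `None` means no rule matched and the caller's default applies.
pub fn evaluate(rules: &[Rule], call: &str) -> Option<Action> {
    rules
        .iter()
        .find(|rule| pattern_matches(&rule.pattern, call))
        .map(|rule| rule.action)
}
```

Because order decides the outcome, a narrow deny placed above a broad allow takes effect, while the same two rules in the opposite order would let the broad allow win.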
Consequences
- Positive: matches Rust ecosystem norms; comments and source-order survive; the modal's promise is finally honored; operators have a complete management story (TUI + CLI); enforce + audit log close long-standing security gaps.
- Negative: doubles the schema surface during the compat window (legacy JSON + TOML buckets + v2 ordered rules coexist on read); the matcher gets a denser grammar (more to document).
- Compat window: legacy reads continue for two minor releases; writes deprecate immediately. After three minor releases only the canonical TOML schema loads.
Runtime application semantics
Rules added through the Ask modal's "Always allow/reject" are applied to
the running session immediately: the gate and the TUI share one
RuntimeRuleStore, so the just-added rule gates the next matching tool
call without re-prompting — regardless of which scope it is also persisted
to on disk.
Rule removals and out-of-band file edits are intentionally not
hot-reloaded into a running session. Deleting a file-scoped rule via the
/permissions overlay or caliban perms remove updates the on-disk file
but does not retroactively tighten the live gate; the change takes effect
on the next session start. Deleting a session (runtime) rule with [d]
in the overlay does take effect live, because it mutates the in-memory
store directly. This asymmetry keeps the gate cheap — no per-call disk
re-read or file watcher — while making the common "allow this now" gesture
feel instant.
Revisit if
- Operators report concrete cases where the ~glob or dotted-key grammars are insufficient; next step would be a richer expression language or a classifier-graded gate (already deferred via ADR 0029 auto-mode).
- The bypass-latch chip + drop keybind UX proves footgunny; could promote the drop to a confirmation dialog.
ADR 0046 · Two-stage tool surface — lazy MCP schema loading + ToolSearch
- Status: accepted
- Date: 2026-05-31
- Spec: docs/superpowers/specs/2026-05-31-two-stage-tool-surface-design.md
- Related: ADR 0017 (MCP client architecture), ADR 0021 (Sub-agent primitive), ADR 0023 (MCP v2), ADR 0026 (Settings layering), ADR 0037 (Sub-agent isolation + fleet), ADR 0043 (arc-swap shared state).
Context
ToolRegistry::to_caliban_tools() is invoked once per turn at
crates/caliban-agent-core/src/stream/mod.rs:497-523, cloning every
registered tool's name + description + JSON Schema into the wire
payload. Built-ins are bounded (~14 entries) but MCP tools scale
linearly with configured servers — three average MCP servers can add
~20K tokens/turn of dormant tool advertising before history is
considered. The problem is structural and will worsen as the
MCP/plugin ecosystem grows, which calls for a design doc + ADR +
multi-PR sequence; this ADR is that decision.
Decision
- Introduce a single new built-in ToolSearch that returns matched MCP tools with their full JSON Schemas and activates them for the rest of the session in a single round-trip. No separate Activate tool; no two-step UX.
- Store activation state in a sidecar McpActivationSet held by Agent as Arc<ArcSwap<McpActivationSet>>, following the read-mostly pattern of ADR 0043. ToolRegistry is unchanged; an added to_caliban_tools_filtered(&WireFilter) returns the per-turn wire subset.
- Filter MCP tools, never built-ins. The v1 scope is MCP-only laziness; built-ins (Read, Grep, Glob, Edit, Bash, Write, WebFetch, WebSearch, TodoWrite, Skill, AgentTool, EnterPlanMode / ExitPlanMode, memory tools) stay always-present. Plugin-tool laziness is moot today (plugins contribute skill roots, not tools).
- Sticky per session, LRU evict at cap. Activations persist for the rest of the session; tools.max_active_schemas (default 24) is a soft cap. New activations beyond the cap evict the least recently used entry, reported in the ToolSearch response text so the model sees what dropped.
- Sub-agent inheritance is opt-out via frontmatter. AgentTool frontmatter gains inherit_active_mcp: Option<bool> defaulting to true. When true, install_sub_agent snapshots the parent's McpActivationSet; when false the child starts fresh. The existing tools: [...] allowlist still filters.
- Default off; opt-in via tools.lazy_mcp = true. Conservative v1; flip to default-on in v1.1 after validation. Per-server override via mcp.toml ([server.X] lazy = false) pins always-hot servers (e.g. a memory/notes server) to eager mode.
- Belt-and-suspenders discovery. When lazy_mcp = true and at least one MCP tool is gated, splice a fixed paragraph into the system prompt explaining ToolSearch plus the deferred count; the ToolSearch tool description itself also names the affordance.
- /context surfaces the active set as MCP active: N/cap (a, b, c). /usage is intentionally not touched in v1 (no honest counterfactual reporting yet).
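The sticky-with-LRU-eviction behavior can be sketched with a small ordered set. The type and method names below are hypothetical and the structure is simplified (tool names only, no schemas, no ArcSwap wrapper); it is meant to show the eviction rule, not the McpActivationSet implementation.

```rust
use std::collections::VecDeque;

/// Toy activation set: front is least recently used, back is most recent.
pub struct ActivationSet {
    cap: usize,
    order: VecDeque<String>,
}

impl ActivationSet {
    pub fn new(cap: usize) -> Self {
        Self { cap, order: VecDeque::new() }
    }

    /// Activate (or touch) a tool. Returns the evicted tool name, if any,
    /// so the caller can report what dropped.
    pub fn activate(&mut self, tool: &str) -> Option<String> {
        if let Some(pos) = self.order.iter().position(|t| t.as_str() == tool) {
            // Already active: move it to the most-recently-used end.
            if let Some(existing) = self.order.remove(pos) {
                self.order.push_back(existing);
            }
            return None;
        }
        if self.cap == 0 {
            return None; // Nothing can be held with a zero cap.
        }
        let evicted = if self.order.len() >= self.cap {
            self.order.pop_front()
        } else {
            None
        };
        self.order.push_back(tool.to_string());
        evicted
    }

    /// Active tool names, least recently used first.
    pub fn active(&self) -> Vec<&str> {
        self.order.iter().map(String::as_str).collect()
    }
}
```

With a cap of 2, activating a, b, a, c in that order evicts b: touching a made b the least recently used entry before c arrived.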
Consequences
- Positive: removes a linear-in-MCP-cardinality token tax from every turn; matches the function-calling pattern many models are trained on; structural readiness for plugin-tool laziness later; no protocol change for the eager path (default behavior is byte-identical).
- Positive: single read-mostly ArcSwap for activation state fits the existing concurrency model and makes sub-agent snapshot trivial.
- Negative: introduces a model-facing contract (search-then-call) that requires the model to read system-prompt guidance; some weaker models may not pick up the pattern reliably (mitigation: it is opt-in in v1, and the "model issues tool_use without searching first" path still works via registry dispatch + auto-activation).
- Negative: tool-list cache prefix is invalidated on each activation; a future split-cache optimisation is sketched in the spec but out of scope for v1.
- Compat window: default false for v1; v1.1 flips default to true (parity matrix rows F.ToolSearch / F.WaitForMcpServers move 🔴 → 🟡 in v1, 🟡 → ✅ in v1.1).
Revisit if
- Activation set's read-mostly assumption breaks down (e.g. the model starts calling ToolSearch every turn); this would warrant a finer-grained cache strategy.
- Built-in tool palette grows substantially (e.g. a wave of new builtins) and the cardinality problem returns for built-ins; this would motivate a separate built-in laziness spec.
- A model is observed to reliably ignore the deferred-block guidance — would motivate a stronger affordance (e.g. forcing an inert ToolSearch tool_use as the first turn under lazy mode).
- Activation persistence across session restart becomes a hot request — would warrant the v1.1 follow-up sketched in the spec's "Open questions" section.
ADR 0047 · Interactive background sub-agents (idle / await-input)
- Status: accepted
- Date: 2026-06-10
- Spec: docs/superpowers/specs/2026-06-10-interactive-background-subagents-design.md
- Amends: ADR 0037 (sub-agent worktree isolation + background fleet); revises one non-goal clause, see "Decision".
- Builds on: ADR 0009 (agent-core stream-as-primitive), ADR 0024 (hook taxonomy), ADR 0037 (background fleet + per-agent socket).
- Author: john.ford2002@gmail.com
- Issue: caliban-ai/caliban#81
Context
ADR 0037 shipped the background fleet: bg = true sub-agents owned by the
caliband daemon, each exposing a per-agent socket carrying its TurnEvent
stream. Issues #71 / #78 / #79 / #75 / #76 / #77 implemented that runtime —
workers launch, stream their transcript live over caliban agents attach,
clean up on exit, and run behind a permission gate.
ADR 0037 deliberately scoped inbound interaction out. Its non-goals say:
Re-attaching a stopped sub-agent into the parent's context. Once detached, a background sub-agent runs to completion (or is killed). The parent reads its final summary via caliban agents attach or the /agents overlay.
and its design spec describes the per-agent socket as carrying
"TurnEvents and inbound user messages" — a capability that was
documented but never built. The result today: an attached operator can
watch a background agent but cannot talk to it. When the agent finishes
its prompt, it ends; there is no way to say "good, now also do X" without
respawning it, which loses all context.
Two facts make this worth revisiting now:
- It is a small generalization, not a rewrite. The agent loop already has TurnDecision::ContinueWith(Vec<Message>) (ADR 0024 hook taxonomy): an after_turn hook can inject messages and force another turn. That is exactly "resume a finished turn with new input", capped at MAX_FORCED_CONTINUATIONS = 3 only to stop hook death-spirals.
- The fleet UX expects it. AgentStatus::Idle ("awaiting input; no compute pending") is defined in the proto and rendered by agents list but is never set, because nothing awaits input.
This ADR records the architectural commitments for closing that gap. Mechanics live in the companion design spec.
Decision
Revise ADR 0037's "runs to completion" non-goal
ADR 0037's non-goal is narrowed, not deleted:
- Still a non-goal: re-attaching a sub-agent into the parent agent's automated context. A bg sub-agent never feeds results back into the parent's running loop; the parent reads a final summary out-of-band. This ADR does not change that.
- Now permitted: an operator (a human at caliban agents attach) may send user messages to a running background sub-agent, which resumes from that input rather than ending. This is interactive operator I/O over the per-agent socket, categorically different from automated parent-context re-attachment.
The distinction matters: the danger ADR 0037 guarded against was automated fan-in (a sub-agent silently resuming the parent). A human typing into an attached session carries no such hazard and is the natural way to steer a long-running background task.
Interactivity is a first-class agent-core run mode, not a hook hack
We add an optional InputProvider to a run (via RunSettings), rather
than overloading after_turn + ContinueWith. When the model reaches a
natural end-of-run boundary (it stopped and no tool call is pending), the
loop — if an InputProvider is configured — awaits the provider for the
next user message:
- Some(messages) → inject into history, mark Idle → Running, take another turn.
- None → the provider signalled end-of-input; the run ends normally (StopCondition::EndOfTurn, status Done).
We choose a pull-based InputProvider over the existing
ContinueWith hook path because:
- It is not death-spiral-prone, so it is correctly uncapped (a human drives it; MAX_FORCED_CONTINUATIONS stays as the anti-spiral cap for hook-forced continuations only).
- It models "await input" honestly: the loop blocks on external I/O at a well-defined boundary, which after_turn (fires every turn) does not.
- It composes with hooks: before_turn / after_turn / permission hooks still run on the resumed turns unchanged.
Foreground and one-shot runs pass no InputProvider and are byte-for-byte
unchanged (the boundary check is if let Some(provider)).
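The boundary behavior described above can be sketched as follows. This is a minimal synchronous stand-in, not the agent-core implementation: the real loop is stream-based, and everything here other than the names InputProvider, Running, Idle, and Done (Message, Status, Scripted, take_turn, run, next_input) is a hypothetical placeholder chosen for illustration.

```rust
use std::collections::VecDeque;

// Sketch only: hypothetical types standing in for agent-core's own.
#[derive(Debug, Clone, PartialEq)]
struct Message(String);

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Running,
    Idle,
    Done,
}

/// Pull-based source of follow-up user input (method name is assumed).
trait InputProvider {
    /// Some(messages) resumes the run; None signals end-of-input.
    fn next_input(&mut self) -> Option<Vec<Message>>;
}

/// Test double that replays a fixed script of operator inputs.
struct Scripted(VecDeque<Vec<Message>>);

impl InputProvider for Scripted {
    fn next_input(&mut self) -> Option<Vec<Message>> {
        self.0.pop_front()
    }
}

/// Stand-in for one model turn: appends a canned reply to the history.
fn take_turn(history: &mut Vec<Message>) {
    let reply = format!("reply #{}", history.len());
    history.push(Message(reply));
}

/// Runs turns until the end-of-run boundary and, if a provider is
/// configured, awaits it there. Returns the status transitions observed.
fn run(
    history: &mut Vec<Message>,
    mut provider: Option<Box<dyn InputProvider>>,
) -> Vec<Status> {
    let mut statuses = vec![Status::Running];
    loop {
        take_turn(history);
        // End-of-run boundary: the model stopped and no tool call is pending.
        if let Some(p) = provider.as_mut() {
            statuses.push(Status::Idle);
            if let Some(messages) = p.next_input() {
                history.extend(messages);
                statuses.push(Status::Running);
                continue; // take another turn on the new input
            }
        }
        // No provider, or the provider signalled end-of-input.
        statuses.push(Status::Done);
        return statuses;
    }
}

fn main() {
    // Foreground / one-shot: no provider, so the run ends after one turn.
    let mut history = vec![Message("start".to_string())];
    println!("{:?}", run(&mut history, None));

    // Background: one scripted operator message, then end-of-input.
    let script = VecDeque::from(vec![vec![Message("good, now also do X".to_string())]]);
    let provider: Box<dyn InputProvider> = Box::new(Scripted(script));
    let mut history = vec![Message("start".to_string())];
    println!("{:?}", run(&mut history, Some(provider)));
}
```

Note that the sketch carries no continuation counter on the provider path: the only thing that ends an interactive run here is the provider returning None, which mirrors the "correctly uncapped" argument above.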
The per-agent socket becomes bidirectional
ADR 0037's per-agent socket carried worker→client TurnEvents only (#79).
It becomes bidirectional: the worker continues writing TurnEvent NDJSON
outbound, and now reads inbound user-message frames (newline-delimited
JSON, a small tagged frame type) from attached clients. The worker's
InputProvider is fed by these frames. caliban agents attach gains a send
path (stdin → user-message frames). Read-only viewers (e.g. a future
/agents overlay tail) simply never send.
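To make the inbound direction concrete, here is a rough sketch of how an attach client might encode user-message frames as NDJSON. The frame schema is an assumption made purely for illustration: the field names "type" and "text", the tags "user_message" and "end", and the helper names (InboundFrame, json_escape, to_ndjson_line, send) are all hypothetical; the design spec defines the actual frame type. The sketch uses only the standard library, so JSON escaping is done by hand.

```rust
use std::io::{self, Write};

// Sketch only: a hypothetical inbound (client -> worker) frame type.
#[derive(Debug, PartialEq)]
enum InboundFrame {
    UserMessage { text: String },
    End,
}

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl InboundFrame {
    /// Serializes the frame as exactly one newline-terminated JSON line.
    fn to_ndjson_line(&self) -> String {
        match self {
            InboundFrame::UserMessage { text } => format!(
                "{{\"type\":\"user_message\",\"text\":\"{}\"}}\n",
                json_escape(text)
            ),
            InboundFrame::End => "{\"type\":\"end\"}\n".to_string(),
        }
    }
}

/// Writes one frame to anything socket-like.
fn send<W: Write>(socket: &mut W, frame: &InboundFrame) -> io::Result<()> {
    socket.write_all(frame.to_ndjson_line().as_bytes())
}

fn main() -> io::Result<()> {
    // A Vec<u8> stands in for the per-agent socket.
    let mut wire: Vec<u8> = Vec::new();
    let message = InboundFrame::UserMessage {
        text: "good, now also do X".to_string(),
    };
    send(&mut wire, &message)?;
    send(&mut wire, &InboundFrame::End)?;
    print!("{}", String::from_utf8_lossy(&wire));
    Ok(())
}
```

Escaping embedded newlines is what keeps the one-frame-per-line invariant intact when an operator pastes multi-line text.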
Idle is a real, reported lifecycle state
AgentStatus::Idle is wired: the worker reports Running → Idle when it
begins awaiting input and Idle → Running when it resumes. Because the
daemon — not the worker — owns the registry, this requires a worker →
daemon status-report channel (the worker currently only talks to attach
clients). The design spec picks the mechanism; the commitment here is that
Idle is observable in agents list and the /agents overlay.
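One possible shape for the worker → daemon status report is sketched below, using an in-process channel as a stand-in for whatever transport the design spec selects. StatusReport, Registry, agent_id, and the method names are hypothetical, and the AgentStatus enum shown is only the subset of variants this section mentions.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Subset of statuses named in this ADR; the real proto may define more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentStatus {
    Running,
    Idle,
    Done,
}

/// Hypothetical worker -> daemon report.
struct StatusReport {
    agent_id: u32,
    status: AgentStatus,
}

/// Daemon-owned registry: the single source of truth for agent status.
#[derive(Default)]
struct Registry {
    statuses: HashMap<u32, AgentStatus>,
}

impl Registry {
    /// Applies every report currently queued on the channel, in order.
    fn drain(&mut self, rx: &mpsc::Receiver<StatusReport>) {
        for report in rx.try_iter() {
            self.statuses.insert(report.agent_id, report.status);
        }
    }

    /// What a listing command would display for one agent.
    fn status_of(&self, agent_id: u32) -> Option<AgentStatus> {
        self.statuses.get(&agent_id).copied()
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut registry = Registry::default();

    // Worker side: report each lifecycle transition as it happens.
    for status in [AgentStatus::Running, AgentStatus::Idle] {
        tx.send(StatusReport { agent_id: 7, status }).unwrap();
    }
    registry.drain(&rx);
    println!("{:?}", registry.status_of(7));

    // Operator sends input, the agent resumes, then finishes.
    for status in [AgentStatus::Running, AgentStatus::Done] {
        tx.send(StatusReport { agent_id: 7, status }).unwrap();
    }
    registry.drain(&rx);
    println!("{:?}", registry.status_of(7));
}
```

The point of the sketch is ownership: the worker only emits transitions, while the registry that answers listing queries stays on the daemon side.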
Bounded idle: an idle agent must not live forever
An agent awaiting input with no attached clients is a resource leak in
waiting. The run ends (Done) when any of:
- the InputProvider returns None (an attached operator sent an explicit end / detached with end-intent),
- a configurable idle timeout elapses with no inbound message and no attached client, or
- caliban agents kill (unchanged).
Default idle timeout is conservative (minutes, configurable per
SupervisorConfig); the spec sets the exact default.
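The idle-ending rules could be modelled roughly as below. This is one interpretation under stated assumptions, not the worker's actual code: the channel stands in for the inbound socket, IdleOutcome and await_input are hypothetical names, the timer is simply restarted whenever a client is still attached at expiry, and a dropped sender is treated as end-of-input. The kill path is unchanged and not modelled.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

// Sketch only: hypothetical outcome of one idle wait.
#[derive(Debug, PartialEq)]
enum IdleOutcome {
    /// An operator sent input; the run resumes with it.
    Resume(String),
    /// The provider signalled end-of-input; the run ends (Done).
    EndOfInput,
    /// The idle timeout elapsed with no message and no attached client.
    TimedOut,
}

/// Blocks at the end-of-run boundary until one of the ending conditions
/// holds or new input arrives. `attached_clients` reports how many attach
/// clients are currently connected.
fn await_input(
    inbound: &mpsc::Receiver<Option<String>>,
    idle_timeout: Duration,
    attached_clients: impl Fn() -> usize,
) -> IdleOutcome {
    loop {
        match inbound.recv_timeout(idle_timeout) {
            Ok(Some(text)) => return IdleOutcome::Resume(text),
            Ok(None) => return IdleOutcome::EndOfInput,
            Err(RecvTimeoutError::Timeout) => {
                if attached_clients() == 0 {
                    return IdleOutcome::TimedOut;
                }
                // Someone is still attached: keep waiting (assumption).
            }
            // All senders are gone; treated here as end-of-input (assumption).
            Err(RecvTimeoutError::Disconnected) => return IdleOutcome::EndOfInput,
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let timeout = Duration::from_millis(50);

    // Queued operator input resumes the run immediately.
    tx.send(Some("good, now also do X".to_string())).unwrap();
    println!("{:?}", await_input(&rx, timeout, || 0));

    // Nothing queued and nobody attached: the idle timeout ends the run.
    println!("{:?}", await_input(&rx, timeout, || 0));
}
```

A production version would likely measure the timeout from the moment the last client detaches rather than restarting a fixed timer; the spec sets those details.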
Consequences
- Positive. Closes the last documented gap in the ADR 0037 per-agent socket ("inbound user messages"). Turns background sub-agents from fire-and-forget into steerable long-running workers: the natural UX for "kick off a background refactor, watch it, nudge it." Reuses the audited permission gate (#75) on resumed turns. Wires the long-dormant Idle state. The InputProvider abstraction is reusable beyond background agents (e.g. a future scripted multi-turn driver).
- Negative. A new first-class run mode in agent-core (small, but it touches the core loop's end-of-run boundary, the highest-blast-radius file in the codebase). A worker→daemon status channel that did not exist. A bidirectional socket protocol (frame schema, multi-client inbound multiplexing). The idle-timeout adds a timer to the worker. None of these affect foreground/headless runs.
- Revisit if: multi-client inbound proves confusing (two operators typing at one agent); may need a single-writer lease. If InputProvider wants richer turns than user text (images, tool results), generalize the inbound frame. If operators want to fork an idle agent's context rather than continue it, that is a separate "branch" primitive, out of scope.
Decomposition (see spec for detail)
This ADR is intentionally larger than one PR. The spec breaks it into independently-shippable tickets:
- agent-core InputProvider run mode (+ tests; foreground unaffected).
- Bidirectional per-agent socket frame protocol + worker InputProvider backed by the socket.
- caliban agents attach send path (stdin → frames; end/detach semantics).
- Worker → daemon status reporting + AgentStatus::Idle wiring.
- Idle timeout + bounded-lifetime cleanup.