Caliban User Guide

Caliban is a Rust-native, provider-agnostic AI agent harness that puts you in control of model routing, memory, permissions, and prompt context. This guide is for users who run caliban day-to-day and operators who deploy and configure it for a team or homelab; it describes behavior and workflows, not Rust internals.

How this guide is organized

| Part | What it covers |
| --- | --- |
| Introduction | What Caliban is, why it exists, and current project status |
| Getting Started | Installation & building, your first session, the interactive TUI, and headless basics |
| Providers & Models | Supported providers, API key setup, model selection, and the model router |
| Configuration | Settings layering across four scopes, file locations, and the full settings reference |
| Permissions | Core concepts, the pattern grammar, permission modes, and rule management |
| Reference | CLI flags, settings schema, slash command index, environment variables, and file paths |

Project status

Caliban v0.1.0 is a pre-release. The core feature set is daily-usable on main under AGPL-3.0. See Project Status for what is shipped versus planned.

What Is Caliban?

Caliban is an AI agent harness: a CLI that drives one or more language models through a structured loop of prompts, tool calls, and responses while managing sessions, permissions, memory, and extensibility around that loop. It is provider-agnostic — the same harness works with Anthropic Claude (direct, Bedrock, Vertex), OpenAI (direct, Azure), Google Gemini (AI Studio, Vertex), and local Ollama, all through a common internal representation.

Capabilities at a glance

| Capability | What it gives you | Where to learn more |
| --- | --- | --- |
| Interactive TUI | Full-screen terminal UI with transcript, status bar, slash-menu, file picker, and permission modals | The Interactive TUI |
| Headless / print mode | One-shot -p flag for scripting; stream-json protocol for machine-readable output | Headless Basics |
| Persistent sessions | Named sessions saved to disk; resume across invocations with --resume or --continue | Sessions & Persistence |
| Permissions | Rule-based gate on every tool call; six modes from default to bypassPermissions; audit log | Permissions Concepts |
| Built-in tools | Read, Write, Edit, MultiEdit, Glob, Grep, Bash, BashBg, WebFetch, WebSearch, NotebookEdit, TodoWrite, and more | Built-in Tools |
| MCP client | Connect external tool servers over stdio or HTTP; OAuth; per-server permission scoping | MCP Servers |
| Sub-agents | In-process agent calls, background agents via caliband, git-worktree isolation | Sub-agents |
| Memory tiers | Global, project, and auto-memory via CLAUDE.md ancestry and @-imports | Memory Tiers |
| Model router | Declarative routes per purpose (MainLoop, Compaction, …); fallback chains; circuit breakers | The Model Router |
| Plugins, hooks & skills | Bundle capabilities as plugins; hook lifecycle events; load skill files for slash commands | Extending Caliban |

Provider-agnostic by design

Because Caliban normalizes all providers to a single internal representation (IR), you can switch models or providers with a flag (--provider, --model) or a caliban.toml router config, without changing your workflow.

The agent loop

At its core, Caliban runs a streaming agent loop:

flowchart LR
    U[User prompt] --> A[Agent loop]
    A --> M[Model — streams response]
    M -->|tool_use blocks| T[Tool dispatch]
    T -->|tool results| M
    M -->|stop| O[Response shown to user]

Each turn's response streams in as the model produces it; tool calls are dispatched as they appear, and their results are fed back until the model produces a final text response. The loop runs identically in TUI, headless, and library contexts.

Philosophy

Caliban exists because the dominant AI agent CLIs are tightly coupled to a single provider and leave operators with little control over what the model sees, what it can do, or where state is stored. The design is a direct response to those constraints.

Operator control

You decide what model handles each task, what context goes into the prompt, and which tools the model is allowed to call. Routing is declarative (caliban.toml); settings layer at four scopes (managed, user, project, local) with deep-merge semantics; permissions are first-class and auditable. Nothing is hardwired to a service the operator does not control.

Provider-agnostic

No SDK lock-in. Anthropic Claude, OpenAI, Google Gemini, and local Ollama all speak the same internal representation inside Caliban. Cloud transports (AWS Bedrock, Google Vertex, Azure OpenAI) are cargo-feature-gated and additive — the core binary has no mandatory cloud dependency. Switching providers is a flag, not a rewrite.

Local-first and data sovereignty

Sessions, checkpoints, auto-memory, and tool-result overflows live on your disk by default. Caliban is designed to run in a self-hosted homelab: no required cloud account, no telemetry unless you opt in (CALIBAN_ENABLE_TELEMETRY=1), no state sent anywhere you do not control.

AGPL-3.0 transparency

Caliban is licensed under AGPL-3.0-only. If you modify Caliban and run it as a network service or distribute the binary, you must release your changes under the same license. This closes the "SaaS loophole" that GPL-3.0 leaves open, aligning with projects like Mastodon and Nextcloud that use AGPL to keep improvements in the commons. Personal use is unaffected. The full rationale is in ADR 0003.

Rust performance

Harness overhead should be negligible compared to model latency. The time-to-result you experience is dominated by the model, not the runtime. This is not a feature worth advertising loudly — it is a baseline expectation for a tool that runs constantly in the background.

What Caliban does not try to be

Caliban is a terminal agent harness, not an IDE extension, a cloud service, or a mobile app. IDE integration, GitHub App, and remote-control surfaces are tracked in the parity matrix (theme N) but are explicitly parked until the terminal/CLI feature set reaches parity with Claude Code. The guide does not document planned features as if they were shipped.

Project Status

Caliban v0.1.0 is a pre-release. The binary (caliban) is daily-usable from main; the core agent loop, TUI, headless mode, sessions, permissions, tools, MCP, sub-agents, memory, sandbox, and telemetry are all shipped. A number of parity gaps with Claude Code remain.

What is shipped

The table below summarizes the major shipped areas. All items marked ✅ are available on main today.

| Area | Status |
| --- | --- |
| Interactive TUI (ratatui, transcript, status bar, slash menu, @file picker) | ✅ |
| Headless --print / stream-json I/O protocol | ✅ |
| Persistent named sessions (--session, --resume, --continue) | ✅ |
| Permissions: rule grammar, six modes, caliban perms CLI, audit log | ✅ |
| Built-in tools (Read, Write, Edit, MultiEdit, Glob, Grep, Bash, BashBg, WebFetch, WebSearch, NotebookEdit, TodoWrite, AgentTool, Memory, Plan) | ✅ |
| MCP client (stdio + HTTP, OAuth, elicitation, per-server permissions) | ✅ |
| Sub-agents (in-process, background fleet via caliband, worktree isolation) | ✅ |
| Memory tiers: CLAUDE.md ancestry, @-imports, auto-memory | ✅ |
| Settings layering (Managed > User > Project > Local, deep-merge, live reload) | ✅ |
| Model router v2 (declarative routes, fallback chains, circuit breakers, capability filters) | ✅ |
| Providers: Anthropic, OpenAI, Google Gemini, Ollama, Bedrock, Vertex | ✅ |
| Checkpoints + /rewind | ✅ |
| Plugins, hooks, skills | ✅ |
| OS sandbox (Seatbelt on macOS, bubblewrap on Linux) | ✅ |
| OpenTelemetry + per-request cost tracking | ✅ |

What is partial or backlog

Some rows in the parity matrix are 🟡 (partial / experimental) or 🔴 (not yet shipped):

| Area | State |
| --- | --- |
| Slash-menu typeahead | 🟡 partial |
| Multi-line input (Shift+Enter native) | 🟡 partial |
| Vim editing mode in TUI | 🔴 not yet |
| Cost surfacing in TUI (/cost display) | 🟡 backlog |
| GitHub Actions workflow / devcontainer feature | 🔴 planned |
| IDE extensions, GitHub App, remote control, mobile (theme N) | 🔴 parked until CLI parity |

Theme N surfaces are parked

IDE extensions, the GitHub App, claude.ai/code, iOS, Slack integration, Remote Control, Channels, Routines, Deep links, and Teleport are all tracked in the parity matrix under theme N. They are explicitly parked until the terminal/CLI feature set reaches full parity with Claude Code. Do not rely on any of these surfaces being available in the near term.

For the full up-to-date breakdown, see Parity vs Claude Code. If you hit something unexpected, see Troubleshooting.

License

Caliban is licensed under AGPL-3.0-only. See Philosophy for the rationale.

Installation & Building

Caliban is distributed as source. You build it with Cargo and install the resulting binary yourself. There are no pre-built releases yet.

Requirements

| Requirement | Details |
| --- | --- |
| Rust toolchain | 1.95.0, pinned in rust-toolchain.toml |
| rustup | Installs the pinned toolchain automatically on first cargo invocation |
| Git | To clone the repository |

rustup detects rust-toolchain.toml and downloads the exact channel automatically — no manual rustup install step required.

Clone

git clone https://github.com/caliban-ai/caliban.git
cd caliban

Build

Release binary

cargo build --release --bin caliban

The binary lands at target/release/caliban. Build time on a modern machine is a few minutes on a cold cache.

Development build

cargo build --workspace      # all crates, debug symbols
cargo test  --workspace      # full test suite

Put the binary on your PATH

# Option A: copy to a directory already on your PATH
cp target/release/caliban ~/.local/bin/caliban

# Option B: add target/release to PATH for the current shell (run from the repo root).
# In a shell profile, replace $PWD with the absolute path to your clone.
export PATH="$PWD/target/release:$PATH"

Smoke test

caliban --version

You should see a version string. If you get a "command not found" error, confirm that the directory holding the binary (~/.local/bin for Option A, target/release/ for Option B) is on your PATH.

Optional: cloud transport feature flags

By default, caliban connects to providers over their public HTTPS APIs. Cloud-managed transports (AWS Bedrock, Google Vertex AI, Azure OpenAI) require optional Cargo feature flags. The exact flag names per crate are:

| Transport | Feature flag |
| --- | --- |
| Anthropic via AWS Bedrock | caliban-provider-anthropic/bedrock |
| Anthropic via Google Vertex AI | caliban-provider-anthropic/vertex |
| OpenAI via Azure | caliban-provider-openai/azure |
| Gemini via Google Vertex AI | caliban-provider-google/vertex |

To build a binary with multiple cloud transports enabled at once:

cargo build --release --bin caliban \
  --features caliban-provider-anthropic/bedrock,caliban-provider-anthropic/vertex,\
caliban-provider-openai/azure,caliban-provider-google/vertex

Cloud transport features are not built in default CI runs. They are exercised by a weekly cron job and by manual dispatch of the ci-cloud workflow.

Helper scripts

The scripts/ directory contains these helpers:

| Script | Purpose |
| --- | --- |
| scripts/check.sh | Mirrors the full PR CI suite locally: cargo fmt --check, cargo clippy, cargo build, cargo test. Accepts --cloud to additionally run the cloud-features build, and --no-test to skip the test step. |
| scripts/coverage.sh | Measures workspace line coverage with cargo-llvm-cov and fails below the COVERAGE_MIN floor (the same gate CI enforces). Accepts --html/--open to render an HTML report and --no-fail to report without gating. Writes lcov.info + coverage.json under target/llvm-cov/. |
| scripts/coverage-report.py | Renders target/llvm-cov/coverage.json into the Markdown coverage report CI posts as a sticky PR comment (overall stats, per-crate breakdown, notable gaps). Run after coverage.sh to preview it locally. |

Run scripts/check.sh --help or scripts/coverage.sh --help for the full usage summary.

Headless / CI builds

The default binary features include clipboard (the arboard crate). On headless Linux hosts where the CI image lacks the X11/Wayland clipboard libraries, build with --no-default-features to avoid the link-time dependency.

Your First Session

Get caliban answering questions in under five minutes.

Set an API key

Caliban needs credentials for at least one provider before it can call a model. The quickest path is an environment variable. For Anthropic (the default provider):

export ANTHROPIC_API_KEY=sk-ant-...

For other providers, see Configuring Providers & API Keys.

Run a one-shot prompt

The -p / --print flag runs caliban non-interactively: it sends your prompt, streams the response to stdout, then exits.

caliban -p "What is the capital of France?"

That's it. The assistant's reply prints to stdout.

Default provider and model

When no --provider or --model flag is given, caliban defaults to Anthropic with model claude-sonnet-4-6. You can override either flag on the command line:

caliban --provider openai --model gpt-5.5 -p "Hello"

Work in a directory

Caliban uses the current working directory as the workspace root for file and shell tools. Just run it from your project:

cd ~/dev/my-project
caliban -p "Summarise README.md"

Enter interactive mode

Drop the -p flag (and any prompt) to enter the interactive TUI instead:

caliban

Caliban detects that stdin is a TTY and launches the ratatui interface. Type your message and press Enter. To quit, press Ctrl-C or Ctrl-D at an empty prompt.

For a tour of the TUI, see The Interactive TUI.

Named sessions

Every conversation can be saved to a named session and resumed later:

# First run — creates a session called "research"
caliban --session research "Read README.md and summarise it"

# Later — resume the same conversation
caliban --resume research

Sessions are stored on disk under the platform's data directory (for example ~/.local/share/caliban/sessions/ on Linux). See Sessions & Persistence for details.

The Interactive TUI

Invoking caliban with no prompt on a TTY launches the ratatui-based terminal interface. This is the primary mode for open-ended, conversational work.

Launching

caliban

Caliban detects that stdin is a TTY and enters the TUI. If you prefer to start from a specific session, pass --resume <name> or --continue (resumes the most recently updated session).

Basic flow

The screen is divided into three areas:

┌──────────────────────────────────────────────┐
│ assistant: Ready. What would you like to do? │
│                                              │
│ 🔧 Read({"path":"src/main.rs"})              │
│    → Read src/main.rs, lines 1-42 of 42      │
│                                              │
│ assistant: The entry point is…               │
├──────────────────────────────────────────────┤
│ > █                                          │
├──────────────────────────────────────────────┤
│ ~/dev/my-project · anthropic claude-sonnet-4-6 · session: work │
└──────────────────────────────────────────────┘

| Area | Purpose |
| --- | --- |
| Transcript pane (top) | Conversation history, tool calls, and tool results |
| Input bar (middle) | Type your message here |
| Status line (bottom) | Working directory, active provider/model, session name |

Type your message and press Enter to send. For multi-line composition, use Shift+Enter on terminals that support the kitty keyboard protocol (kitty, iTerm2, Ghostty, WezTerm, foot) or Alt+Enter as a portable fallback.

Press Ctrl-C during a turn to cancel it. Press Ctrl-C or Ctrl-D at an empty prompt to exit.

Tool calls and the permission modal

When the model wants to invoke a tool (read a file, run a shell command, etc.) caliban checks its permission rules before executing. Depending on the matching rule, the call is:

  • allowed automatically — executes silently; a status line appears in the transcript.
  • denied automatically — the model is told the call was refused.
  • asked — a modal dialog appears:
  ┌─ Permission required ────────────────────────────────┐
  │  Bash: git commit -am "fix typo"                     │
  │                                                      │
  │  [y] Allow once   [Y] Always allow                   │
  │  [n] Deny once    [N] Always deny                    │
  └──────────────────────────────────────────────────────┘

Pressing y or n handles the call once. Pressing Y or N opens a sub-prompt that lets you write a permanent allow or deny rule to a config scope, so you are not asked again for the same pattern.

Cycling permission modes

Shift+Tab cycles the session-wide permission mode through the available values. The current mode is shown as a chip in the status line (the default mode hides the chip). Modes in order:

| Mode | What happens to Ask-class calls |
| --- | --- |
| default | The modal appears |
| acceptEdits | Write/Edit/MultiEdit/NotebookEdit are auto-allowed; Bash still asks |
| plan | All tool execution is paused; the model can only plan |
| auto | An auto-classifier decides; uncertain calls fall back to Ask |
| dontAsk | All Ask-class calls are allowed without prompting |

bypassPermissions (rules ignored entirely) is only reachable when the session was started with --allow-dangerously-skip-permissions.

For a full explanation of each mode, see Permission Modes.
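The mode table above can be read as a lookup from mode and tool to an outcome. The following is a minimal POSIX shell sketch of that lookup, not caliban's implementation: the function name ask_outcome and the outcome labels (prompt, allow, paused, classify) are illustrative, and the sketch assumes that under acceptEdits any tool the table does not list behaves like Bash and still prompts.

```shell
# Hypothetical helper: how an Ask-class call is handled in each mode.
# Usage: ask_outcome MODE TOOL
ask_outcome() {
  mode=$1
  tool=$2
  case "$mode" in
    default) echo prompt ;;
    acceptEdits)
      case "$tool" in
        Write|Edit|MultiEdit|NotebookEdit) echo allow ;;
        *) echo prompt ;;   # assumption: everything else still asks
      esac ;;
    plan) echo paused ;;
    auto) echo classify ;;
    dontAsk) echo allow ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

ask_outcome acceptEdits Edit   # prints: allow
ask_outcome acceptEdits Bash   # prints: prompt
```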

The slash menu

Typing / at the input bar opens a typeahead menu of slash commands:

> /
  /clear      Clear the transcript
  /compact    Summarise and compress context
  /model      Switch the active model
  /rewind     Restore a checkpoint
  …

Continue typing to filter the list; press Enter to run the selected command. See Slash Commands for the full index.

File attachments

Type @ followed by a path prefix to open a live file picker. Selecting a file inlines its contents into the outgoing message — the model sees the file without a separate Read tool round-trip.

For a deeper look at transcript navigation, keyboard shortcuts, and the @-attachment picker, see The TUI in Depth.

Headless Basics

Headless mode runs caliban non-interactively: prompt in, output to stdout, exit. It is the right entry point for scripts, CI pipelines, and any context where there is no TTY.

The -p / --print flag

Pass -p (or the long form --print) with your prompt to run headlessly:

caliban -p "List the files in this directory"

Without -p, caliban checks whether stdin and stdout are TTYs. If stdin is not a TTY (for example, the prompt is piped in) or stdout is piped, caliban enters headless mode automatically. Pass --no-auto-print to suppress the automatic fallback if you need to control this explicitly.
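A wrapper script can apply the same TTY rule before it decides how to invoke caliban. The sketch below uses the POSIX test -t check; detect_mode is a hypothetical helper, not a caliban command, and caliban's own detection may differ in detail.

```shell
# Hypothetical helper mirroring the rule above: headless when stdin or
# stdout is not a TTY, interactive otherwise.
detect_mode() {
  if [ ! -t 0 ] || [ ! -t 1 ]; then
    echo headless
  else
    echo interactive
  fi
}

# Inside $(...) stdout is a pipe, so this always reports "headless".
mode=$(detect_mode)
echo "$mode"
```

Calling detect_mode directly from an interactive terminal, without command substitution or redirection, is the only case in which the sketch reports interactive.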

Output formats

The --output-format flag selects what caliban writes to stdout:

| Format | What you get |
| --- | --- |
| text (default) | The assistant's final reply as plain text |
| json | A single JSON object (the final result frame), suitable for jq |
| stream-json | NDJSON: one event frame per line as the run progresses |

# Plain text
caliban -p "Explain tokio" --output-format text

# Single JSON result
caliban -p "Explain tokio" --output-format json | jq '.result'

# NDJSON stream (tool calls, partial messages, final result)
caliban -p "Explain tokio" --output-format stream-json

The stream-json format is the richest: it includes a system/init frame first (active model, tools, MCP servers, settings sources), per-call tool_use and tool_result frames during the run, and a final type: result frame with token counts and cost. For full details see The stream-json Protocol.
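When only the final frame matters, a script can filter the stream instead of buffering it. The sketch below is a rough illustration under stated assumptions: each frame is a single-line JSON object, and the final frame carries a top-level "type" key with the value "result". The helper name final_result_frame is hypothetical. It is a plain text match, so it can be fooled by a tool result that happens to contain the same text.

```shell
# Hypothetical helper: print the last NDJSON line whose "type" is "result".
# Assumes one JSON object per line; this is a text match, not a JSON parser.
final_result_frame() {
  grep -E '"type"[[:space:]]*:[[:space:]]*"result"' | tail -n 1
}

# Intended use (not run here):
#   caliban -p "Explain tokio" --output-format stream-json | final_result_frame
```

Where jq is available, jq -c 'select(.type == "result")' is the more robust filter, under the same assumption about the frame shape.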

Reading the prompt from stdin

Pass - as the prompt to read from stdin instead of the command line:

echo "What does this error mean?" | caliban -p -
cat error.log | caliban -p -

This is useful when the prompt is too long for a shell argument or is generated by another command.

Exit codes

Caliban follows sysexits.h conventions where one applies (64, 66, 75, 78) and adds several codes of its own:

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Generic runtime error |
| 2 | Tool or assistant error |
| 64 | Bad flags (EX_USAGE) or malformed stream-json input |
| 66 | Missing input (EX_NOINPUT), e.g. --resume names a non-existent session |
| 75 | --max-turns exceeded (EX_TEMPFAIL) |
| 78 | Configuration error: stdin over 10 MB, settings parse failure |
| 124 | Cancelled: SIGTERM or Ctrl-C from the agent loop |
| 130 | SIGINT reached the harness (second Ctrl-C) |
| 137 | --max-budget-usd exceeded |

CI scripts can distinguish budget exhaustion (137) from a real failure (1/2) without parsing stdout.
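One way a CI script might branch on these codes is a small case statement. The sketch below is illustrative only: classify_exit and its category labels (success, failure, setup, turn-limit, cancelled, budget) are hypothetical names chosen for this example, and the grouping of codes is one reasonable reading of the table.

```shell
# Hypothetical helper: map a caliban exit code (from the table above)
# to a coarse category that a CI script can branch on.
classify_exit() {
  case "$1" in
    0) echo success ;;
    1|2) echo failure ;;
    64|66|78) echo setup ;;        # bad flags, missing input, config error
    75) echo turn-limit ;;
    124|130) echo cancelled ;;
    137) echo budget ;;
    *) echo unknown ;;
  esac
}

# Intended use in CI (not run here):
#   caliban -p "Review the diff"
#   status=$?
#   [ "$(classify_exit "$status")" = budget ] && echo "budget exhausted"
classify_exit 137   # prints: budget
```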

When to use headless

Choosing between headless and interactive

Use -p when you know the task up front and want a single answer: one-shot summaries, code review scripts, CI checks, shell pipelines. Use the interactive TUI when you want a back-and-forth conversation, need to inspect tool calls as they run, or want to adjust the permission mode mid-session.

Permissions in headless mode

There is no modal in headless mode. Any rule that would normally show the Ask dialog instead becomes a hard deny. Read-only tools (Read, Glob, Grep) are allowed by default, but write and shell tools are not. To grant write access, pick one:

# Auto-allow file edits
caliban -p "Fix the typo in README.md" --permission-mode acceptEdits

# Narrow allow rule (repeatable)
caliban -p "Run tests" --allow 'Bash:cargo test*'

# Allow everything that would normally Ask (use sparingly)
caliban -p "..." --auto-allow

For more detail on permission modes and rule syntax, see Print Mode.

Sessions & Persistence

Every conversation caliban has with a model is a session: a named, timestamped record of messages, token usage, and active todos. Sessions persist automatically so you can stop at any point and pick up exactly where you left off.

Starting a named session

caliban --session my-project

If my-project already exists on disk, caliban resumes it. If not, a new empty session is created. Session names must match [a-zA-Z0-9_-]+ and be between 1 and 64 characters.
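A script that generates session names can check them against this rule before calling caliban. The following POSIX shell sketch is an illustration, not caliban's validator; valid_session_name is a hypothetical helper, and the character ranges assume a C-like locale.

```shell
# Hypothetical helper: succeed if NAME matches [a-zA-Z0-9_-]+ and is
# 1 to 64 characters long, per the rule above.
valid_session_name() {
  name=$1
  # Reject empty names and names longer than 64 characters.
  [ -n "$name" ] || return 1
  [ "${#name}" -le 64 ] || return 1
  # Reject any character outside the allowed set (ranges assume a C locale).
  case "$name" in
    *[!a-zA-Z0-9_-]*) return 1 ;;
  esac
  return 0
}

valid_session_name my-project && echo "my-project: ok"
valid_session_name "my project" || echo "my project: rejected"
```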

Resuming a previous session

Three flags handle resume:

| Flag | Meaning |
| --- | --- |
| --session NAME | Load or create the session named NAME. |
| -c / --continue | Resume the most recently updated session. |
| -r NAME / --resume NAME | Resume a named session (alias for --session with load semantics). |

-c is the fastest way back into your last conversation:

caliban -c

-r accepts the same name grammar as --session:

caliban -r my-project

Resume semantics

When caliban opens an existing session it restores the full message history and accumulated token usage. The model and provider recorded in the session file are used unless overridden by --model or --provider on the command line. Plan-mode state and the todo list are also restored.

Last-write-wins

Two caliban processes writing to the same session file concurrently will race. Caliban does not lock session files — run one interactive instance per session name at a time.

Suppressing persistence

To run a session entirely in memory without writing to disk, pass --no-save:

caliban --no-save

The session still functions normally for the duration of the run; nothing is written when it ends.

Overriding the sessions directory

By default, sessions are stored under your platform's data directory (see Files & Directories for the per-OS table). You can point caliban at a different directory for the duration of a run:

caliban --sessions-dir /path/to/sessions --session my-project

CALIBAN_SESSIONS_DIR is not a recognized env var for this flag — use --sessions-dir directly.

Session file format

Each session is a pretty-printed JSON file at <sessions-dir>/<NAME>.json. Fields include name, provider, model, messages, total_usage, created_at, updated_at, todos, and plan_mode. Files are written atomically (via a debounced background writer with a 250 ms window) to prevent corruption from crashes mid-save.

You can inspect, diff, or even git-track session files directly — the format is intentionally human-readable.

Listing sessions from the TUI

Inside the TUI, /resume lists all known sessions sorted by last-modified date. An optional substring filter narrows the list:

/resume                  # show all sessions
/resume my-proj          # show sessions whose name contains "my-proj"

Each row shows the session name, turn count, total token usage, and last-modified time. To open a listed session, exit and re-launch with caliban --session <NAME>.

Quick pick

caliban -c is the fastest path back to recent work — no name needed.

The TUI in Depth

Caliban's interactive mode is a full-screen terminal UI built on ratatui + crossterm. This chapter covers everything that goes beyond the basics introduced in The Interactive TUI.

Layout

The screen is divided into three regions from top to bottom:

┌─────────────────────────────────────────────────────┐
│                                                     │
│   Transcript / output region  (flex-grow)           │
│                                                     │
├─────────────────────────────────────────────────────┤
│   Input area (2 rows)                               │
├─────────────────────────────────────────────────────┤
│   Status bar (1 row)                                │
└─────────────────────────────────────────────────────┘

The input area sits between the transcript and the status bar, placing the prompt visually close to the context information below it.

Status line

The status bar shows cwd · provider model · session (turns) · running… during a live turn. When caliban is idle the spinner disappears and the elapsed-turn time is shown instead.

A custom prefix segment can be prepended by configuring a shell script in settings (see Settings Reference for the statusLine key). The script runs off-thread after each turn completes; its output is cached so it never blocks rendering. Use /statusline to inspect the active configuration.

Keybindings

| Key | Action |
| --- | --- |
| Enter | Submit prompt |
| \ + Enter | Insert a literal newline (multi-line input) |
| PageUp / PageDown | Scroll transcript |
| Ctrl+R | Reverse history search (session scope) |
| Ctrl+S | Cycle history scope → project → all projects |
| Ctrl+G | Open prompt in $VISUAL / $EDITOR / vi |
| Ctrl+O | Open transcript viewer overlay |
| Ctrl+B | Launch or follow a background bash process |
| Shift+Tab | Cycle permission mode chip |
| Esc | Close overlay / cancel input |
| Esc Esc | Open checkpoint rewind overlay (on empty input) |

Overlays

Overlays are modal popups rendered centered (approximately 80% × 80%) over the main view. Press Esc or q to close any overlay. The active input bar is suppressed while an overlay is open.

Available overlays and how to reach them:

| Overlay | How to open |
| --- | --- |
| Help | /help |
| Configuration | /config |
| MCP server status | /mcp |
| Skills | /skills |
| Permissions editor | /permissions |
| Transcript viewer | Ctrl+O |
| Checkpoint rewind | /rewind or Esc Esc (on empty input) |
| System prompt | /system |

Editor modes

Caliban's input bar uses emacs-style key bindings by default (Ctrl+A / Ctrl+E for line start/end, Ctrl+K to kill to end-of-line, etc.).

Vim mode is not yet available

Vim editing mode is listed as a gap in the parity matrix (status: 🔴 planned). The InputMode enum is designed to accommodate a vim layer, but it has not shipped. Emacs bindings are the only editor mode in the current release.

External editor handoff

Ctrl+G writes your current input buffer to a temp file, suspends the TUI (leaving the alternate screen), execs $VISUAL / $EDITOR / vi with the file as the argument, then reads the result back and re-enters the TUI. Multi-word editor values like EDITOR='code --wait' work because the value is split on whitespace without shell parsing.
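The fallback order and the whitespace split can be reproduced in a few lines of POSIX shell. This is a sketch of the described behavior, not caliban's code: resolve_editor and split_editor are hypothetical helpers, and treating an empty variable the same as an unset one is an assumption.

```shell
# Hypothetical helper: pick the editor in the order the text describes,
# $VISUAL first, then $EDITOR, then vi. Treating an empty variable as
# unset is an assumption of this sketch.
resolve_editor() {
  echo "${VISUAL:-${EDITOR:-vi}}"
}

# Split the value on whitespace without shell parsing: quotes and globs
# are not interpreted, so 'code --wait' becomes two words.
split_editor() {
  set -f        # disable globbing while splitting
  set -- $1     # unquoted on purpose: plain whitespace split
  set +f
  printf '%s\n' "$@"
}

VISUAL='' EDITOR='code --wait'
split_editor "$(resolve_editor)"   # prints "code" and "--wait" on separate lines
```

Because the split is on whitespace only, an editor path that itself contains a space would be broken into separate words.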

Transcript viewer

Ctrl+O opens the transcript viewer overlay. It renders every ContentBlock in the conversation history — text, tool calls, tool results, thinking blocks, and images — as the model sees them.

| Key | Action |
| --- | --- |
| [ | Dump the current viewport to scrollback (leave + re-enter alt-screen) |
| v | Open the full transcript in $VISUAL |
| q / Esc | Close the viewer |
| ? | Show key reference |

Following background bash (Ctrl+B)

Background bash lets caliban run a shell command in the background while you continue interacting with the agent. Press Ctrl+B inside the TUI to open or follow the background bash output panel. The agent can launch background bash tasks via Bash{background:true}; the TUI surfaces their output through the same panel.

Prompt history search (Ctrl+R / Ctrl+S)

Ctrl+R opens inline reverse search over the current session's prompt history, showing matches as you type. Ctrl+S cycles the scope outward:

Ctrl+R  →  session scope
Ctrl+S  →  project scope  →  all-projects scope

Wider scopes are loaded lazily in a background task (budget: 2 s). History is persisted per project.

Configuring the TUI

All TUI-relevant settings — the status line script, output style, and context-window thresholds — live in the settings hierarchy. See Settings Reference and Output Styles for details.

Prompts, Attachments & Images

This chapter covers how to compose prompts, reference files, and send images to the model — whether you are working interactively in the TUI or driving caliban from the command line.

Writing prompts

In the TUI, type your prompt in the input area and press Enter to submit. For a multi-line prompt, press \ followed by Enter to insert a newline, then Enter alone on a blank line to submit.

For longer drafts, press Ctrl+G to open the current input buffer in $VISUAL / $EDITOR / vi. Caliban reads the saved file back when the editor exits.

In headless mode, pass the prompt via a positional argument, --prompt TEXT, or pipe from stdin using -:

caliban "Explain the diff"
caliban --prompt "Explain the diff"
git diff | caliban -p -

@path file references

Type @ in the TUI input bar to open the file suggestion menu (gitignore-aware). Continue typing to narrow by path. The selected file is read and attached to your prompt as a text block at submit time.

You can also type @path/to/file directly without the menu. Any @-reference that resolves to an image-like extension (.png, .jpg, .jpeg, .gif, .webp) is handled by the image pipeline rather than as text — see Images below.

Shell escape for quick commands

A ! at the start of the input runs the rest of the line as a shell command via the Bash tool (subject to permission rules). The result is not added to the conversation history.

Attachment size limits

Two flags control how large @-attachments can be:

| Flag | Env var | Default | Meaning |
| --- | --- | --- | --- |
| --max-attach-bytes | CALIBAN_MAX_ATTACH_BYTES | 262144 (256 KiB) | Maximum size of a single @-attachment |
| --attach-budget-bytes | CALIBAN_ATTACH_BUDGET_BYTES | 1048576 (1 MiB) | Aggregate cap across all attachments in one message |

If a single file exceeds --max-attach-bytes or the total across all files exceeds --attach-budget-bytes, caliban rejects the attachment with a clear error before sending anything to the model.
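The two checks can be worked through with plain integer arithmetic. The sketch below applies the default limits to a list of sizes in bytes; check_attachments and its messages are hypothetical and do not reproduce caliban's actual error text.

```shell
# Hypothetical helper: apply the two default limits from the table above
# to a list of attachment sizes in bytes.
MAX_ATTACH_BYTES=262144        # default for --max-attach-bytes
ATTACH_BUDGET_BYTES=1048576    # default for --attach-budget-bytes

check_attachments() {
  total=0
  for size in "$@"; do
    # Per-file cap: any single attachment over the limit is rejected.
    if [ "$size" -gt "$MAX_ATTACH_BYTES" ]; then
      echo "rejected: one attachment is $size bytes (limit $MAX_ATTACH_BYTES)"
      return 1
    fi
    total=$((total + size))
  done
  # Aggregate cap across all attachments in the message.
  if [ "$total" -gt "$ATTACH_BUDGET_BYTES" ]; then
    echo "rejected: total is $total bytes (budget $ATTACH_BUDGET_BYTES)"
    return 1
  fi
  echo "ok: $total bytes"
}

check_attachments 200000 200000 200000   # prints: ok: 600000 bytes
check_attachments 300000 || true         # rejected: over the per-file limit
```

Five files of exactly 262144 bytes each pass the per-file check but total 1310720 bytes, which exceeds the default aggregate budget.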

# Raise limits for a large codebase session
caliban \
  --max-attach-bytes 524288 \
  --attach-budget-bytes 4194304 \
  --session big-project

Images

Caliban supports image input via three entry points:

  1. @path — reference an image file by path in the TUI or via --prompt "@screenshot.png explain this" in headless mode.
  2. Clipboard paste — paste an image from the clipboard directly into the TUI input bar (platform clipboard integration required; built with the clipboard feature).
  3. Drag-and-drop — drag an image file into a supporting terminal emulator; caliban parses the DnD escape sequence and ingests the file.

Supported MIME types: image/png, image/jpeg, image/gif, image/webp.

Ingest pipeline

Before sending an image to a model, caliban runs it through an ingest pipeline:

  1. MIME sniff — infers type from magic bytes; rejects anything outside the allowlist.
  2. Decode + dimension check — decodes the image to verify it is not corrupt.
  3. Downscale — if the file exceeds 5 MiB (pre-base64) or the longest edge exceeds 1568 px, caliban downscales using Lanczos3 resampling. A [downscaled] badge appears in the TUI. The 1568 px target matches Anthropic's recommended longest edge for cost-efficient vision inputs.
  4. SHA-256 fingerprint — deduplicated images are not re-sent within a session.

The pipeline is configurable via [images] in caliban.toml:

[images]
max_bytes = 5242880          # 5 MiB pre-base64 cap
downscale_target = 1568      # longest-edge px target
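With the default values shown above, the downscale decision in step 3 comes down to two comparisons. The sketch below illustrates that decision only; it does not compute the resized dimensions, and needs_downscale is a hypothetical helper rather than part of caliban.

```shell
# Hypothetical helper: decide whether an image would be downscaled under
# the default thresholds described above.
# Usage: needs_downscale BYTES WIDTH HEIGHT
MAX_IMAGE_BYTES=5242880   # 5 MiB, measured before base64 encoding
MAX_LONGEST_EDGE=1568     # pixels

needs_downscale() {
  bytes=$1
  width=$2
  height=$3
  # The longest edge is the larger of width and height.
  longest=$width
  if [ "$height" -gt "$longest" ]; then
    longest=$height
  fi
  if [ "$bytes" -gt "$MAX_IMAGE_BYTES" ] || [ "$longest" -gt "$MAX_LONGEST_EDGE" ]; then
    echo yes
  else
    echo no
  fi
}

needs_downscale 400000 1920 1080   # prints: yes (longest edge 1920 > 1568)
needs_downscale 400000 1200 800    # prints: no (within both limits)
```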

Capability routing

By default, caliban will refuse to send an image to a model that does not have vision capability, surfacing a clear RouterError::NoCandidate rather than silently dropping the image. Set CALIBAN_STRICT_ROUTING=false to opt into degraded behavior where image content is replaced with a text placeholder and the request proceeds.

Session storage

Images are stored as blobs under <sessions-dir>/<session>/blobs/<sha256>.bin. Session JSON files carry only a BlobRef (the SHA-256), keeping transcripts small and git-diffable.

Graphics protocol detection

When the TUI renders an image inline, it detects the terminal's graphics protocol once at session start using the following cascade:

  1. CALIBAN_GRAPHICS env var: values are kitty, iterm, sixel, none.
  2. $TERM_PROGRAM: iTerm.app and WezTerm → iTerm2 protocol.
  3. $TERM: contains kitty → Kitty protocol; contains sixel → DEC sixel.
  4. Fallback: text placeholder [image: WxH MIME filename].
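
The cascade above reads naturally as a chain of early returns. The Python sketch below is a model of the documented order, not caliban's code: `detect_graphics_protocol` is a hypothetical name, matching is assumed to be exact and case-sensitive, and an unrecognized CALIBAN_GRAPHICS value is assumed to fall through to the next step.

```python
def detect_graphics_protocol(env):
    """Return 'kitty', 'iterm', 'sixel', or 'none' for a mapping of env vars.

    'none' stands for the text-placeholder fallback.
    """
    # 1. Explicit override wins.
    override = env.get("CALIBAN_GRAPHICS", "")
    if override in {"kitty", "iterm", "sixel", "none"}:
        return override
    # 2. Terminals that speak the iTerm2 inline-image protocol.
    if env.get("TERM_PROGRAM") in {"iTerm.app", "WezTerm"}:
        return "iterm"
    # 3. Substring checks on $TERM.
    term = env.get("TERM", "")
    if "kitty" in term:
        return "kitty"
    if "sixel" in term:
        return "sixel"
    # 4. Nothing matched: fall back to the text placeholder.
    return "none"
```

Calling `detect_graphics_protocol(os.environ)` (after `import os`) in the same shell where you launch caliban is a quick way to reason about which branch your terminal would take.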

Override detection explicitly when caliban picks the wrong protocol:

CALIBAN_GRAPHICS=kitty caliban --session vision-work

No vision route configured?

If you see a RouterError::NoCandidate error when pasting images, confirm that your active provider and model support vision. Check the active route with caliban router debug or /config in the TUI.

Slash Commands

Slash commands are operator-level shortcuts you type directly in the TUI input bar. They are not model-tool calls and are not gated by the permission rule grammar — they run as your direct action.

How the slash system works

Type / in the input bar to open the suggestion menu. A fuzzy typeahead list appears showing all registered commands grouped by category.

Typeahead is partially implemented

The slash-menu fuzzy typeahead is marked 🟡 (partial) in the parity matrix. Basic prefix matching works; full fuzzy ranking and category grouping are in progress.

Continue typing to narrow the list, then press Enter (or Tab) to select a command. Some commands run immediately (immediate: true) and return to the input bar; others open an overlay or emit output to the transcript.

Hooks fire on every slash submission: UserPromptSubmit carries is_slash: true, command, and args, so hooks can audit or veto any slash command.

Plugin-supplied commands

Plugins can register additional slash commands through the same SlashCommandRegistry. Built-in commands take priority; a plugin command with a conflicting name is dropped with a warning at registration time. See Plugins for details.

Common commands

The table below lists the most frequently used built-in commands. The full list — including commands added by plugins — is enumerated at runtime by /help inside the TUI.

| Command | Args | What it does |
| --- | --- | --- |
| /help | | Open the help overlay listing all visible commands |
| /clear | | Clear transcript and conversation history; keep todos and system prompt |
| /quit | | Exit caliban (/exit is an alias) |
| /resume | [query] | List persisted sessions (optional name substring filter) |
| /init | [--force] | Generate CLAUDE.draft.md from AGENTS.md / .cursorrules / git status |
| /model | [id] | Show or switch the active model (same-provider swap in v1) |
| /effort | <level> | Set reasoning effort: low, medium, high, max, or auto |
| /usage | | Show token usage and cumulative cost for this session |
| /cost | | Show cumulative USD spend with per-model breakdown |
| /context | | Show context-window utilization + top-N largest blocks |
| /compact | | Trigger the configured compactor to summarize history |
| /config | | Open the configuration overlay (merged settings + scope chain) |
| /mcp | | Open the MCP server status overlay |
| /hooks | | List configured hooks per event |
| /plugins | | List installed plugins with enable/disable status |
| /permissions | | Open the permissions overlay; cycle mode with Tab, delete rule with d |
| /rewind | | Open the checkpoint picker (also: Esc Esc on empty input) |
| /recap | | Summarize the conversation without mutating history |
| /btw | <question> | One-shot ephemeral side query to a fast model; result inlined |
| /export | [path] [--format json] | Export session transcript to markdown (or JSON) |
| /doctor | [--deep] | Run health checks: settings, MCP, skills, hooks, provider auth |
| /status | | Show provider and auth status |
| /statusline | | Inspect the active custom status-line configuration |
| /loop | [--n=N] [--interval=S] | Plan repeated turns (execution bounded by --max-turns) |

Full reference

The complete, up-to-date slash command index — including plugin-supplied commands and hidden aliases — lives in Slash Command Index. The index is generated from the live registry so it always reflects what is actually registered in your build.

Adding your own slash commands

Custom slash commands are defined as skills or plugins. See Custom Slash Commands for the authoring guide.

Supported Providers

Caliban is provider-agnostic: you choose which AI provider and model to use at runtime, and the same agent loop, tool engine, and permission system work regardless of which backend answers the requests.

Provider table

| Provider | --provider value | Transport / access | Notes |
| --- | --- | --- | --- |
| Anthropic | anthropic | Direct HTTPS (api.anthropic.com) | Default provider |
| Anthropic via Bedrock | (router only) | AWS Bedrock (bedrock-runtime.*) | Requires caliban-provider-bedrock; configured via caliban.toml |
| Anthropic via Vertex | (router only) | Google Vertex AI | Requires caliban-provider-vertex; configured via caliban.toml |
| OpenAI | openai | Direct HTTPS (api.openai.com/v1) | |
| OpenAI via Azure | (router only) | Azure OpenAI Service | azure feature flag on caliban-provider-openai; configured via caliban.toml |
| Google | google | Google AI Studio (generativelanguage.googleapis.com) | Gemini models |
| Google via Vertex | (router only) | Google Vertex AI | vertex feature flag; configured via caliban.toml |
| Ollama | ollama | Local HTTP (http://localhost:11434) | No API key required |

Bedrock, Vertex, and Azure transports are enabled by Cargo feature flags at build time. Binary distributions built by the project team include all features; self-compiled builds must enable the relevant feature (e.g. --features bedrock). These transports can only be selected through the model router — they are not available via the --provider CLI flag.

Capability matrix

| Provider | Tool use | Vision | Thinking | Prompt caching |
| --- | --- | --- | --- | --- |
| Anthropic | Parallel | Yes | Yes | Explicit (up to 4 breakpoints) |
| Bedrock | Parallel | Yes | Yes | Explicit (mirrors Anthropic) |
| Vertex (Anthropic) | Parallel | Yes | Yes | Explicit (mirrors Anthropic) |
| OpenAI | Parallel | Yes | Yes (o-series) | Automatic |
| Azure OpenAI | Parallel | Yes | Yes (o-series) | Automatic |
| Google AI Studio | Parallel | Yes | No | None |
| Google Vertex | Parallel | Yes | No | None |
| Ollama | Basic | Model-dependent | Model-dependent | None |

Ollama is local

Ollama runs models on hardware you control (by default at http://localhost:11434): no API key, no requests to a third-party API, and no per-token cost. It is well suited to fast-classifier routes, offline use, and privacy-sensitive workloads. Capability varies by the specific model you pull.

Multiple providers at once

The model router lets you combine providers: for example, route main-loop turns through Anthropic while using a local Ollama model for fast classification. Each route gets its own provider, model, and resilience policy.

Configuring Providers & API Keys

Caliban needs to know which provider to use and how to authenticate with it. Provider selection happens on the command line; authentication is supplied via environment variables or a dynamic key helper.

Selecting a provider

Pass --provider to select the backend for a session:

caliban --provider anthropic   # default
caliban --provider openai
caliban --provider google
caliban --provider ollama      # no API key needed

When --provider is omitted, caliban resolves the provider from settings.model (see Model Selection), falling back to anthropic.

API key environment variables

Each provider reads its key from a well-known environment variable:

| Provider | Required env var | Optional env vars |
| --- | --- | --- |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL, ANTHROPIC_VERSION |
| OpenAI | OPENAI_API_KEY | OPENAI_BASE_URL, OPENAI_ORG_ID, OPENAI_PROJECT |
| Google | GEMINI_API_KEY | GOOGLE_GEMINI_API_KEY (alias), GEMINI_BASE_URL, GEMINI_API_VERSION |
| Ollama | (none) | OLLAMA_BASE_URL (default: http://localhost:11434) |
| Azure OpenAI | AZURE_OPENAI_API_KEY, AZURE_OPENAI_RESOURCE | AZURE_OPENAI_API_VERSION (default: 2024-10-21) |

Set the variable in your shell profile or pass it inline:

export ANTHROPIC_API_KEY="sk-ant-..."
caliban "summarize this file"

Dynamic key helper (api_key_helper)

For secrets stored in a keychain, vault, or SSO-backed credential store, set api_key_helper in your settings file instead of exposing keys in the environment. The helper is a process caliban spawns to retrieve the current key on demand.

Forms

Bare string — a single executable path or command string, used for all providers:

api_key_helper = "/usr/local/bin/get-caliban-key"

Object — one helper with explicit options:

[api_key_helper]
command = "/usr/local/bin/get-caliban-key"
provider = "anthropic"        # omit for wildcard ("*")
refreshIntervalMs = 300000    # 5 minutes (default)
slowHelperWarningMs = 10000   # warn if script takes > 10 s (default)

Array — different helpers per provider, with a wildcard fallback:

[[api_key_helper]]
provider = "anthropic"
command = "/usr/local/bin/anthropic-key"

[[api_key_helper]]
provider = "*"
command = "/usr/local/bin/generic-key"

The helper receives two environment variables:

  • CALIBAN_PROVIDER — the provider id (e.g. anthropic)
  • CALIBAN_API_KEY_HELPER_TTL_MS — the configured refresh interval in milliseconds

It must print the API key to stdout (trailing newline is stripped) and exit 0. Any non-zero exit is treated as an error.
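
A helper that honors this contract can be very small. The Python sketch below is a hypothetical example: the `SOURCES` mapping and the `MY_VAULT_*` variable names are invented for illustration, and a real helper would query a keychain or vault at the marked spot instead of reading another environment variable.

```python
#!/usr/bin/env python3
"""Hypothetical api_key_helper: prints the key for $CALIBAN_PROVIDER."""
import os
import sys

# Hypothetical mapping from provider id to where the secret lives.
SOURCES = {
    "anthropic": "MY_VAULT_ANTHROPIC_KEY",
    "openai": "MY_VAULT_OPENAI_KEY",
}


def fetch_key(provider, environ):
    """Return the secret for `provider`, or None when unavailable."""
    source = SOURCES.get(provider)
    if source is None:
        return None
    # Replace this lookup with a keychain or vault query in a real helper.
    value = environ.get(source, "").strip()
    return value or None


def main(environ=os.environ, out=sys.stdout, err=sys.stderr):
    provider = environ.get("CALIBAN_PROVIDER", "")
    key = fetch_key(provider, environ)
    if key is None:
        print(f"no key available for provider {provider!r}", file=err)
        return 1                     # non-zero exit signals an error
    print(key, file=out)             # key on stdout, trailing newline is fine
    return 0


# Only act when spawned with the provider variable set.
if __name__ == "__main__" and "CALIBAN_PROVIDER" in os.environ:
    sys.exit(main())
```

Saved as an executable file, a script like this could be referenced by path from api_key_helper. Diagnostics go to stderr so that stdout carries nothing but the key.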

Caching and refresh

Caliban caches the returned key in memory for refreshIntervalMs (default 5 minutes). On a 401 or 403 from the provider, the cache entry is invalidated and the helper is re-invoked immediately for a fresh key. Override the TTL globally with CALIBAN_API_KEY_HELPER_TTL_MS.
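
The caching rule amounts to a time-bounded cache with an invalidation path for auth failures. The following Python sketch models that behavior under stated assumptions: `HelperKeyCache` and its method names are hypothetical, the cache is keyed per provider, and the clock is injectable only to make the example testable.

```python
import time


class HelperKeyCache:
    """Illustrative model of the helper-key cache; not caliban's code."""

    def __init__(self, run_helper, ttl_ms=300_000, clock=time.monotonic):
        self._run_helper = run_helper    # callable(provider) -> helper stdout
        self._ttl_s = ttl_ms / 1000      # default mirrors the 5-minute TTL
        self._clock = clock
        self._entries = {}               # provider -> (key, fetched_at)

    def get(self, provider):
        now = self._clock()
        entry = self._entries.get(provider)
        if entry is not None and now - entry[1] < self._ttl_s:
            return entry[0]              # still fresh: no helper invocation
        key = self._run_helper(provider).rstrip("\n")
        self._entries[provider] = (key, now)
        return key

    def on_response_status(self, provider, status):
        """Drop the cached key after a 401/403 so the next get() re-runs the helper."""
        if status in (401, 403):
            self._entries.pop(provider, None)
```

With a short TTL the helper runs often, so a slow helper (see slowHelperWarningMs) becomes noticeable; a long TTL leans on the 401/403 invalidation path to pick up rotated keys.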

Keyring integration

A one-liner shell wrapper around security find-generic-password (macOS) or secret-tool lookup (Linux/GNOME) makes api_key_helper work with the OS keychain without storing the key in any file.

Bedrock and Vertex configuration

AWS Bedrock and Google Vertex are configured through the model router using [provider.bedrock] and [provider.vertex] blocks in caliban.toml. Authentication follows each platform's standard credential chain:

  • Bedrock — AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, instance profiles, ~/.aws/credentials). A background task refreshes credentials on a configurable interval (default 5 minutes).
  • Vertex (Anthropic) — Google Application Default Credentials (GOOGLE_APPLICATION_CREDENTIALS, gcloud auth application-default login).
  • Vertex (Google) — same GCP ADC path as the Anthropic Vertex transport.

See The Model Router for the full caliban.toml syntax, including [provider.X] blocks that let you override the env var name or base URL per provider.

For a full listing of every setting key, see Settings Reference.

Model Selection

Caliban lets you choose the exact model at the command line, in settings, or via the model router. When multiple sources specify a model, a clear precedence chain resolves the winner.

Selecting a model at the command line

Use --model to name the model you want:

caliban --model claude-opus-4-7 "write a haiku"
caliban --provider openai --model gpt-5.5 "explain monads"
caliban --provider google --model gemini-2.0-flash "summarize this"
caliban --provider ollama --model qwen3.5:9b "local inference"

Per-provider defaults

When --model is omitted and no model is set in settings, caliban uses a built-in default for the chosen provider:

| Provider | Default model |
| --- | --- |
| anthropic | claude-sonnet-4-6 |
| openai | gpt-5.5 |
| google | gemini-2.0-flash |
| ollama | llama3.1 |

Setting a model in settings

Set model in your project or user settings file to avoid repeating --model on every invocation. Two forms are accepted:

Bare string (the provider is inferred from the model name, or taken from --provider when that flag is given):

model = "claude-sonnet-4-6"

Qualified object — explicitly names both the provider and the model:

[model]
provider = "anthropic"
name = "claude-sonnet-4-6"

The qualified form is the safest option in shared project configs because it makes the intended provider unambiguous.

You can also set a fallback_model that caliban uses when the primary model errors:

[model]
provider = "anthropic"
name = "claude-opus-4-7"

[fallback_model]
provider = "anthropic"
name = "claude-sonnet-4-6"

Fallback model (--fallback-model)

Pass --fallback-model on the command line to override the settings fallback for a single run:

caliban --model claude-opus-4-7 --fallback-model claude-sonnet-4-6 "long task"

The fallback is wired through caliban-model-router (ADR 0038) and is also surfaced in the headless system/init frame.

Per-turn limits

Control token usage and sampling with these flags:

| Flag | Default | Description |
| --- | --- | --- |
| --max-tokens N | 8192 | Per-turn output token limit. Must be ≥ 1. |
| --temperature F | (provider default) | Sampling temperature in [0.0, 2.0]. Values outside this range are rejected at startup. |

caliban --max-tokens 8192 --temperature 0.2 "write a long essay"

Per-purpose model overrides (model_overrides)

For finer-grained control without a full router config, set model_overrides in settings to pin specific request purposes to a particular model string:

[model_overrides]
fast-classifier = "claude-haiku-4-5"
summarization = "claude-haiku-4-5"

The keys must match the purpose slugs understood by the router (main_loop, summarization, fast_classifier, sub_agent, embedding). This setting does not support cross-provider routing; use the model router for that.

Precedence

When multiple sources specify a model, this chain resolves the winner (highest priority first):

flowchart LR
    A["CLI<br/>--model / --provider"] --> B["settings.model<br/>(local > project > user > managed)"]
    B --> C["Provider default<br/>(built-in table)"]

  1. CLI flags (--model, --provider): always win.
  2. settings.model: merged across the settings scope chain (local > project > user > managed; see Settings Layering).
  3. Provider built-in default: the per-provider fallback in the table above.
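
The chain can be expressed as a small resolver. This Python sketch is a model of the documented precedence only: `resolve_model` is a hypothetical function, it takes the already-merged settings.model value as input, and it does not attempt to infer a provider from a bare model name. It also assumes that a settings model naming a different provider than an explicit --provider is skipped in favor of the built-in default, which the text above does not state.

```python
BUILTIN_DEFAULTS = {
    "anthropic": "claude-sonnet-4-6",
    "openai": "gpt-5.5",
    "google": "gemini-2.0-flash",
    "ollama": "llama3.1",
}


def resolve_model(cli_provider=None, cli_model=None, settings_model=None):
    """Return (provider, model).

    settings_model is None, a bare string, or {"provider": ..., "name": ...}.
    """
    s_provider = s_name = None
    if isinstance(settings_model, dict):
        s_provider = settings_model.get("provider")
        s_name = settings_model.get("name")
    elif isinstance(settings_model, str):
        s_name = settings_model      # provider inference is not modeled here

    provider = cli_provider or s_provider or "anthropic"
    if cli_model:
        return provider, cli_model   # 1. CLI flags always win
    # 2. settings.model (assumption: ignored when it contradicts --provider)
    if s_name and (cli_provider is None or s_provider in (None, cli_provider)):
        return provider, s_name
    return provider, BUILTIN_DEFAULTS[provider]   # 3. built-in default
```

For instance, `resolve_model(cli_provider="ollama")` yields the Ollama built-in default, while `resolve_model(cli_model="claude-opus-4-7", settings_model="claude-sonnet-4-6")` lets the CLI value win.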

For the most flexible per-purpose routing, see The Model Router.

The Model Router

Advanced / optional feature

The model router is an optional layer. If you only need a single provider and model, the --provider and --model flags are all you need. This chapter is relevant when you want per-purpose model dispatch, fallback chains, hedging, or circuit breakers.

The model router is a purpose-keyed dispatcher that sits between the agent loop and your provider adapters. It lets you assign different models — from the same or different providers — to different kinds of requests, and adds resilience features (fallback, hedging, circuit breakers) on top.

Why a router?

The agent makes provider calls for several distinct purposes: the main conversational loop, summarization for compaction, fast classification for permission decisions, sub-agent loops, and more. The router lets you express policies like:

  • Use Claude Opus for main-loop turns, Claude Haiku for summarization.
  • Route fast classification to a local Ollama model (zero API cost, low latency).
  • Fall back from Anthropic to OpenAI if Anthropic returns a rate-limit error.

Request purposes

Each internal request carries a purpose that the router uses for dispatch:

| Purpose | Slug | Description |
| --- | --- | --- |
| Main loop | main_loop | Primary conversational turns |
| Summarization | summarization | Context compaction summaries |
| Fast classifier | fast_classifier | Auto-mode permission decisions |
| Sub-agent | sub_agent | Spawned sub-agent loops |
| Embedding | embedding | Embedding / memory retrieval |
| Other | other | Requests that don't fit a category |

Enabling the router

Drop a caliban.toml file in your project root. Caliban discovers it by walking up from the current directory to the nearest git root or $HOME, then falls back to ~/.config/caliban/caliban.toml. You can also point directly to a file:

caliban --config /path/to/caliban.toml "my prompt"
# or via env var:
CALIBAN_ROUTER_CONFIG=/path/to/caliban.toml caliban "my prompt"

Discovery order (highest priority first): --config flag → CALIBAN_ROUTER_CONFIG → walk-up from current directory → ~/.config/caliban/caliban.toml.

Basic configuration

A minimal caliban.toml with two purpose-keyed routes:

[router]
default_purpose = "main_loop"

[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-opus-4-7"

[[router.route]]
purpose = "fast_classifier"
provider = "ollama"
model = "llama3.2:3b"

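Valid provider values for the direct transports: anthropic, openai, google, ollama. Bedrock and Vertex routes additionally rely on the [provider.bedrock] and [provider.vertex] blocks described in Bedrock and Vertex configuration.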

Provider blocks

Override the API key env var or base URL for a provider in caliban.toml:

[provider.openai]
api_key_env = "OPENAI_API_KEY_STAGING"
base_url = "https://oai-staging.example.com/v1"

[provider.ollama]
base_url = "http://gpu-server.local:11434"

Fallback chains

When a route fails with a retriable error (rate-limit, model unavailable, network timeout, server error), the router tries the next route for the same purpose. Define an explicit ordered fallback list, or let declaration order in the file serve as the implicit chain:

[[router.route]]
id = "main-primary"
purpose = "main_loop"
provider = "anthropic"
model = "claude-opus-4-7"
fallback = ["main-fallback"]    # explicit: only try this specific route next

[[router.route]]
id = "main-fallback"
purpose = "main_loop"
provider = "openai"
model = "gpt-5.5"

Set fallback = [] to disable fallback entirely for a route.

Errors that are not retriable (auth failure, content policy, invalid request, cancellation) propagate immediately without trying another route.
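
The fallback rules can be sketched as a loop over an ordered candidate list. The Python below is an illustrative model, not the router's code: `RouteError`, the error-kind strings, and both function names are hypothetical, and it assumes only the first route's explicit fallback list is consulted (the text above does not say whether fallback lists chain transitively).

```python
RETRIABLE = {"rate_limit", "model_unavailable", "network_timeout", "server_error"}


class RouteError(Exception):
    """Hypothetical error type carrying a coarse error kind."""

    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind


def fallback_order(routes, purpose):
    """routes: dicts in file order with 'id', 'purpose', optional 'fallback'."""
    same_purpose = [r for r in routes if r["purpose"] == purpose]
    if not same_purpose:
        return []
    primary = same_purpose[0]
    if "fallback" in primary:        # explicit list; [] disables fallback
        by_id = {r["id"]: r for r in routes}
        return [primary] + [by_id[i] for i in primary["fallback"]]
    return same_purpose              # implicit chain: declaration order


def dispatch(routes, purpose, call):
    """Try each candidate in order; `call(route)` performs the request."""
    last_error = None
    for route in fallback_order(routes, purpose):
        try:
            return call(route)
        except RouteError as err:
            if err.kind not in RETRIABLE:
                raise                # non-retriable: propagate immediately
            last_error = err
    raise last_error or RouteError("no_candidate")
```

The important property is the asymmetry: a rate limit moves on to the next candidate, while an auth failure stops the loop at once so that a misconfigured key is not masked by a working fallback.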

Hedging

Hedging races a second route against the primary after a configurable delay. The first to respond wins; the other is cancelled. This trades extra spend for lower latency, so it is opt-in:

[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
hedge = { hedge_after_ms = 1000, max_hedges = 1 }

A global default applies to all routes in the file:

[router.hedge]
hedge_after_ms = 1500
max_hedges = 1

Set hedge = false on a route to disable the global default for that route.

Hedging increases cost

Every request that triggers a hedge incurs a full charge on the winning route plus a partial charge on the losing route for the tokens processed before cancellation. Enable hedging only on routes where the latency benefit justifies the extra spend.
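
The race itself can be sketched with asyncio. This is a minimal Python model of a single hedge (the max_hedges = 1 case), with hypothetical names throughout; it does not model error handling, so a primary that fails before the delay simply surfaces its exception.

```python
import asyncio


async def hedged_call(primary, hedge, hedge_after_ms):
    """Race `primary` against `hedge`, starting the hedge after a delay.

    Both arguments are zero-argument coroutine functions.
    Returns (winner_name, result).
    """
    tasks = {asyncio.create_task(primary()): "primary"}
    # Give the primary a head start of hedge_after_ms.
    done, _ = await asyncio.wait(list(tasks), timeout=hedge_after_ms / 1000)
    if not done:
        # Primary is still running: launch the hedge and take the first finisher.
        tasks[asyncio.create_task(hedge())] = "hedge"
        done, _ = await asyncio.wait(list(tasks),
                                     return_when=asyncio.FIRST_COMPLETED)
    winner = next(iter(done))
    for task in tasks:
        if task is not winner:
            task.cancel()            # the loser is cancelled
    return tasks[winner], winner.result()
```

If the primary answers within the delay, the hedge is never started and no extra cost is incurred, which is why a larger hedge_after_ms is the cheaper setting.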

Circuit breakers

A circuit breaker tracks failures per route and temporarily stops routing to a route that is consistently failing. Once the cool-off window passes, the breaker enters a half-open state and lets a limited number of probe requests through; when a probe succeeds, the breaker closes and normal routing resumes.

[router.breaker]           # global defaults
failure_threshold = 5      # trip after 5 failures within the window
window_secs = 60
cooldown_secs = 30
half_open_probes = 1

[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
breaker = false            # disable the global breaker for this route

Per-route breaker overrides can supply any subset of the fields; the rest inherit the global defaults. Cancellation outcomes do not count as failures.
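
The closed, open, and half-open states form a small state machine. The Python sketch below models it with the four documented knobs; the class and method names are hypothetical, and two details are assumptions: a failed probe restarts the cool-off, and a single successful probe closes the breaker. Cancelled requests would simply call neither `record_success` nor `record_failure`.

```python
import time
from collections import deque


class CircuitBreaker:
    """Illustrative per-route breaker; not the router's implementation."""

    def __init__(self, failure_threshold=5, window_secs=60, cooldown_secs=30,
                 half_open_probes=1, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_secs = window_secs
        self.cooldown_secs = cooldown_secs
        self.half_open_probes = half_open_probes
        self._clock = clock
        self._failures = deque()     # timestamps of recent failures
        self._opened_at = None       # None while closed
        self._probes_left = 0

    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at < self.cooldown_secs:
            return "open"
        return "half_open"

    def allow(self):
        """Should the router send a request to this route right now?"""
        current = self.state()
        if current == "closed":
            return True
        if current == "open":
            return False
        if self._probes_left > 0:    # half-open: admit a limited number of probes
            self._probes_left -= 1
            return True
        return False

    def record_success(self):
        if self._opened_at is not None:
            self._opened_at = None   # assumption: one good probe closes the breaker
            self._failures.clear()

    def record_failure(self):
        now = self._clock()
        if self._opened_at is not None:
            self._trip(now)          # assumption: failed probe restarts cool-off
            return
        self._failures.append(now)
        while self._failures and now - self._failures[0] > self.window_secs:
            self._failures.popleft() # forget failures outside the window
        if len(self._failures) >= self.failure_threshold:
            self._trip(now)

    def _trip(self, now):
        self._opened_at = now
        self._probes_left = self.half_open_probes
        self._failures.clear()
```

Because failures are counted inside a sliding window, occasional errors spread over a long session never trip the breaker; only a burst of failures within window_secs does.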

Capability filters

Routes can declare capability requirements. The router sends a request to a route only if the route's declared capabilities cover everything the request needs:

[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
requires = { vision = true, tool_use = true }

The router also derives needs automatically from the request content (image blocks → vision need, tool declarations → tool-use need, thinking budget → thinking need), so you do not need to annotate every route manually.

Effort levels

Set a default effort level on a route and optionally map each level to a provider-specific knob string:

[[router.route]]
purpose = "main_loop"
provider = "anthropic"
model = "claude-sonnet-4-6"
effort = "medium"

[router.route.effort_map]
low    = "budget=1024"
medium = "budget=8192"
high   = "budget=32768"

Valid effort levels: low, medium (default), high. Callers that don't specify an effort level inherit the route's default; the route default falls back to medium.
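
The defaulting rule is a two-step lookup: pick the level, then translate it through the route's map. The Python sketch below is illustrative; `resolve_effort` is a hypothetical helper, routes are plain dicts mirroring the TOML keys, and rejecting unknown levels with ValueError is an assumption.

```python
VALID_LEVELS = ("low", "medium", "high")


def resolve_effort(route, requested=None):
    """Return (level, knob) for a route dict with optional 'effort' and 'effort_map'."""
    # Caller's level wins, then the route default, then "medium".
    level = requested or route.get("effort") or "medium"
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    # The knob is the provider-specific string, or None when unmapped.
    knob = route.get("effort_map", {}).get(level)
    return level, knob
```

With the route shown above, a caller that does not specify a level would resolve to `("medium", "budget=8192")`.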

Diagnosing the router

Use caliban router debug to print the candidate list the router would resolve for a synthetic request, including breaker state and effort knobs:

# Default: main_loop purpose, no special needs
caliban router debug

# Simulate a vision + tool request
caliban router debug --purpose main_loop --has-vision --has-tools

# Show the effort table for a high-effort request
caliban router debug --effort high

# Point at a specific config file
caliban --config ./caliban.toml router debug --purpose summarization

The output shows each route with a + (kept) or - (dropped) marker, the reason it was kept or dropped, and the current circuit-breaker state.

flowchart LR
    R["Request\n(purpose + needs)"] --> Res["Resolve candidates\n(purpose filter →\ncapability filter →\nbreaker filter)"]
    Res --> D{"Dispatch"}
    D -- "success" --> Resp["Response"]
    D -- "retriable error" --> F["Next candidate\n(fallback chain)"]
    F --> D
    D -- "hedge delay" --> H["Hedge race\n(first wins)"]
    H --> Resp

Settings Layering

Caliban merges configuration from up to five sources before starting. Knowing the merge order lets you predict which value wins when the same key appears in multiple places.

The five scopes

| Priority | Scope | Description |
| --- | --- | --- |
| 1 (highest) | CLI | --settings <FILE\|JSON> overlay injected above local |
| 2 | Local | .caliban/settings.local.toml in the workspace |
| 3 | Project | .caliban/settings.toml in the workspace |
| 4 | User | OS user-config directory (see File Locations) |
| 5 (lowest) | Managed | System-wide directory set by an operator |

Higher priority always wins for scalar values. The CLI scope is a virtual overlay — it has no on-disk file.

flowchart LR
    M["Managed\n(lowest)"] --> U["User"]
    U --> P["Project"]
    P --> L["Local"]
    L --> C["CLI --settings\n(highest)"]
    C --> EFF["Effective\nSettings"]

Deep-merge semantics

Scalars use highest-wins: the value from the highest-priority scope that defines the key is used; lower scopes are ignored for that key.

Arrays and maps have richer rules:

| Key(s) | Merge behaviour |
| --- | --- |
| permissions.allow, .ask, .deny | Concatenated in priority order (lower scopes first, higher appended); duplicates dropped |
| permissions.rules | Concatenated in priority order; source order within each scope is preserved |
| hooks.<Event> | Concatenated |
| mcp_servers.<name> | Deep-merged per server; a project scope can add an env key to a user-scope server without redefining the whole entry |
| env | Deep-merged (highest-priority value wins per key) |
| additional_directories, claude_md_excludes | Concatenated |
| Everything else | Highest-wins scalar |
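
These rules can be modeled as a fold over the scopes from lowest to highest priority. The Python sketch below is an approximation for reasoning about outcomes, not caliban's merge code: the function names are hypothetical, scopes are plain dicts, and the managed "block" reordering is not modeled.

```python
CONCAT_DEDUP = {("permissions", "allow"), ("permissions", "ask"), ("permissions", "deny")}
CONCAT = {("permissions", "rules"), ("additional_directories",), ("claude_md_excludes",)}
DEEP_ROOTS = {"permissions", "hooks", "mcp_servers", "env"}


def _is_concat(path):
    # hooks.<Event> lists are concatenated for every event name.
    return path in CONCAT or path in CONCAT_DEDUP or (len(path) == 2 and path[0] == "hooks")


def _merge_into(target, source, path):
    for key, value in source.items():
        here = path + (key,)
        if isinstance(value, list) and _is_concat(here):
            combined = list(target.get(key, [])) + value   # lower scopes first
            if here in CONCAT_DEDUP:
                combined = list(dict.fromkeys(combined))   # drop duplicates, keep order
            target[key] = combined
        elif isinstance(value, dict) and here[0] in DEEP_ROOTS:
            if not isinstance(target.get(key), dict):
                target[key] = {}
            _merge_into(target[key], value, here)          # deep merge per key
        else:
            target[key] = value                            # highest wins


def merge_scopes(scopes):
    """scopes: list of dicts ordered lowest priority first (managed ... CLI)."""
    merged = {}
    for scope in scopes:
        _merge_into(merged, scope, ())
    return merged
```

Under this model a project scope that defines only `mcp_servers.linear.env.B` ends up alongside the user scope's `command` and `env.A` for the same server, which is the behaviour the table describes.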

The --settings CLI overlay

--settings injects a virtual scope that sits above Local, and below the managed scope when parent_settings_behavior = "block" is active (described in the next section). It accepts either an inline JSON object or a path to a .json or .toml file:

# inline JSON
caliban --settings '{"model": "claude-opus-4-7"}'

# file path
caliban --settings /tmp/ci-overrides.toml

This is the recommended way to supply CI-specific settings without touching scope files.

parent_settings_behavior — managed lockdown

When an operator sets parent_settings_behavior = "block" in the managed scope, the merge order flips: the managed scope moves to the top of the chain and overrides every other scope, including the CLI overlay.

# /Library/Application Support/Caliban/managed-settings.toml
parent_settings_behavior = "block"
model = "claude-haiku-4-7"

With "block" active, users cannot override model from their own settings or from --settings. The value "augment" is the default behaviour (managed sits at the bottom).

Enterprise lockdown

When parent_settings_behavior = "block" is set in the managed scope, all user, project, local, and CLI settings for locked keys are ignored. The effective values come exclusively from the managed scope for those keys.

--setting-sources — scope filtering

--setting-sources restricts which on-disk scopes are loaded. It accepts a comma-separated list of scope names: managed, user, project, local. The CLI overlay is always applied regardless of this flag.

# Load only user + project scopes (skip local overrides)
caliban --setting-sources user,project

# Pin to project scope only — useful for reproducible CI runs
caliban --setting-sources project

An unknown scope name is a fatal error (exit 78) rather than a silent no-op.

Live reload

A file watcher monitors each scope's path with a 250 ms debounce. When a file changes, caliban re-loads and re-merges all scopes atomically and fires a ConfigChange hook event. Most keys take effect immediately:

  • Live-reloadable: permissions.*, hooks.*, api_key_helper.*, output_style, editor_mode, view_mode, statusLine, env, memory, additional_directories, claude_md_excludes
  • Restart-required: model, fallback_model, mcp_servers.*, auto_compact_threshold, micro_compact_enabled

Restart-required keys log a WARN on change and take effect on the next caliban invocation. The /config TUI overlay shows a "restart required" badge next to changed restart-required keys.

Inspecting the effective result

Run caliban config print to see the fully-merged settings with per-key scope annotations, without starting a session.

File Locations

Caliban resolves settings files from four on-disk scopes. This page lists the canonical path for each scope on each supported OS.

Scope paths

Managed scope

Set by a system administrator. Caliban reads but never writes this directory.

| OS | Path |
| --- | --- |
| macOS | /Library/Application Support/Caliban/managed-settings.toml |
| Linux | /etc/caliban/managed-settings.toml |
| Windows | C:\ProgramData\Caliban\managed-settings.toml |

The JSON equivalent (managed-settings.json) is accepted on read as a legacy path but triggers a WARN on startup.

User scope

Per-user settings that apply across all projects. Caliban uses the standard OS user-configuration directory (the parent of caliban/) resolved via the dirs crate.

| OS | Path |
| --- | --- |
| macOS | ~/Library/Application Support/caliban/settings.toml |
| Linux | ~/.config/caliban/settings.toml (or $XDG_CONFIG_HOME/caliban/settings.toml) |
| Windows | %APPDATA%\caliban\settings.toml |

Project scope

Committed alongside your code. This file should be checked into version control and shared with your team.

| OS | Path |
| --- | --- |
| All | <workspace>/.caliban/settings.toml |

Local scope

Machine-local overrides that should not be committed. Add .caliban/settings.local.toml to your .gitignore.

| OS | Path |
| --- | --- |
| All | <workspace>/.caliban/settings.local.toml |

Per-feature files (legacy)

Caliban still loads standalone per-feature TOML files during the current compatibility window. They are consulted only when the corresponding key is absent from the unified settings file in the same scope directory; the one exception is permissions.toml, as noted in the table.

| File | Key governed | Notes |
| --- | --- | --- |
| .caliban/permissions.toml | permissions | Can also coexist alongside settings.toml; its permissions block overrides the permissions key in settings.toml for that scope |
| .caliban/mcp.toml | mcp_servers | Legacy transport key is transport; canonical key is type |
| .caliban/hooks.toml | hooks, disable_all_hooks, allow_managed_hooks_only, allowed_http_hook_urls, http_hook_allowed_env_vars | |

Deprecation timeline

Per-feature TOML files are deprecated. Caliban logs a WARN when it falls back to them. After two minor releases the warning becomes an error. Run caliban config migrate to consolidate them into a single settings.toml.

TOML vs JSON

TOML is the canonical write format. JSON is accepted on read as a legacy/import path:

  • When both settings.toml and settings.json exist in the same scope directory, .toml wins and caliban logs a WARN about the ignored .json file.
  • When only settings.json exists, caliban loads it with a WARN recommending migration.
  • Caliban's own write paths (modal, caliban perms add, /permissions editor) always emit TOML.

Atomic writes

All caliban-owned writes use an atomic flock + temp-file rename pattern:

  1. A sibling .settings.toml.lock file is exclusively flocked.
  2. Content is written to a uniquely-named .toml.tmp.<pid>.<tid> file.
  3. The temp file is synced and renamed onto the target.
  4. The lock is released.

This ensures concurrent writers (e.g. two terminal sessions) never produce a corrupted file.
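
The same lock, temp-file, and rename pattern is easy to reproduce in scripts that edit settings files alongside caliban. The Python sketch below follows the four steps on Unix-like systems (it relies on the standard `fcntl` module, which is not available on Windows). `atomic_write` is a hypothetical helper, and the temp-file name only approximates the pattern described above.

```python
import fcntl
import os
import threading


def atomic_write(target, data):
    """Write `data` to `target` so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(target))
    lock_path = os.path.join(directory, ".settings.toml.lock")
    tmp_path = f"{target}.tmp.{os.getpid()}.{threading.get_ident()}"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)          # 1. exclusive lock on the sibling file
        try:
            with open(tmp_path, "w", encoding="utf-8") as tmp:
                tmp.write(data)                   # 2. write to a uniquely named temp file
                tmp.flush()
                os.fsync(tmp.fileno())            # 3. sync contents to disk ...
            os.replace(tmp_path, target)          #    ... then rename onto the target
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)      # 4. release the lock
```

The rename is what makes the update atomic: a reader sees either the old file or the new one. The lock only serializes writers so that two read-modify-write cycles do not overwrite each other's changes.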

Consolidated path reference

See Files & Directories for the full list of all caliban-managed paths including sessions, cache, logs, and debug output.

Settings Reference

Every key recognized by settings.toml (and its JSON equivalent) is listed below, grouped by topic. For merge semantics see Settings Layering; for file paths see File Locations.

All fields are optional. Unknown top-level keys are tolerated for forward-compatibility (they are collected and ignored rather than causing a parse error).


Model / Agent

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| agent | string | | Agent profile name used as a sub-agent dispatch hint |
| model | string or { provider, name } | provider default | Primary model. Bare string (e.g. "claude-sonnet-4-7") or qualified object (e.g. { provider = "anthropic", name = "claude-sonnet-4-7" }). CLI --model / --provider override this |
| fallback_model | string or { provider, name } | | Model used when the primary returns an error. Wired through caliban-model-router. CLI --fallback-model overrides this |
| model_overrides | { route → model } | {} | Per-named-route model overrides passed to the router (e.g. { "fast-classifier" = "claude-haiku-4-7" }) |

For provider and model selection details see Model Selection and The Model Router.


Permissions

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| permissions.allow | string[] | [] | Patterns that auto-allow without prompting. Concatenated across scopes |
| permissions.ask | string[] | [] | Patterns that prompt the user. Concatenated across scopes |
| permissions.deny | string[] | [] | Patterns that hard-deny. Concatenated across scopes |
| permissions.rules | RuleSpec[] | [] | v2 ordered rule array. When non-empty, takes precedence over the three-bucket form above. Source order is preserved; first match wins |
| permissions.enforce | bool | false | When true, refuse --no-permissions / bypass mode at startup |
| permissions.default_mode | string | "default" | Initial permission mode at session start. Valid values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions |
| permissions.audit_log | bool | true | Enable the append-only decision log |

Each entry in permissions.rules supports:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| pattern | string | yes | Glob pattern matching Tool or Tool:first-arg-glob |
| action | "allow" \| "ask" \| "deny" | yes | Decision when this rule matches |
| comment | string | no | Human-readable note shown in /permissions |
| reason | string | no | Deny reason surfaced to the operator and logged |
| expires_at | ISO-8601 datetime | no | Rule is skipped after this timestamp |

See Permissions Concepts and Pattern Grammar for full detail.
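
First-match-wins evaluation with expiry can be sketched as a single loop. The Python below is a rough model only: `decide` is a hypothetical function, `fnmatch`-style globbing stands in for caliban's actual pattern grammar (which is documented separately and may differ), the default of "ask" when nothing matches is an assumption, and timestamps are assumed to carry an explicit UTC offset.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase


def decide(rules, tool, first_arg="", now=None, default="ask"):
    """Return the action of the first live rule matching (tool, first_arg)."""
    now = now or datetime.now(timezone.utc)
    for rule in rules:                                   # source order is preserved
        expires = rule.get("expires_at")
        if expires and datetime.fromisoformat(expires) <= now:
            continue                                     # expired rules are skipped
        tool_glob, sep, arg_glob = rule["pattern"].partition(":")
        if not fnmatchcase(tool, tool_glob):
            continue
        if sep and not fnmatchcase(first_arg, arg_glob):
            continue
        return rule["action"]                            # first match wins
    return default
```

Because the first match wins, a narrow deny such as `Bash:rm *` has to appear before a broader allow for `Bash`, otherwise it is never reached.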


Hooks

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| hooks | { event → handler[] } | {} | Hook event map. Keys are event names (e.g. "PreToolUse", "SessionEnd"); values are handler lists |
| disable_all_hooks | bool | false | Kill-switch that disables every external hook handler. In-process hooks (permissions, audit) still run |
| allow_managed_hooks_only | bool | false | When true, only hooks defined in the managed scope fire |
| allowed_http_hook_urls | string[] | [] | Glob allowlist for HTTP hook endpoint URLs |
| http_hook_allowed_env_vars | string[] | [] | Env-var names that HTTP hook handlers are allowed to read |

See Hooks for the full event list and handler shapes.


MCP Servers

mcp_servers is a map of server name to server configuration. Each entry deep-merges across scopes so a project scope can add environment variables to a user-scope server without redefining the whole entry.

[mcp_servers.linear]
command = "npx"
args    = ["-y", "@linear/mcp-server"]

[mcp_servers.silverbullet]
type = "http"
url  = "https://mcp.example.com/mcp"
headers = { Authorization = "Bearer ${SB_TOKEN}" }

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| type | "stdio" \| "http" \| "sse" | "stdio" | Transport. Also accepted as transport (legacy alias) |
| command | string | "" | Executable (stdio only) |
| args | string[] | [] | Argv after command (stdio only) |
| env | { key → value } | {} | Environment variables (stdio only) |
| cwd | string | | Working directory override (stdio only) |
| url | string | | Absolute HTTP/HTTPS URL (http/sse only) |
| headers | { key → value } | {} | Static request headers (http/sse only) |
| oauth | "off" \| "auto" \| "manual" | "off" | OAuth mode (http/sse only) |
| permissions.allow | string[] | [] | Per-server allow list (composed with global rules) |
| permissions.deny | string[] | [] | Per-server deny list |
| disabled | bool | false | Skip this server on startup |

See MCP Servers for configuration examples and the OAuth flow.


Router

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| router | object | | Opaque config blob passed to caliban-model-router. The router crate owns the schema; see The Model Router |

Memory

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| memory | object | | Memory tier knobs passed to caliban_memory::MemoryConfig. Sub-keys include auto_memory_enabled (bool), auto_memory_directory (string), cap_tokens_auto, cap_tokens_claude_md, cap_tokens_combined (integers) |

See Memory Tiers and CLAUDE.md & Imports.


Plugins

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| plugins | object | | Plugin manager knobs. Schema is owned by the plugin subsystem; see Plugins |

UI / Output

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| output_style | string | "default" | Active output-style name. See Output Styles. Restart-required |
| editor_mode | "vim" or "emacs" | | Input-line editing mode |
| view_mode | string | | Compact vs. expanded TUI layout |
| statusLine | object | | Custom statusline command. Also accepted as status_line (TOML-friendly alias) |
| tui | object | | TUI theme and layout knobs (e.g. showCostInStatusline) |

statusLine sub-keys:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | | Shell command whose stdout is used as the statusline text. Required |
| timeout_ms | integer | | Per-invocation timeout in ms (50–5000) |
| padding | integer | | Horizontal padding cells (0–8) |

Authentication

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| api_key_helper | string, object, or object[] | | Provider API-key supplier. Three shapes: bare command string; single { command, provider, refreshIntervalMs, slowHelperWarningMs } object; or array of provider-keyed objects. Executed without a shell; cached for refreshIntervalMs (default 5 min) or until a 401 is received |

Auth precedence per provider: per-provider helper → wildcard helper → environment variable → keyring → anonymous.

See Configuring Providers & API Keys.


Observability

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enable_telemetry | bool | false | Enable OpenTelemetry / cost emitter |

See Telemetry & Cost.


Context-Window Management

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| auto_compact_threshold | float or null | 0.75 | Pre-turn auto-compaction threshold as a utilization fraction in [0, 1]. null disables auto-compact |
| micro_compact_enabled | bool | true | Enable per-turn microcompact (LLM-free supersession pass) |
| tool_result_cap_chars | integer | 50000 | Global per-tool-result character cap. 0 disables |
| min_cache_block_tokens | integer | 1024 | Minimum estimated tokens on the last user message to place a conversation-level prompt-cache marker |

See Context & Compaction.


Managed Scope Control

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| parent_settings_behavior | "block" or "augment" | "augment" | When "block" is set in the managed scope, the managed scope moves to the top of the merge chain, overriding all user, project, local, and CLI settings. Has no effect when set in other scopes |

Miscellaneous

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| additional_directories | string[] | [] | Extra workspace roots for file and shell tools to consider |
| claude_md_excludes | string[] | [] | Glob patterns for CLAUDE.md paths to skip during discovery |
| env | { key → value } | {} | Environment-variable overrides applied to every child process launched by caliban (tools, hooks, MCP servers). Deep-merged across scopes; highest-priority scope wins per key |

Config Commands

Caliban ships two subcommand families for inspecting and managing settings: caliban config for the unified settings layer, and caliban settings for import/export of individual scope files. Both work without a running session.

caliban config print

Prints the fully-merged effective settings as JSON, annotated with the scope each value came from. Honors --settings and --setting-sources so you can preview what a CI run or a different scope combination would see.

caliban config print

# Show only project + user scopes (skip local)
caliban --setting-sources user,project config print

# Preview with a CLI overlay applied
caliban --settings '{"model": "claude-opus-4-7"}' config print

The output shows the merged Settings object. Each top-level key lists the scope that contributed the winning value. This is the headless equivalent of the read-only Effective tab in the /config TUI overlay.

caliban config migrate

Consolidates legacy per-feature TOML files (permissions.toml, mcp.toml, hooks.toml) in the current workspace into a single .caliban/settings.toml. Existing keys in the target file are preserved; the migrated keys are merged on top.

# Preview what would be written (nothing is changed)
caliban config migrate --dry-run

# Run the migration
caliban config migrate

After migration the per-feature files are no longer read (caliban checks for the unified key first). You can safely delete them, or leave them in place — caliban will ignore them once the corresponding key exists in settings.toml.

When to migrate

Run caliban config migrate once after upgrading to a version that shipped ADR 0026. It is safe to run multiple times — the command is idempotent.

caliban settings import

Imports a settings file from a foreign format (Claude Code JSON, Codex JSON, or legacy caliban JSON) into canonical caliban TOML at the target scope.

# Import ~/.claude.json into the user scope (dry-run first)
caliban settings import --from ~/.claude.json --scope user --dry-run
caliban settings import --from ~/.claude.json --scope user

# Import a project settings file into the project scope
caliban settings import --from /path/to/settings.json

Options:

| Flag | Description |
| --- | --- |
| --from <PATH> | Path to the source file (required) |
| --scope <SCOPE> | Destination scope: managed, user, project, or local. Default: project |
| --dry-run | Print what would be written without making changes |

caliban settings import is the recommended migration path when you have an existing Claude Code settings.json you want to adopt. The source file is read-only; only the target scope's TOML is written.

caliban settings print

Prints the raw settings for a single scope, before merging. When --scope is omitted, the project scope is printed.

# Print the project-scope settings
caliban settings print

# Print the user-scope settings
caliban settings print --scope user

Options:

| Flag | Description |
| --- | --- |
| --scope <SCOPE> | Scope to print. Default: project |

This differs from caliban config print in that it shows the unmerged raw contents of one scope rather than the merged result across all scopes.


TOML-primary write / JSON import-only

Caliban always writes TOML. JSON files at any scope path are accepted on read as a legacy or import path, but caliban logs a WARN and recommends running caliban settings import to migrate.

When both settings.toml and settings.json exist in the same scope directory, TOML wins and the JSON file is ignored (with a WARN).

Do not hand-edit JSON if you also have TOML

If caliban finds both settings.toml and settings.json in the same scope directory, it ignores the .json file and logs only a WARN, so edits made to the JSON have no effect. Keep one format per scope directory.


Live reload and restart-required keys

Most settings changes take effect immediately via the file watcher (250 ms debounce). A subset of keys require a full restart:

  • Restart-required: model, fallback_model, mcp_servers.*, output_style, auto_compact_threshold, micro_compact_enabled

When a restart-required key changes on disk while caliban is running, caliban logs a WARN and shows a "restart required" badge in the /config TUI overlay. The new value will be used the next time you launch caliban.

All other settings — permissions, hooks, api_key_helper, UI keys, env, memory knobs — are live-reloadable and take effect within one debounce cycle without restarting.

Concepts

Every tool call the model makes — Bash, Write, Edit, a fetched URL, an MCP action — passes through the permission system before it executes. The system is a flat list of rules evaluated from top to bottom; the first rule that matches determines the outcome.

Actions

Each rule maps a pattern to one of three actions:

| Action | Meaning |
| --- | --- |
| allow | Execute the tool call immediately, no prompt. |
| deny | Reject the tool call. An optional reason string is returned to the model so it can retry differently. |
| ask | Pause and ask the operator interactively. In headless mode without --auto-allow, ask degrades to a hard deny. |

Rule structure

A rule is a TOML table with a pattern and an action, plus optional metadata:

[[permissions.rules]]
pattern  = "Bash:git *"       # required — see Pattern Grammar
action   = "allow"            # required — allow | deny | ask
comment  = "git ops are fine" # optional — shown in the Ask modal, not to the model
reason   = "…"                # optional, deny-only — returned to the model
expires_at = "2027-01-01T00:00:00Z"  # reserved, parsed but not yet enforced

Evaluation order

Rules are evaluated top-to-bottom; first match wins.

Sources are merged in priority order before evaluation, so higher-priority sources simply appear earlier in the flat list:

  1. CLI flags (--allow, --deny, --ask): highest priority among configured sources; prepended at startup.
  2. Project file: <workspace>/.caliban/permissions.toml.
  3. User file: $XDG_CONFIG_HOME/caliban/permissions.toml (default: ~/.config/caliban/permissions.toml).
  4. Built-in defaults: lowest priority; appended automatically.

Runtime rules created during a session (for example through "Always allow" in the Ask modal) are checked before all of these, as the decision flow below shows.

Within a single file, rules are ordered exactly as written. The [[permissions.rules]] array preserves authoring order, so narrow rules belong above broader ones.
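
The first-match-wins behavior, and the reason narrow rules belong above broad ones, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not caliban's evaluator: RULES and evaluate are hypothetical names, the rule list is invented for illustration, and Python's fnmatch stands in for the glob engine described under Pattern Grammar.

```python
from fnmatch import fnmatchcase

# Hypothetical flattened rule list: (pattern, action), highest priority first.
RULES = [
    ("Bash:git push --force*", "deny"),  # narrow rule first
    ("Bash:git *", "allow"),             # broader rule second
    ("Bash", "ask"),
    ("*", "ask"),                        # catch-all
]


def evaluate(tool: str, first_arg: str) -> str:
    """Return the action of the first rule that matches the call."""
    for pattern, action in RULES:
        name, _, glob = pattern.partition(":")
        if name != "*" and name != tool:
            continue
        # A bare tool name (no glob part) matches any arguments.
        if not glob or fnmatchcase(first_arg, glob):
            return action
    return "ask"


print(evaluate("Bash", "git push --force origin main"))  # deny
print(evaluate("Bash", "git status"))                    # allow
print(evaluate("Bash", "ls -la"))                        # ask
```

If the two git rules were swapped, the broad "Bash:git *" rule would match first and the force-push deny would never fire.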

Legacy three-bucket compat

Older configs used a permissions.{allow,ask,deny} key per action rather than an ordered array. Caliban still loads that format on read and normalizes it into the ordered array, but all caliban-owned writes (the Ask modal, /permissions, caliban perms add) emit the canonical [[permissions.rules]] form. Convert your config with caliban perms export --format toml.

Built-in defaults

When no rule matches a tool call before the end of the list, the built-in defaults serve as a safety net:

| Pattern | Default action |
| --- | --- |
| Read | allow |
| Grep | allow |
| Glob | allow |
| TodoWrite | allow |
| EnterPlanMode | allow |
| ExitPlanMode | allow |
| WebFetch | ask |
| Bash | ask |
| Write | ask |
| Edit | ask |
| * (catch-all) | ask |

Unknown tools (MCP tools, future built-ins) fall through to the * catch-all and are ask by default.

Decision flow

flowchart TD
    A([Tool call arrives]) --> B{Runtime rules\nany match?}
    B -- yes --> Z
    B -- no --> C{CLI rules\n--allow/--deny/--ask}
    C -- match --> Z
    C -- no match --> D{Project rules\npermissions.toml}
    D -- match --> Z
    D -- no match --> E{User rules\n~/.config/caliban/}
    E -- match --> Z
    E -- no match --> F{Built-in defaults}
    F -- match --> Z
    Z[Matched rule action] --> G{Action?}
    G -- allow --> H([Execute tool])
    G -- deny --> I([Reject — return reason to model])
    G -- ask --> J{Interactive session?}
    J -- yes --> K([Show Ask modal])
    J -- no --> L{--auto-allow?}
    L -- yes --> H
    L -- no --> I

The Permission Mode wraps this pipeline and can override the ask verdict. It never overrides a static allow or deny, with one exception: bypassPermissions ignores all rules, including static deny. See Permission Modes for details.

Pattern Grammar

A pattern is the pattern field in a [[permissions.rules]] entry (or the argument to --allow/--deny/--ask on the CLI). It encodes the tool name and an optional argument specifier separated by a colon.

Forms at a glance

| Form | Description |
| --- | --- |
| Tool | Match any invocation of Tool, regardless of arguments. |
| Tool:<glob> | Match Tool when its first argument matches <glob>. |
| Bash:~<glob> | Match Bash when <glob> appears anywhere in the command string. |
| Tool:key=<glob> | Match Tool when the named input field matches <glob> (dotted keys supported). |
| Tool:k1=<g1>,k2=<g2> | Multiple key=glob pairs, AND-combined. |
| * | Catch-all: matches every tool. |

Glob characters

The argument-side glob uses globset semantics:

| Character | Meaning |
| --- | --- |
| * | Zero or more characters (does not cross / in path patterns). |
| ** | Zero or more path segments (crosses /; use in file-edit patterns). |
| ? | Exactly one character. |

Non-path patterns (Bash command strings, URLs, MCP string fields) use literal_separator = false, so * matches slashes too.

Tool:<glob> — first-argument matching

The "first arg" is a per-tool field extracted from the JSON input:

| Tool | First-arg field |
| --- | --- |
| Bash | command |
| Read, Write, Edit, MultiEdit, NotebookEdit | path |
| WebFetch | url |
| MCP tools with no known accessor | (no first arg; Tool:<glob> can't match) |

If the tool has no known accessor, Tool:<glob> never fires for that tool. Only the bare Tool form, the structured Tool:key=<glob> form, and the * catch-all can match it.

Bash:~<glob> — anywhere-in-command match

Prefix the argument glob with ~ to perform a sliding-window search over the full command string rather than matching from the start. This catches commands invoked via wrappers or subshells:

# Deny any use of rm, even via sudo or bash -c "rm …"
[[permissions.rules]]
pattern = "Bash:~rm *"
action  = "deny"
reason  = "no rm — use git revert or Write"

The ~ prefix is only meaningful for Bash. On other tools it does not match.
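
The difference between matching from the start and the sliding-window search can be sketched as follows. This is a rough Python model, not caliban's matcher: matches_from_start and matches_anywhere are hypothetical helpers, and fnmatch stands in for the real glob engine.

```python
from fnmatch import fnmatchcase


def matches_from_start(command: str, glob: str) -> bool:
    """Plain Bash:<glob> behavior: the glob must match the whole command."""
    return fnmatchcase(command, glob)


def matches_anywhere(command: str, glob: str) -> bool:
    """Bash:~<glob> behavior in this sketch: try the glob at every start offset."""
    return any(fnmatchcase(command[i:], glob) for i in range(len(command) + 1))


print(matches_from_start("sudo rm -rf /tmp", "rm *"))  # False
print(matches_anywhere("sudo rm -rf /tmp", "rm *"))    # True
print(matches_anywhere('bash -c "rm -rf x"', "rm *"))  # True
print(matches_anywhere("git status", "rm *"))          # False
```

Because the window in this sketch is purely textual, it would also match a command such as "farm x", where "rm " appears inside a longer word. Whether caliban applies any word-boundary handling is not stated here, so test broad ~ patterns with caliban perms test before relying on them.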

Tool:key=<glob> — structured (dotted-key) matching

For MCP tools or built-ins whose input has named fields, use key=glob to match a specific field. Dots traverse nested objects:

# Allow creating GitHub issues only in the anthropic org
[[permissions.rules]]
pattern = "mcp__github__create_issue:repo=anthropic/*"
action  = "allow"

# AND-combined: repo must match AND title must start with "feat"
[[permissions.rules]]
pattern = "mcp__github__create_issue:repo=anthropic/*,title=feat*"
action  = "allow"
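
As a rough illustration of dotted-key traversal and AND-combination, the Python sketch below checks a tool input against a key=glob specifier. The lookup and matches_fields helpers and the sample input are hypothetical, fnmatch stands in for the real glob engine, and splitting the specifier on commas is a simplification that would not handle globs containing commas.

```python
from fnmatch import fnmatchcase


def lookup(data: dict, dotted_key: str):
    """Follow a dotted key through nested dicts; return None if any part is missing."""
    current = data
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches_fields(spec: str, tool_input: dict) -> bool:
    """Return True only if every key=glob pair in spec matches (AND-combined)."""
    for pair in spec.split(","):
        key, _, glob = pair.partition("=")
        value = lookup(tool_input, key.strip())
        if not isinstance(value, str) or not fnmatchcase(value, glob):
            return False
    return True


call = {"repo": "anthropic/docs", "title": "feat: add guide", "meta": {"label": "bug"}}
print(matches_fields("repo=anthropic/*,title=feat*", call))  # True
print(matches_fields("repo=anthropic/*,title=fix*", call))   # False
print(matches_fields("meta.label=bug", call))                # True
```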

File-edit path normalization

For Read, Write, Edit, MultiEdit, and NotebookEdit, the file path in the tool call is workspace-normalized before pattern matching:

  • Absolute paths are used as-is.
  • Relative paths are resolved against the workspace root (the git rev-parse --show-toplevel result, or the current working directory when outside a repo).
  • A relative pattern like src/**/*.rs is automatically anchored with **/ so it matches at any depth under the repo.

# Allow editing any Markdown file anywhere in the repo
[[permissions.rules]]
pattern = "Edit:**/*.md"
action  = "allow"

# Allow writing files directly under /tmp (absolute path; * does not cross /)
[[permissions.rules]]
pattern = "Write:/tmp/*"
action  = "allow"

Examples table

| Pattern | Matches | Does not match |
| --- | --- | --- |
| Bash | Any Bash call | |
| Bash:git * | git push, git commit -m "…" | gitk, sudo git push |
| Bash:~git * | sudo git push, bash -c "git fetch" | commands with no git substring |
| Bash:rm * | rm -rf /tmp | sudo rm -rf /tmp (use ~rm * for that) |
| Edit:**/*.rs | /repo/src/main.rs, /repo/crates/x/lib.rs | /tmp/scratch.py |
| Write:/tmp/* | /tmp/out.txt | /home/user/file.txt |
| WebFetch:https://docs.* | https://docs.rs/…, https://docs.anthropic.com/… | https://api.example.com/… |
| mcp__gh__create_issue:repo=acme/* | {"repo":"acme/frontend"} | {"repo":"other/repo"} |
| * | Every tool | |

Unknown MCP tools

MCP tools that declare no known first-arg accessor cannot be matched with the first-argument form. Match them by full name (mcp__server__tool_name), by named field (mcp__server__tool_name:key=<glob>), or with the * catch-all. A pattern like mcp__server__tool_name:<glob> will never fire for such tools because there is no first-argument field to extract.

Permission Modes

Permission modes control what happens when the rule evaluator produces an ask verdict. With the exception of bypassPermissions, which ignores every rule, they do not override a static allow or deny; those always win. A mode is a post-pass filter on top of the rule pipeline.

The six modes

| Mode | camelCase | What changes |
| --- | --- | --- |
| Default | default | Rules apply unchanged; ask routes to the interactive Ask modal. |
| Accept Edits | acceptEdits | Write, Edit, MultiEdit, and NotebookEdit are auto-allowed; all other tools honor rules normally. |
| Plan | plan | Read-only tools are allowed; write and execute tools are blocked from the loop (legacy plan-mode allowlist). |
| Auto | auto | A fast classifier model labels each ask-rule tool call as allow / soft-deny / hard-deny. Soft-deny routes to the Ask modal with the classifier's reason. |
| Don't Ask | dontAsk | Every ask verdict becomes allow. Static deny rules still apply. |
| Bypass Permissions | bypassPermissions | All rules ignored; every tool call is allowed. Requires an explicit confirmation flag (see below). |

The status bar shows a chip when the active mode is not default:

| Mode | Chip |
| --- | --- |
| acceptEdits | ✎ accept edits |
| plan | 📋 plan |
| auto | 🤖 auto |
| dontAsk | ⏭ don't ask |
| bypassPermissions | ⚠ bypass |

Cycling modes with Shift+Tab

In the interactive TUI, press Shift+Tab to cycle forward through the modes:

default → acceptEdits → plan → auto → dontAsk → bypassPermissions → default

Cycling into bypassPermissions without the confirmation flag (see below) fires a warning toast and snaps back to default.

Setting the mode at startup

Use --permission-mode on the command line:

caliban --permission-mode acceptEdits "add docstrings to all public functions"

Valid values are the camelCase mode names: default, acceptEdits, plan, auto, dontAsk, bypassPermissions.

The mode is also resolved from the environment variable CALIBAN_DEFAULT_PERMISSION_MODE and the permissions.default_mode setting (see below), with this precedence:

  1. --permission-mode CLI flag
  2. CALIBAN_DEFAULT_PERMISSION_MODE env var
  3. permissions.default_mode in settings
  4. Built-in default (default)
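
The precedence chain above amounts to "first configured source wins". A minimal Python sketch, assuming a hypothetical resolve_mode helper and a settings dictionary shaped like the [permissions] table, might look like this. Rejecting unknown mode names with an error is an assumption of the sketch, not documented behavior.

```python
VALID_MODES = {"default", "acceptEdits", "plan", "auto", "dontAsk", "bypassPermissions"}


def resolve_mode(cli_flag, env, settings):
    """Pick the first configured source in precedence order."""
    candidates = [
        cli_flag,                                             # 1. --permission-mode
        env.get("CALIBAN_DEFAULT_PERMISSION_MODE"),           # 2. environment variable
        settings.get("permissions", {}).get("default_mode"),  # 3. settings file
        "default",                                            # 4. built-in default
    ]
    for mode in candidates:
        if mode:
            if mode not in VALID_MODES:
                # Assumption: an unknown name is an error rather than a silent fallback.
                raise ValueError(f"unknown permission mode: {mode}")
            return mode
    return "default"


env = {"CALIBAN_DEFAULT_PERMISSION_MODE": "plan"}
settings = {"permissions": {"default_mode": "acceptEdits"}}

print(resolve_mode("auto", env, settings))  # auto
print(resolve_mode(None, env, settings))    # plan
print(resolve_mode(None, {}, settings))     # acceptEdits
print(resolve_mode(None, {}, {}))           # default
```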

The default_mode setting

Set a persistent default mode in your project or user settings file:

[permissions]
default_mode = "acceptEdits"

This is overridden by the CLI flag and env var as shown above.

Auto-mode and --disable-auto-mode

When the mode is auto, the classifier is consulted for each tool call whose rule verdict is ask. The classifier dispatches via the router's FastClassifier purpose — configure it to use a small, fast model (e.g., Haiku, GPT-4o-mini, a local Ollama model). Results are cached for the session by (tool_name, sha256(input)).
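
To make the cache key concrete, here is a small Python sketch of a session cache keyed by (tool_name, sha256(input)). The cache_key and classify helpers and the fake classifier are hypothetical, and the canonical serialization (sorted keys, compact separators) is an assumption; the text above does not say how the input is serialized before hashing.

```python
import hashlib
import json


def cache_key(tool_name, tool_input):
    # Assumption: the input is serialized canonically so key order does not matter.
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return (tool_name, digest)


verdict_cache = {}


def classify(tool_name, tool_input, classifier):
    """Return the cached verdict, calling the classifier only on a cache miss."""
    key = cache_key(tool_name, tool_input)
    if key not in verdict_cache:
        verdict_cache[key] = classifier(tool_name, tool_input)
    return verdict_cache[key]


calls = []


def fake_classifier(name, data):
    # Stand-in for the fast classifier model; records each real invocation.
    calls.append(name)
    return "allow"


classify("Bash", {"command": "ls", "cwd": "/tmp"}, fake_classifier)
classify("Bash", {"cwd": "/tmp", "command": "ls"}, fake_classifier)  # same input, different key order
print(len(calls))  # 1
```

The second call is served from the cache, so an identical tool call is classified only once per session.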

To disable the classifier (all ask verdicts stay as-is, routing to the modal), pass:

caliban --disable-auto-mode

or set CALIBAN_DISABLE_AUTO_MODE=1. When disabled, auto mode behaves identically to default.

Bypass permissions latch

bypassPermissions overrides all rules, including static deny. Because this is a footgun, caliban refuses to enter the mode without an explicit confirmation flag:

caliban --allow-dangerously-skip-permissions --permission-mode bypassPermissions

Without --allow-dangerously-skip-permissions:

  • Starting with --permission-mode bypassPermissions aborts at startup with an error.
  • Configuring permissions.default_mode = "bypassPermissions" also aborts at startup.
  • Cycling to bypassPermissions via Shift+Tab fires a warning toast and reverts to default.

Bypass is not for routine use

In bypass mode the model can execute any tool call without restriction. Use it only in fully sandboxed, disposable environments where you control the entire execution context. Prefer dontAsk or acceptEdits for typical automation.

--no-permissions

--no-permissions disables the permission system entirely — no rules are evaluated and every tool call is allowed. It conflicts with --allow, --deny, --ask, and --auto-allow. The resolved mode surfaces as "disabled" in the system/init stream-json frame.

Managing Rules

Rules can be created and edited through three surfaces: the interactive Ask modal that appears when a tool call reaches an ask verdict, the /permissions overlay inside the TUI, and the caliban perms CLI for scripted or headless management.

The Ask modal

When a tool call hits an ask verdict during an interactive session, the TUI pauses and presents a modal with four choices (navigate with arrow keys, confirm with Enter):

| Choice | Effect |
| --- | --- |
| Allow once | Permit this specific tool call and continue; no rule is written. |
| Always allow | Permit this call and append a new allow rule to the chosen scope file. |
| Reject once | Deny this specific call; no rule is written. |
| Always reject | Deny this call and append a new deny rule to the chosen scope file. |

Press Esc to dismiss the modal and deny the current call without writing any rule.

When you choose "Always allow" or "Always reject", caliban opens a sub-prompt with a suggested narrow pattern (e.g., Bash:git push rather than Bash), a scope picker (project / user), and an optional comment field. The rule is atomically appended to the appropriate TOML file and takes effect immediately for the rest of the session.

The /permissions overlay

Type /permissions in the TUI input bar to open the interactive permissions overlay. It shows:

  • The full effective rule list (runtime rules, then config rules by scope, then built-in defaults), each tagged with its origin.
  • Runtime-only rules added by "Always allow/reject" during this session.
  • Keybind d deletes the selected rule: a session rule is dropped from the live store immediately; a file-scoped rule is removed from its TOML file; built-in defaults are read-only.

Use the overlay to inspect the live rule list and verify which rule would match a given tool call before running it.

Live vs. persisted changes

Adding a rule through the Ask modal applies to the running session immediately — the next matching tool call won't re-prompt. Removing a file-scoped rule (via the overlay's d key or caliban perms remove), or editing a permissions.toml outside caliban, does not retroactively change the current session's decisions; those changes take effect at the next session start. Deleting a session rule with d is the exception — it takes effect live.

caliban perms CLI

The caliban perms subcommand provides a complete headless management surface. All verbs accept an optional --scope flag (managed | user | project | local | cli; defaults vary by verb).

list — show rules

# Show the effective merged rule list across all scopes
caliban perms list --effective

# Show only project-scope rules in JSON
caliban perms list --scope project --json

Output (human-readable): 1 allow Bash:git *

test — check a tool call

Returns exit code 0 (allow), 1 (deny), or 2 (ask) so it's scriptable.

# Would `git push` be allowed?
caliban perms test Bash '{"command":"git push"}'
# MATCH: pattern=Bash:git * action=allow

# Would rm be allowed?
caliban perms test Bash '{"command":"rm -rf /tmp"}'
# MATCH: pattern=Bash:rm * action=deny

explain — show the full match walk

Prints every rule in evaluation order with a MATCH marker next to the first rule that fires. Useful for diagnosing unexpected allow/deny outcomes.

caliban perms explain Bash '{"command":"sudo rm -rf /"}'
# Rule list (source order; first match wins):
#     1       allow   Bash:git *
#     2 MATCH deny    Bash:~rm *
#     3       ask     Bash
#     ...

add — append a rule

Appends a rule to the target scope file (default: project).

# Allow all cargo commands at project scope
caliban perms add "Bash:cargo *" allow --comment "cargo is safe"

# Deny curl at user scope with a reason for the model
caliban perms add "Bash:curl *" deny --scope user --reason "use WebFetch instead"

remove — delete a rule

Remove by exact pattern match. Index-based removal is reserved for a future release.

caliban perms remove --pattern "Bash:cargo *" --scope project

import — import rules from another config

Import rules from a Claude Code settings.json, a legacy caliban JSON file, or a foreign TOML. Defaults to user scope.

# Dry-run first
caliban perms import --from ~/.claude/settings.json --dry-run

# Actually import into user scope
caliban perms import --from ~/.claude/settings.json --scope user

export — export rules to stdout

Outputs the current scope's rules in TOML (default) or JSON format, suitable for redirecting into a new file or piping to another tool.

# Export project rules as TOML
caliban perms export --scope project

# Export as JSON (three-bucket format for interop)
caliban perms export --scope project --format json

audit — inspect the decision log

Reads the JSONL audit log and prints matching entries. See Headless & Audit for log location and rotation details.

# Show all deny decisions since a given timestamp
caliban perms audit --action deny --since 2026-06-01T00:00:00Z

# Show the 20 most recent decisions for the Write tool
caliban perms audit --tool Write --head 20

lint — check for duplicate rules

Scans a scope's rule list for duplicate (pattern, action) pairs and prints them. Exits 0 if clean, 1 if duplicates are found.

caliban perms lint --scope project
# OK (no duplicate patterns)

caliban perms lint --scope user
# duplicate: pattern="Bash:git *" action=allow
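
The duplicate check is easy to picture as counting (pattern, action) pairs. The Python sketch below is illustrative only: find_duplicates and the sample rule list are hypothetical, and the printed line imitates the lint output shown above.

```python
from collections import Counter


def find_duplicates(rules):
    """Return each (pattern, action) pair that appears more than once."""
    counts = Counter((rule["pattern"], rule["action"]) for rule in rules)
    return [pair for pair, count in counts.items() if count > 1]


rules = [
    {"pattern": "Bash:git *", "action": "allow"},
    {"pattern": "Bash:cargo *", "action": "allow"},
    {"pattern": "Bash:git *", "action": "allow"},
    {"pattern": "Bash:git *", "action": "deny"},  # same pattern, different action: not a duplicate pair
]

duplicates = find_duplicates(rules)
for pattern, action in duplicates:
    print(f'duplicate: pattern="{pattern}" action={action}')

exit_code = 1 if duplicates else 0  # mirrors the documented exit codes
print(exit_code)  # 1
```

Note that a pair-based check does not flag the same pattern with conflicting actions; under first-match-wins the later of those rules is simply unreachable, which caliban perms explain can reveal.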

Scopes quick reference

Rules are read from managed → user → project → local; later scopes shadow earlier ones, so project rules outrank user rules as described under Evaluation order. The caliban perms add default scope is project; caliban perms import defaults to user. Use --scope to override.

Headless & Audit

Headless mode and the ask verdict

When caliban runs without a TTY — in CI, in a script, or via caliban -p — there is no interactive modal to present. Any tool call that reaches an ask verdict is handled by NonInteractiveAskHandler:

  • Default behavior (no flags): ask becomes a hard deny. The tool call fails with a permission error message that names a concrete remediation.
  • With --auto-allow: every ask verdict becomes allow. This is equivalent to dontAsk mode for the duration of the run.

The deny message is tailored to the tool class:

| Tool class | Suggested remediation |
| --- | --- |
| File-edit (Write, Edit, MultiEdit, NotebookEdit) | --permission-mode acceptEdits or a narrow --allow rule |
| Bash | --allow 'Bash:<glob>' for a targeted rule, or --auto-allow (flagged dangerous) |
| Other tools | --allow '<Tool>' or --auto-allow |

Opt-in strategies

Choose the least-permissive option that satisfies the task:

# Allow only file edits (most common CI use case)
caliban -p "update version in Cargo.toml" --permission-mode acceptEdits

# Allow specific git commands
caliban -p "commit and push" --allow "Bash:git *"

# Allow all ask-rule tools (use with care)
caliban -p "run the full refactor" --auto-allow

You can also set rules in the project's permissions.toml so they apply without CLI flags:

[[permissions.rules]]
pattern = "Bash:git *"
action  = "allow"
comment = "safe for CI"

The JSONL audit log

Every tool-call decision (allow, deny, or ask) is appended to an append-only JSONL file.

Log location

| Platform | Path |
| --- | --- |
| Linux | $XDG_STATE_HOME/caliban/permission-decisions.jsonl (default: ~/.local/state/caliban/) |
| macOS | $XDG_DATA_HOME/caliban/permission-decisions.jsonl (default: ~/Library/Application Support/caliban/) |

The audit_log setting controls whether logging is active:

[permissions]
audit_log = true   # default; set false to disable

Log format

Each line is a JSON object:

{
  "ts": "2026-06-01T14:23:01.123456Z",
  "session_id": "s_abc123",
  "turn_index": 4,
  "tool_use_id": "tu_xyz",
  "tool_name": "Bash",
  "input_excerpt": "{\"command\":\"git push origin main\"}",
  "action": "allow",
  "matched_rule": {
    "pattern": "Bash:git *",
    "action": "allow"
  }
}

input_excerpt is truncated to 256 characters and newlines are replaced with spaces.

Log rotation

When the log file exceeds 100 MiB, caliban automatically:

  1. Renames the current file to permission-decisions-YYYY-MM-DD.jsonl.
  2. Gzip-compresses the renamed file to permission-decisions-YYYY-MM-DD.jsonl.gz.
  3. Removes the uncompressed renamed file.
  4. Opens a fresh permission-decisions.jsonl for subsequent writes.

Rotated archives accumulate in the same directory. Remove old .gz files manually when disk space is a concern.
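
The four rotation steps can be sketched in Python as follows. This is a model of the described sequence, not caliban's code: rotate_if_needed is a hypothetical helper, the demo uses a tiny threshold and a temporary directory, and the sketch does not address what happens when two rotations fall on the same date.

```python
import gzip
import os
import shutil
import tempfile
from datetime import date

MAX_BYTES = 100 * 1024 * 1024  # 100 MiB threshold from the text above


def rotate_if_needed(log_path, max_bytes=MAX_BYTES, today=None):
    """Rotate log_path when it exceeds max_bytes; return the archive path or None."""
    if not os.path.exists(log_path) or os.path.getsize(log_path) <= max_bytes:
        return None
    stamp = (today or date.today()).isoformat()
    renamed = os.path.join(os.path.dirname(log_path), f"permission-decisions-{stamp}.jsonl")
    os.rename(log_path, renamed)                      # 1. rename the current file
    with open(renamed, "rb") as src, gzip.open(renamed + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)                  # 2. gzip-compress it
    os.remove(renamed)                                # 3. remove the uncompressed copy
    open(log_path, "a").close()                       # 4. start a fresh log file
    return renamed + ".gz"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "permission-decisions.jsonl")
    with open(path, "w") as handle:
        handle.write('{"action": "allow"}\n' * 10)
    archive = rotate_if_needed(path, max_bytes=50, today=date(2026, 6, 1))
    archive_name = os.path.basename(archive)
    fresh_size = os.path.getsize(path)

print(archive_name)  # permission-decisions-2026-06-01.jsonl.gz
print(fresh_size)    # 0
```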

Querying the log

Use caliban perms audit to filter and display log entries:

# All decisions since midnight UTC
caliban perms audit --since 2026-06-01T00:00:00Z

# Only denials for the Write tool
caliban perms audit --tool Write --action deny

# Most recent 50 entries
caliban perms audit --head 50

# Combine filters
caliban perms audit --tool Bash --action allow --since 2026-06-01T00:00:00Z --head 100

Exit code is always 0; an empty result prints (empty).
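
Because the log is plain JSONL, it can also be filtered with a few lines of Python when caliban perms audit is not available, for example on an archived copy. The sketch below assumes only the ts, tool_name, and action fields shown in the log format above; the query helper and the sample entries are hypothetical, and head is treated as "most recent N" to match the comments in the examples.

```python
import json
from datetime import datetime

SAMPLE_LOG = """\
{"ts": "2026-06-01T09:00:00.000000Z", "tool_name": "Bash", "action": "allow"}
{"ts": "2026-06-01T10:30:00.000000Z", "tool_name": "Write", "action": "deny"}
{"ts": "2026-06-02T08:15:00.000000Z", "tool_name": "Bash", "action": "deny"}
"""


def parse_ts(value):
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def query(lines, tool=None, action=None, since=None, head=None):
    """Filter JSONL audit lines; all filters are optional and AND-combined."""
    results = []
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if tool and entry["tool_name"] != tool:
            continue
        if action and entry["action"] != action:
            continue
        if since and parse_ts(entry["ts"]) < parse_ts(since):
            continue
        results.append(entry)
    return results[-head:] if head else results


lines = SAMPLE_LOG.splitlines()
denials = query(lines, action="deny", since="2026-06-01T10:00:00Z")
for entry in denials:
    print(entry["ts"], entry["tool_name"], entry["action"])
if not query(lines, tool="Read"):
    print("(empty)")
```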

Hardening with permissions.enforce

The permissions.enforce flag prevents the bypass latch from being used, even when --allow-dangerously-skip-permissions is passed:

[permissions]
enforce = true

With enforce = true, caliban refuses to start if --allow-dangerously-skip-permissions is on the command line or if permissions.default_mode is set to bypassPermissions. This is useful for team or managed deployments where operators want to guarantee that static deny rules can never be overridden.

enforce is a deployment-level setting

Set permissions.enforce = true in the managed or user scope, not project scope, so it cannot be overridden by project-level config. A project can always set a lower-priority rule, but only higher-priority scopes can lock out bypass.

Built-in Tools

Caliban ships a fixed set of built-in tools that cover the most common agentic tasks: reading and writing files, executing shell commands, searching code, fetching web content, and coordinating work. Every tool is permission-gated (see Permissions) and subject to the execution policies described in Tool Execution.

Pass --no-tools to disable all tools and run caliban in chat-only mode.

Tool reference

| Tool | Category | Purpose |
| --- | --- | --- |
| Read | Filesystem | Read a UTF-8 file, with optional offset + limit for pagination. Files larger than 5 MB must be read in chunks. |
| Write | Filesystem | Write content to a file, creating missing parent directories. Overwrites existing content. |
| Edit | Filesystem | Replace occurrences of old_string with new_string in a file. Expects exactly one match by default; set replace_all=true to replace all. |
| MultiEdit | Filesystem | Apply a sequence of {old_string, new_string} replacements to a single file atomically. If any replacement fails to match, the whole operation is rolled back. |
| NotebookEdit | Filesystem | Add, edit, or delete cells in a Jupyter .ipynb notebook (nbformat v4). Preserves cell metadata and outputs; writes atomically via tmpfile + rename. |
| Bash | Shell | Run a shell command and capture stdout + stderr. Supports timeout_seconds, an optional cwd, and a background flag for long-running processes. |
| BashBg | Shell | Companion tools for background Bash jobs: read buffered output (BashOutput) or terminate a job (KillShell). Background jobs use a 5 GiB ring buffer. |
| Glob | Search | Find files by name pattern relative to the workspace root. |
| Grep | Search | Search file contents with a regex, powered by the ripgrep library. Returns up to 100 matches by default (max 500). |
| WebFetch | Web | GET a URL and return the body as markdown or plain text. HTML is converted via htmd. 10 MB body cap, 60 s default timeout (configurable up to 300 s). |
| WebSearch | Web | Query a web search API and return ranked results. See backend details below. |
| TodoWrite | Agent | Replace the session's shared task list with a new list of {id, content, status} items. The list is re-injected into the system prompt each turn. Max 100 items. |
| AgentTool | Agent | Spawn an in-process sub-agent with a task prompt and an optional tool allowlist. Output is capped at 5,000 characters. See Sub-agents. |
| EnterPlanMode | Plan | Switch the session into plan mode. While active, only read-only tools may run; destructive tools are blocked until the operator confirms the plan. |
| ExitPlanMode | Plan | Confirm or abandon the current plan and return to normal execution. |
| ReadMemoryTopic | Memory | Read one auto-memory topic file by slug. See Memory Tiers. |
| WriteMemoryTopic | Memory | Write or update an auto-memory topic file and update the MEMORY.md index entry atomically. Topic type must be one of user, feedback, project, or reference. See Memory Tiers. |

WebSearch backends

WebSearch delegates to one of three search APIs, selected by the CALIBAN_WEBSEARCH_PROVIDER environment variable:

| Value | API key env var | Default? |
| --- | --- | --- |
| brave | BRAVE_API_KEY | Yes |
| tavily | TAVILY_API_KEY | No |
| exa | EXA_API_KEY | No |

If the selected provider's API key is missing, the tool returns a structured error naming the missing variable so the agent can try a different approach rather than failing silently.

Chat-only mode

--no-tools disables all built-in tools (and MCP tools) for the session. This is useful when you want a pure conversation without any side effects — for example, drafting a message or brainstorming before running anything.

Filesystem tool conflict resolution

Edit, Write, MultiEdit, NotebookEdit, and WriteMemoryTopic all declare a conflict key based on their target path (or memory slug). When the model emits two write operations targeting the same file in a single turn, caliban serializes those calls in submission order rather than letting them interleave. Calls targeting different files still execute in parallel. See Tool Execution for details.

Tool Execution

This page covers how caliban resolves file paths for tools, dispatches multiple tools concurrently within a turn, caps oversized tool output, and controls verbosity.

Path resolution and workspace

Every filesystem and shell tool resolves paths relative to the workspace root — the directory caliban was started in, unless overridden.

--workspace <DIR> — Set the workspace root explicitly. Relative tool paths are joined against this directory. If not supplied, caliban uses the current working directory.

--restrict-paths — Reject any tool call whose path resolves outside the workspace root. With this flag, absolute paths that escape the workspace return an error rather than silently accessing arbitrary filesystem locations. Use it when you want a hard containment boundary at the path-resolution layer.

additional_directories — A list of extra root paths declared in settings.toml. Tools may read and write these paths even when --restrict-paths is active, as long as the path falls under one of the declared roots.

# .caliban/settings.toml
additional_directories = [
  "/data/shared",
  "/home/user/docs",
]
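To make the interaction between the workspace root, --restrict-paths, and additional_directories concrete, here is a minimal sketch. It assumes POSIX-style absolute roots and purely lexical normalization (no symlink resolution); `resolve_tool_path` is a hypothetical name, and caliban's real resolver may differ.

```python
import os


def resolve_tool_path(raw, workspace, restrict=False, additional_directories=()):
    """Join a tool path against the workspace and optionally enforce containment."""
    # os.path.join keeps `raw` unchanged when it is already absolute.
    resolved = os.path.normpath(os.path.join(workspace, raw))
    if not restrict:
        return resolved
    for root in [workspace, *additional_directories]:
        root = os.path.normpath(root)
        # The path is contained when the common prefix is the root itself.
        if os.path.commonpath([resolved, root]) == root:
            return resolved
    raise PermissionError(f"{resolved} is outside the workspace and additional_directories")
```

With restrict=False a path such as ../etc/passwd resolves outside the workspace without complaint; with restrict=True the same call raises unless one of the declared roots contains the result.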

Path restriction vs. the OS sandbox

--restrict-paths enforces containment inside caliban's own path-resolution code, before any process is spawned. The OS Sandbox enforces containment at the OS level inside the subprocess. They are independent and complementary: use both for defense-in-depth.

Parallel tool dispatch

When the model emits multiple tool_use blocks in a single assistant turn, caliban runs them concurrently by default (ADR 0016).

flowchart LR
    A[Model turn\nmulti-tool call] --> B[Permission gate\nserial]
    B -->|Denied| C[Deny result\nto model]
    B -->|Allowed| D[FuturesUnordered\nbounded by semaphore]
    D --> E[Results in\ncompletion order]
    E --> F[Re-ordered to\nsubmission order\nin history]

Permission hooks run serially first. Each before_tool hook fires in submission order, producing an Allowed or Denied decision before any concurrent execution begins. Denied results are returned to the model immediately.

Allowed calls fan out into a FuturesUnordered pool bounded by a semaphore.

Default concurrency limit: available_parallelism() − 1 (minimum 1). This leaves one CPU core for the agent loop, streaming, and the TUI render thread. Most tools are I/O-bound, so the limit is a soft ceiling against runaway fan-out rather than a strict CPU cap.

Override flags:

| Flag / env var | Effect |
| --- | --- |
| --no-parallel-tools / CALIBAN_NO_PARALLEL_TOOLS=1 | Run all tools serially (equivalent to a limit of 1). |
| --parallel-tool-limit N / CALIBAN_PARALLEL_TOOL_LIMIT=N | Set the concurrency limit explicitly. |

Write conflict serialization. Tools that write to the same target (Edit, Write, MultiEdit, NotebookEdit, WriteMemoryTopic) declare a conflict key. Two calls with the same key are serialized in submission order even when the concurrency limit would permit them to run together. Calls with different keys (or no key) still parallelize freely.
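The dispatch rules in this section (a bounded pool, same-key calls serialized in submission order, results returned in submission order) can be sketched with asyncio. This is an analogy, not the harness's FuturesUnordered code: `dispatch`, `default_limit`, and the `(conflict_key, payload)` tuple shape are hypothetical.

```python
import asyncio
import os
from collections import defaultdict


def default_limit():
    # Mirrors the documented default: parallelism minus one, never below 1.
    return max(1, (os.cpu_count() or 1) - 1)


async def dispatch(calls, run, limit=None):
    """Run (conflict_key, payload) calls; return results in submission order.

    Calls sharing a non-None conflict key run one after another in
    submission order; everything else runs concurrently up to `limit`.
    """
    semaphore = asyncio.Semaphore(limit or default_limit())
    results = [None] * len(calls)

    # One chain per conflict key, plus a single-call chain for each keyless call.
    chains = defaultdict(list)
    for index, (key, _payload) in enumerate(calls):
        chains[("key", key) if key is not None else ("solo", index)].append(index)

    async def run_chain(indices):
        for index in indices:  # sequential within a chain
            async with semaphore:
                results[index] = await run(calls[index][1])

    await asyncio.gather(*(run_chain(indices) for indices in chains.values()))
    return results
```

Writing each result into a slot indexed by submission position is what lets completion order differ from the order recorded in history.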

Tool-result capping

Large tool results — for example, reading a multi-thousand-line file — can fill the context window quickly. Caliban caps each tool result before it is appended to the conversation.

tool_result_cap_chars (settings key, default 50000) — Maximum character count for a single tool result delivered inline to the model. Set to 0 to disable capping.

When a result exceeds the cap, the overflow text is written to a spill file under the caliban cache directory (~/.cache/caliban/tool-overflows/ on Linux, ~/Library/Caches/caliban/tool-overflows/ on macOS) and the model receives a truncated result with a note pointing at the spill path. The model can then decide whether to read the spill file directly.

# .caliban/settings.toml — raise the cap for large codebases
tool_result_cap_chars = 100_000
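A rough sketch of the capping behavior follows. The wording of the truncation note, the spill file naming, and the function name `cap_tool_result` are assumptions; only the cap semantics (0 disables capping, overflow goes to a spill file) come from the description above.

```python
import os
import tempfile


def cap_tool_result(text, cap_chars=50000, spill_dir=None):
    """Return text unchanged if within the cap, else truncate and spill the overflow."""
    if cap_chars == 0 or len(text) <= cap_chars:
        return text  # a cap of 0 disables capping
    spill_dir = spill_dir or tempfile.gettempdir()
    os.makedirs(spill_dir, exist_ok=True)
    # Write only the overflow; the first cap_chars characters stay inline.
    fd, spill_path = tempfile.mkstemp(prefix="overflow-", suffix=".txt", dir=spill_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text[cap_chars:])
    note = f"\n[truncated: {len(text) - cap_chars} more chars spilled to {spill_path}]"
    return text[:cap_chars] + note
```

Including the spill path in the note is what allows the model to decide later whether reading the remainder is worth the context.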

Suppressing tool announcements

By default, caliban prints a line to the terminal each time it invokes a tool, showing the tool name and its primary argument. Pass --quiet to suppress these announcements. Error output from tools is never suppressed.

# Silent execution — no "Running Bash: cargo test" lines
caliban --quiet -p "run the test suite and summarise failures"

The OS Sandbox

Caliban can wrap every subprocess spawned by the Bash tool in an OS-level sandbox that restricts what the child process may do — independent of permission rules. Where permission rules decide whether a command runs, the sandbox controls what it can access once it does.

The sandbox is implemented by the caliban-sandbox crate (ADR 0032). It is disabled by default and must be explicitly enabled in settings.

Platform support

| Platform | Backend | Status |
| --- | --- | --- |
| macOS | Apple Seatbelt (sandbox-exec) | Supported |
| Linux / WSL | bubblewrap (bwrap >= 0.5) | Supported |
| Windows native | | Not supported in v1; use WSL for the bubblewrap backend |

Seatbelt deprecation

Apple has deprecated the sandbox-exec / Seatbelt API. It still ships in all current macOS releases, but caliban's macOS backend will need to move to the Endpoint Security Framework if Apple removes sandbox-exec in a future OS version. There is no announced removal date.

Enabling the sandbox

Add a [sandbox] block to your project or user settings.toml:

[sandbox]
enabled = true
fail_if_unavailable = true   # refuse to start if bwrap/sandbox-exec is missing

With fail_if_unavailable = false (the default), caliban falls back to running unsandboxed if the backend binary is absent or too old, and logs a warning.

What the sandbox restricts

The sandbox limits three classes of access for spawned subprocesses:

Filesystem

| Key | Effect |
| --- | --- |
| filesystem.allow_read | Paths the subprocess may read |
| filesystem.deny_read | Paths hidden from reads (shadows an allow_read entry) |
| filesystem.allow_write | Paths the subprocess may write |
| filesystem.deny_write | Paths write-denied within an allow_write root |

On Linux, denied paths are masked with --tmpfs (an empty in-memory directory shadows the real one). On macOS, Seatbelt uses (deny file-write* (subpath …)) rules. Glob patterns are not supported in filesystem ACLs — add explicit path roots.

The placeholders ${WORKSPACE} and ${HOME}, along with the XDG variables, are expanded when the sandbox is initialized.

Network

Per-hostname egress is not reliably enforceable by either backend alone. The supported patterns are:

  • Block all egress — leave network.allowed_domains empty. Uses --unshare-net on Linux and omits all network-outbound allow rules on macOS.
  • Proxy-filtered egress — set network.http_proxy_port to route subprocess HTTP through an operator-run proxy at 127.0.0.1:<port>. The proxy enforces domain rules; the sandbox only allows the loopback port.

Per-hostname rules on Linux

If you set allowed_domains to a non-empty list on Linux without also configuring http_proxy_port, caliban logs a warning: the Linux bubblewrap backend cannot enforce per-hostname rules without a proxy layer. macOS Seatbelt supports literal (remote tcp "host:port") rules and is correspondingly stricter.

Other network settings

[sandbox.network]
allow_unix_sockets = false     # Docker daemon socket, etc.
allow_local_binding = false    # bind() on local ports
allow_mach_lookup = []         # macOS-only: Mach service names

Full configuration example

[sandbox]
enabled = true
fail_if_unavailable = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = ["git", "gh"]
enable_weaker_nested_sandbox = false

[sandbox.filesystem]
allow_read  = ["${WORKSPACE}", "/etc", "/usr"]
deny_read   = ["${HOME}/.ssh"]
allow_write = ["${WORKSPACE}"]
deny_write  = ["${WORKSPACE}/.git/hooks"]

[sandbox.network]
http_proxy_port = 8888
allow_unix_sockets = false
allow_local_binding = false

Key settings

auto_allow_bash_if_sandboxed — When both enabled and this flag are true, the permission classifier auto-allows all Bash(*) calls without showing a prompt. The sandbox is the protection; the Ask modal becomes redundant. Defaults to false. Note: commands listed in allow_unsandboxed_commands are not auto-allowed — they run outside the sandbox and still go through normal permission rules.

allow_unsandboxed_commands — A glob list matched against the first token of each command (or the full command string when the pattern contains a space). Matching commands bypass the sandbox entirely. Use this for tools that genuinely need unrestricted access — for example, git or gh.
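The matching rule for allow_unsandboxed_commands can be illustrated as follows. This sketch assumes shell-style globs as implemented by Python's fnmatch and shell-like tokenization via shlex; the real matcher and the name `bypasses_sandbox` are not taken from caliban.

```python
import shlex
from fnmatch import fnmatchcase


def bypasses_sandbox(command, patterns):
    """True if `command` matches any allow_unsandboxed_commands pattern."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    for pattern in patterns:
        # Patterns containing a space match the whole command string;
        # all other patterns match only the first token.
        subject = command if " " in pattern else tokens[0]
        if fnmatchcase(subject, pattern):
            return True
    return False
```

Under these assumptions, "git" matches `git status` but not `gitk`, while "cargo test*" matches `cargo test --all` and leaves `cargo build` sandboxed.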

enable_weaker_nested_sandbox — For dev containers or VMs that are already inside a user namespace: drops the --unshare-user flag on Linux (which would otherwise fail). This is a no-op on macOS.

bwrap_path / sandbox_exec_path — Override the path to the sandbox binary if it is not at the default location ($PATH for bwrap; /usr/bin/sandbox-exec for macOS).

How it works

SandboxedShim::wrap_command intercepts the tokio::process::Command built by BashTool before it is spawned. If the sandbox is active and the command is not on the bypass list, it rewrites the command so that:

  • On macOS: sandbox-exec -f <profile.sb> <original command>
  • On Linux: bwrap [bind/ro-bind/tmpfs flags] <original command>

The rest of the Bash tool — stdout/stderr capture, PID-group cleanup, timeouts, cancellation — is unchanged. The sandbox is a shim layer, not a fork.
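For the Linux case, the rewrite can be pictured as building a bubblewrap argument vector around the original command. The sketch below uses real bwrap options (--ro-bind, --bind, --tmpfs, --unshare-net, --die-with-parent), but the exact flags and ordering caliban emits are not documented here, so treat `build_bwrap_argv` as a hypothetical illustration.

```python
def build_bwrap_argv(command, allow_read=(), allow_write=(), deny=(), block_network=True):
    """Assemble a bubblewrap command line around `command` (a list of argv strings)."""
    argv = ["bwrap", "--die-with-parent"]
    for path in allow_read:
        argv += ["--ro-bind", path, path]   # read-only bind mount
    for path in allow_write:
        argv += ["--bind", path, path]      # read-write bind mount
    for path in deny:
        argv += ["--tmpfs", path]           # empty tmpfs shadows the real directory
    if block_network:
        argv.append("--unshare-net")        # new network namespace with no outside access
    return argv + list(command)
```

Later mount options overlay earlier ones, which is why the deny entries come last: a tmpfs on ${WORKSPACE}/.git/hooks hides that directory even though the workspace itself is bound read-write.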

Detection runs at startup. bwrap version >= 0.5 is required on Linux (the --die-with-parent flag arrived in 0.5).
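The startup check and the fail_if_unavailable fallback can be sketched like this. It assumes the backend reports its version in text of the form "bubblewrap 0.8.0"; the function names and the returned action labels are hypothetical.

```python
import re


def parse_version(version_output):
    """Extract (major, minor, patch) from text such as 'bubblewrap 0.8.0'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        return None
    return tuple(int(part or 0) for part in match.groups())


def backend_usable(version_output, minimum=(0, 5, 0)):
    version = parse_version(version_output)
    return version is not None and version >= minimum


def startup_action(version_output, fail_if_unavailable):
    """Return 'sandbox', 'abort', or 'warn_unsandboxed' (labels are illustrative)."""
    # version_output is None when the backend binary could not be found.
    if version_output is not None and backend_usable(version_output):
        return "sandbox"
    return "abort" if fail_if_unavailable else "warn_unsandboxed"
```

Comparing version tuples rather than strings avoids the classic mistake of ranking "0.11" below "0.5".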

Skills

Skills are reusable instruction packages that the model loads on demand. Each skill is a markdown file with YAML frontmatter — the same format as the Anthropic "superpowers" plugin ecosystem, so existing skills port without changes.

Skills are not executed; they inject text into the model's context. A skill can describe a workflow, a style guide, a debugging procedure, or any other multi-step process. Only the description line is always visible to the model; the full body is fetched lazily when the model calls the Skill tool.

How the Skill tool works

Caliban registers a single built-in tool named Skill. Its description lists every loaded skill by name and one-line description. When the model wants to follow a skill's instructions, it calls Skill with the skill's exact name; the harness returns the body as text and the model proceeds accordingly.

This design keeps the token cost bounded: descriptions are always present, bodies are pay-per-use.

Discovery roots

Caliban scans three roots in priority order. The first match for a given name wins; later roots are shadowed.

| Priority | Location | Scope |
| --- | --- | --- |
| 1 (highest) | <workspace>/.caliban/skills/ | Project |
| 2 | ~/.config/caliban/skills/ (XDG-aware) | User |
| 3 | ~/.local/share/caliban/plugins/*/skills/ | Plugin-managed |

A project-level skill with the same name as a user-level skill silently replaces it. Malformed SKILL.md files are logged at warn and skipped — loading is best-effort.

Skill file format

Each skill lives in its own subdirectory. The directory name must match the name: frontmatter field exactly.

.caliban/skills/
  my-workflow/
    SKILL.md

SKILL.md structure:

---
name: my-workflow
description: "One-line summary shown to the model in the Skill tool description."
metadata:
  trigger: pre-implementation   # free-form; passed through unchanged
---

# My Workflow

Full markdown instruction set. Only loaded when the model calls Skill({"name": "my-workflow"}).

Required frontmatter fields: name and description. The metadata map is optional.
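Putting the discovery order and the file-format rules together, a simplified loader might look like the sketch below. It reads only the name: field with a naive line scan instead of a real YAML parser, and it expects the caller to pass already-expanded root directories in priority order; `discover_skills` and `read_frontmatter_name` are hypothetical names.

```python
import os


def read_frontmatter_name(skill_md):
    """Return the `name:` value from a SKILL.md frontmatter block, or None."""
    with open(skill_md, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line.startswith("name:"):
            return line[len("name:"):].strip().strip("\"'")
    return None


def discover_skills(roots):
    """Map skill name -> SKILL.md path; earlier roots shadow later ones."""
    found = {}
    for root in roots:  # highest priority first
        if not os.path.isdir(root):
            continue
        for entry in sorted(os.listdir(root)):
            skill_md = os.path.join(root, entry, "SKILL.md")
            if not os.path.isfile(skill_md):
                continue
            if read_frontmatter_name(skill_md) != entry:
                continue  # directory/name mismatch: skip, loading is best-effort
            found.setdefault(entry, skill_md)  # first match wins
    return found
```

Using setdefault keeps the first path seen for a name, which is the shadowing behavior described for project, user, and plugin roots.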

Built-in skills

Caliban ships one built-in skill compiled into the binary:

| Name | Purpose |
| --- | --- |
| auto-memory | Protocol for reading and writing the auto-memory tiers |

Built-ins register before the directory scan, so a user or project skill with the same name will shadow them.

Disabling skills

| Method | Effect |
| --- | --- |
| --no-skills flag | Disables the Skill tool entirely; no skills are loaded |
| CALIBAN_NO_SKILLS=1 | Same, via environment variable |

Shadowing a built-in

To override the built-in auto-memory skill, place your own auto-memory/SKILL.md in .caliban/skills/. It will take priority over the embedded version without any additional configuration.

  • Plugins — bundle skills alongside hooks, MCP servers, and output styles
  • Slash Command Index: the /skills overlay shows loaded skills

Custom Slash Commands

Caliban's slash commands are managed through a central SlashCommandRegistry. Every command — whether built-in or plugin-supplied — registers in the same registry, which drives typeahead completion, the /help listing, and dispatch.

The built-in registry (ADR 0040)

At startup, caliban registers approximately 30 built-in slash commands covering session management, context control, configuration, and diagnostics. The registry is the canonical source of truth for what commands exist; /help enumerates the live set.

flowchart LR
    Input["/ input"] --> Typeahead["Typeahead suggester"]
    Input --> Dispatch["Registry dispatch"]
    Dispatch --> Command["SlashCommand impl"]
    Command --> SlashCtx["SlashCtx (session + registries)"]

Each command receives a SlashCtx containing the running session, provider, MCP manager, skills registry, hooks, and settings — everything it might need without requiring each command to thread individual dependencies through its call signature.

Full built-in command list

See the Slash Command Index for the authoritative list with descriptions and arguments.

Key commands relevant to extending caliban:

| Command | Purpose |
| --- | --- |
| /skills | Show loaded skills and their descriptions |
| /mcp | Show MCP server status (connected / failed / disabled) |
| /hooks | Show active hook handlers |
| /plugins | List installed plugins with enable/disable status |
| /config | Interactive settings editor |
| /output-style | Pick an output style |

Plugin-supplied commands

Plugins (ADR 0030) may register additional slash commands by placing command markdown files in their commands/ subdirectory. The plugin system feeds these into the registry at startup using the same SlashCommand trait. Plugin-supplied commands are namespaced <plugin>:<command> so they cannot shadow built-ins by accident.

Custom user-defined slash commands are experimental

The ability for end-users to drop custom slash command files into .caliban/commands/ or ~/.config/caliban/commands/ (outside of a plugin) is planned but not yet wired. The ComponentSpec.commands field is reserved in the plugin manifest schema and the registry has the extension point, but standalone user-defined command files are not yet discovered at startup. Track progress against ADR 0040 and the parity matrix row M.

Until this lands, the recommended path for reusable operator-defined procedures is a Skill, which supports the same markdown body format and is already fully discoverable.

Hook on slash submission

UserPromptSubmit fires before the slash parser runs. The hook payload includes is_slash: true, command, and args. A hook can reject or rewrite a slash command — useful for audit logging or per-operator policy enforcement.

Hooks

Hooks let you attach external logic to caliban's event stream — shell scripts, HTTP callbacks, or MCP tools — without modifying the agent or recompiling. Hooks run in-process (for the built-in PermissionsHook and audit hooks) or via an external HookRouter (for operator-configured handlers).

Event taxonomy

Caliban fires events at the following lifecycle points (ADR 0024):

| Event | When it fires |
| --- | --- |
| SessionStart | Once at startup, before the first turn |
| SessionEnd | On clean exit |
| UserPromptSubmit | Before each user message is sent (including slash commands; payload includes is_slash) |
| PreCompact | Before context compaction begins |
| PostCompact | After compaction completes |
| PreToolUse | Before each tool call; can gate or rewrite the call |
| PostToolUse | After each tool call completes |
| PostToolUseFailure | When a tool call errors |
| ConfigChange | When a settings file changes on disk (live reload) |
| CwdChanged | When the working directory changes |
| FileChanged | When a file the agent edited is detected to have changed |
| SubagentStart / SubagentStop | When a sub-agent is spawned or exits |
| TaskCreated / TaskCompleted | When a sub-agent task is enqueued or finishes |
| PermissionRequest | When the agent requests permission for a tool call |
| PermissionDenied | When a tool call is denied |
| Notification | General notification events |
| Stop / StopFailure | When the agent loop stops (cleanly or with error) |

Additional events (Setup, UserPromptExpansion, PostToolBatch, InstructionsLoaded, WorktreeCreate, WorktreeRemove, Elicitation, ElicitationResult, TeammateIdle) are reserved but not yet fired.

Handler types

Each hook entry declares one or more handlers. Two handler types are fully wired; three are stubs (see below).

| Type | Status | Description |
| --- | --- | --- |
| command | Fully wired | Spawn a child process; stdin is event JSON; decision via stdout or exit code |
| http | Fully wired | POST event JSON to a URL; decision via response JSON |
| mcp | Experimental stub | Invoke an MCP server tool with the event JSON |
| prompt | Experimental stub | Call the model router with a classifier prompt |
| agent | Experimental stub | Delegate to a sub-agent (async only) |

mcp / prompt / agent handlers are stubs

The mcp, prompt, and agent handler types are defined in the config schema and appear in /hooks output, but their dispatch logic is not yet wired. They will be activated as their upstream dependencies (ADR 0023 MCP wiring, ADR 0037 sub-agent fleet) land. Until then, any handler of these types is silently skipped at dispatch time.

Decision protocol

For PreToolUse and UserPromptSubmit, command and http handlers report their decision as:

Stdout JSON (preferred):

{
  "hookSpecificOutput": {
    "permissionDecision": "allow",
    "permissionDecisionReason": "matched allowlist",
    "updatedInput": {}
  }
}

permissionDecision values: allow, deny, ask. updatedInput lets the hook rewrite the tool input before dispatch (the rewritten input is validated against the tool's schema; validation failure is a hard deny).

Exit codes (shell-script shorthand):

  • 0 — Allow
  • 2 — Deny (stderr becomes the reason)
  • anything else — Allow with a logged warning

PostToolUse and observer-only hooks ignore the decision even when a handler provides one. Handlers marked async = true are fire-and-forget; their decisions are always ignored.
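As an illustration of the stdout JSON protocol, here is a minimal command handler for PreToolUse. The event field names `tool_name` and `tool_input.command` are assumptions, since the event payload schema is not shown in this guide, and omitting updatedInput is assumed to mean "no rewrite".

```python
import json
import sys


def decide(event):
    """Return a decision payload for a PreToolUse event (field names are assumed)."""
    tool = event.get("tool_name", "")
    command = event.get("tool_input", {}).get("command", "")
    if tool == "Bash" and command.strip().startswith("rm "):
        decision, reason = "ask", "rm commands need confirmation"
    else:
        decision, reason = "allow", "no guard matched"
    return {
        "hookSpecificOutput": {
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }


def main(stdin=sys.stdin, stdout=sys.stdout):
    # The event arrives as JSON on stdin; the decision is written to stdout.
    event = json.loads(stdin.read())
    json.dump(decide(event), stdout)
    return 0
```

To use this as a hook script, end the file with `sys.exit(main())`. Keeping the policy in `decide` makes it easy to unit-test without spawning a process.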

Config: settings hooks table (preferred)

Hooks live in the unified settings file under the hooks key. The table maps event names to arrays of handler groups. See Settings Layering for how scopes merge — hook arrays concatenate across scopes (project entries append to user entries).

# .caliban/settings.toml  — project scope

disable_all_hooks = false
allow_managed_hooks_only = false

allowed_http_hook_urls = [
  "https://hooks.example.com/*",
]
http_hook_allowed_env_vars = ["AUDIT_TOKEN"]

[[hooks.SessionStart]]
matcher = "*"
[[hooks.SessionStart.handlers]]
type    = "command"
command = "/usr/local/bin/caliban-audit"
args    = ["session-start"]
timeout = "5s"

[[hooks.PreToolUse]]
matcher = "Bash"
if      = "Bash:rm *"
[[hooks.PreToolUse.handlers]]
type    = "command"
command = "${CALIBAN_PROJECT_DIR}/.caliban/hooks/guard-rm.sh"
async   = false

[[hooks.PreToolUse]]
matcher = "WebFetch"
[[hooks.PreToolUse.handlers]]
type    = "http"
url     = "https://hooks.example.com/preflight"
headers = { Authorization = "Bearer ${AUDIT_TOKEN}" }
timeout = "3s"

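[[hooks.PostToolUse]]
matcher = "*"
[[hooks.PostToolUse.handlers]]
# mcp handlers are currently stubs: this entry is accepted by the schema
# but silently skipped at dispatch time until mcp dispatch is wired.
type  = "mcp"
mcp   = "audit-server"
tool  = "log_tool_call"
async = true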

Config: legacy hooks.toml (compat)

If no hooks key appears in any settings file, caliban falls back to loading:

  • <workspace>/.caliban/hooks.toml (project scope)
  • ~/.config/caliban/hooks.toml (user scope)

The legacy file uses the same TOML shape shown above (top-level keys plus [[hooks.<Event>]] arrays). The two scopes merge with project entries taking priority. This path is deprecated — prefer the unified settings file for new configurations.

Safety controls

| Setting / flag | Effect |
| --- | --- |
| disable_all_hooks = true | Bypasses all external handlers; in-process hooks (permissions, audit) still run |
| allow_managed_hooks_only = true | Only handlers from the managed settings scope fire |
| allowed_http_hook_urls | URL glob allowlist; HTTP handlers fail closed if the URL isn't listed |
| http_hook_allowed_env_vars | Env vars that may be expanded in HTTP handler headers |
| --no-hooks | One-off CLI override; mirrors disable_all_hooks for a single run |
| CALIBAN_NO_HOOKS=1 | Same, via environment variable |

Audit without gating

Mark your audit hooks async = true. Async handlers observe the event but their decision is discarded, so they can never accidentally block a tool call. They run on a bounded task pool (default 16 concurrent) so they don't pile up under heavy load.

MCP Servers

Caliban implements the Model Context Protocol client side, letting you connect any MCP-compatible server as a source of additional tools. Connected servers' tools appear in the same ToolRegistry as built-ins, with the naming convention mcp__<server>__<tool>.

Configuring servers

Servers are declared in the mcp_servers table of your unified settings file, or in the legacy mcp.toml when no unified settings are present.

Minimal stdio server

# .caliban/settings.toml

[mcp_servers.linear]
command = "npx"
args    = ["-y", "@linear/mcp-server"]
env     = { LINEAR_API_KEY = "${LINEAR_API_KEY}" }

HTTP server

[mcp_servers.notion]
type    = "http"
url     = "https://mcp.notion.com/v1"
headers = { Authorization = "Bearer ${NOTION_TOKEN}" }

SSE server

[mcp_servers.legacy-api]
type = "sse"
url  = "https://api.example.com/mcp/sse"

Server configuration reference

| Field | Applies to | Description |
| --- | --- | --- |
| type / transport | all | "stdio" (default), "http", "sse" |
| command | stdio | Executable to spawn |
| args | stdio | CLI arguments |
| env | stdio | Environment variables; ${VAR} and ${VAR:-default} expanded |
| cwd | stdio | Working directory; relative paths resolve from caliban's cwd |
| url | http, sse | Absolute http:// or https:// URL |
| headers | http, sse | Static request headers; values support ${VAR} expansion |
| oauth | http, sse | OAuth mode: "off" (default), "auto", "manual" |
| disabled | all | true to skip this server entirely |
| permissions | all | Per-server permission scoping (see below) |

${CLAUDE_PROJECT_DIR} expands to the current workspace root in all string fields, so plugin-bundled servers can reference binaries relative to the workspace without hardcoding paths.
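The ${VAR} and ${VAR:-default} forms mentioned in the reference table follow the familiar shell convention. A sketch of that expansion is shown below; treating an unset variable without a default as an empty string is an assumption, as is the function name `expand`.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value, env=os.environ):
    """Expand ${VAR} and ${VAR:-default} references in a config string."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:  # shell-style ":-" falls back when unset or empty
            return current
        return default if default is not None else ""
    return _VAR.sub(replace, value)
```

For example, a header value of "Bearer ${NOTION_TOKEN}" picks up the token from the environment at load time, so the secret never needs to appear in the settings file.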

OAuth (oauth = "auto" and "manual")

For HTTP/SSE servers behind OAuth, caliban performs the authorization-code flow with PKCE and a loopback callback server.

Auto discovery (oauth = "auto"): caliban discovers endpoints from the server's /.well-known/oauth-protected-resource and /.well-known/oauth-authorization-server documents.

Manual configuration (oauth = "manual"): provide a [mcp_servers.<name>.oauth_config] block:

[mcp_servers.my-server]
type  = "http"
url   = "https://api.example.com/mcp"
oauth = "manual"

[mcp_servers.my-server.oauth_config]
client_id  = "${MY_CLIENT_ID}"
auth_url   = "https://auth.example.com/authorize"
token_url  = "https://auth.example.com/token"
scopes     = ["read", "write"]

Tokens are stored in the OS keyring; on systems without keyring support, caliban falls back to $XDG_DATA_HOME/caliban/mcp-tokens.json (mode 0600).

Use --mcp-oauth-port <PORT> (or CALIBAN_MCP_OAUTH_PORT) to fix the loopback callback port on firewalled machines instead of letting caliban pick an ephemeral one.
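For readers unfamiliar with PKCE, the core of the exchange is a random verifier and a challenge derived from it. The sketch below follows the S256 method from RFC 7636; whether caliban uses exactly these parameters (for example the 32-byte verifier) is an assumption.

```python
import base64
import hashlib
import secrets


def make_verifier():
    """Random code_verifier: 32 bytes, base64url-encoded without padding."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")


def challenge_for(verifier):
    """S256 code_challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code on the loopback callback is useless without the verifier.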

Per-server permissions

Each server can declare scoped permission rules that compose with the global rule grammar. Patterns match the unprefixed tool name; caliban expands them to mcp__<server>__<tool> when evaluating against the global engine.

[mcp_servers.linear.permissions]
allow = ["read_*", "list_*"]
deny  = ["delete_*"]
ask   = ["create_*", "update_*"]

Merge order when multiple rules match a call: global deny → server deny → server ask → server allow → global ask → global allow → default (Ask)
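The merge order can be read as a fixed sequence of checks in which the first matching rule decides. The sketch below assumes plain glob patterns on both sides (the global rule grammar is richer than this) and uses a hypothetical `resolve` function.

```python
from fnmatch import fnmatchcase


def _matches(patterns, name):
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def resolve(server, tool, global_rules, server_rules):
    """Return 'deny', 'ask', or 'allow' following the documented merge order."""
    full_name = f"mcp__{server}__{tool}"  # name as seen by the global engine
    # (rule set, subject to match, rule kind), checked top to bottom.
    order = [
        (global_rules, full_name, "deny"),
        (server_rules, tool, "deny"),
        (server_rules, tool, "ask"),
        (server_rules, tool, "allow"),
        (global_rules, full_name, "ask"),
        (global_rules, full_name, "allow"),
    ]
    for rules, subject, kind in order:
        if _matches(rules.get(kind, []), subject):
            return kind
    return "ask"  # default when nothing matches
```

With the linear rules from the example above, delete_issue is denied even if a global allow covers mcp__linear__*, because server deny is checked before global allow.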

Discovery and the /mcp overlay

At startup, caliban connects to every non-disabled server, sends initialize, and registers one McpTool per advertised tool. Failures (spawn error, handshake timeout) are logged at warn and skipped — they do not abort startup.

The /mcp slash command shows per-server status:

Each server is marked with a status glyph, one per state:

  • Connected
  • Connecting / partial
  • Disabled or failed

@server:resource references

Type @<server>: in the input bar to trigger resource autocomplete for that server. Caliban calls resources/list lazily on first use and caches the result; resources/list_changed notifications invalidate the cache.

Elicitation

When an MCP server needs additional input from the user (for example, before a destructive operation), it sends an elicitation request. In interactive mode, caliban shows a TUI modal. In --print / CI mode, elicitation requests are automatically declined.

Controls

| Flag / env | Effect |
| --- | --- |
| --no-mcp | Skip all MCP server discovery and registration |
| CALIBAN_NO_MCP=1 | Same, via environment variable |
| --mcp-oauth-port <PORT> | Fix the loopback OAuth callback port |
| CALIBAN_MCP_OAUTH_PORT=<PORT> | Same, via environment variable |

Config file location

The preferred location for MCP server config is the mcp_servers table in .caliban/settings.toml (project scope) or ~/.config/caliban/settings.toml (user scope). The legacy mcp.toml is still supported as a fallback when no unified settings file is present — project overrides user at the same server name, wholesale.

Plugins

A plugin bundles related customizations — skills, hooks, sub-agent definitions, MCP server configs, and output styles — into a single installable directory. The plugin system (ADR 0030) is a thin orchestrator: it parses a plugin.json manifest, namespaces items, expands ${CALIBAN_PLUGIN_ROOT}, and feeds everything into the same per-surface loaders that project and user files use.

What a plugin contains

my-plugin/
  plugin.json             # required manifest
  skills/
    my-workflow/
      SKILL.md
  hooks/
    hooks.json
  agents/
    reviewer.md
  output-styles/
    concise.md
  mcp/
    .mcp.json
  commands/
    recap.md

All subdirectories are optional. When a components entry is omitted from the manifest, the loader scans the conventional subdirectory automatically.

The manifest (plugin.json)

{
  "name": "my-plugin",
  "version": "1.0.0",
  "description": "Short description shown in /plugins and trust prompts",
  "author": "Alice <alice@example.com>",
  "license": "MIT",
  "homepage": "https://example.com/my-plugin",
  "components": {
    "skills": ["skills/my-workflow"],
    "hooks": "hooks/hooks.json",
    "agents": ["agents/reviewer.md"],
    "output_styles": "output-styles/concise.md",
    "mcp_servers": "mcp/.mcp.json",
    "commands": ["commands/recap.md"]
  },
  "caliban": {
    "min_version": "0.5.0",
    "platforms": ["macos", "linux"]
  }
}

| Field | Required | Description |
| --- | --- | --- |
| name | Yes | Matches the directory name. Must be [a-z0-9_-]{1,32}. |
| version | Yes | Semver string. |
| description | No | One-line description. |
| author | No | Free-form author string. |
| license | No | SPDX identifier. |
| homepage | No | URL. |
| components | No | Paths to bundled files (string or array). |
| caliban.min_version | No | Skip when the running caliban is older. |
| caliban.platforms | No | Limit to macos, linux, or windows. |

For MCP servers bundled as inline config (matching Claude Code's .mcp.json shape), use the top-level mcpServers key instead of components.mcp_servers.
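Plugin authors may want to sanity-check a manifest before installing it. The sketch below checks only the rules stated in the field table (name pattern, name matching the directory, required version, known platforms); it is not the loader's own validation, and `manifest_problems` is a hypothetical helper.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9_-]{1,32}")
PLATFORMS = {"macos", "linux", "windows"}


def manifest_problems(manifest, directory_name):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    name = manifest.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        problems.append("name must match [a-z0-9_-]{1,32}")
    elif name != directory_name:
        problems.append("name must match the plugin directory name")
    if not isinstance(manifest.get("version"), str):
        problems.append("version is required")
    platforms = manifest.get("caliban", {}).get("platforms", [])
    unknown = sorted(set(platforms) - PLATFORMS)
    if unknown:
        problems.append(f"unknown platforms: {', '.join(unknown)}")
    return problems
```

Load the manifest with `json.load` and pass the plugin directory's base name as the second argument. The check does not verify that version is valid semver.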

Discovery roots

Caliban scans three roots at startup. A plugin with the same name in an earlier root replaces later ones — no manifest merging.

| Priority | Root | Scope |
| --- | --- | --- |
| 1 (highest) | <workspace>/.caliban/plugins/<name>/ | Project |
| 2 | $XDG_DATA_HOME/caliban/plugins/<name>/ (user install dir) | User |
| 3 | /etc/caliban/plugins/<name>/ (platform analogues) | Managed (org policy) |

Managed plugins ignore the plugins.enabled list — they run regardless of per-user configuration.

Namespacing

Items loaded from a plugin carry a <plugin>:<item> prefix:

  • Skills: my-plugin:my-workflow
  • Output styles: my-plugin:concise
  • MCP servers: my-plugin:my-server

This prevents collisions with bare-named items at the project or user level. Hooks merge additively across plugins.

The caliban plugin command

# List all installed plugins and their status
caliban plugin list

# Show the manifest of an installed plugin as JSON
caliban plugin info <name>

# Install a plugin from a marketplace
caliban plugin install <name>@<marketplace-url>

# Install a plugin from a local directory
caliban plugin install --dir /path/to/my-plugin

# Update a plugin to the latest marketplace version
caliban plugin update <name>

# Remove a plugin
caliban plugin remove <name>

# Enable / disable a plugin (affects whether it loads at startup)
caliban plugin enable <name>
caliban plugin disable <name>

caliban plugin help prints the full reference.

Marketplace trust

First-time marketplace installs display the manifest, its sha256 hash, and the install URL and prompt for acknowledgement. Acknowledged installs are recorded in $XDG_DATA_HOME/caliban/trust/plugins.json. Re-installs of the same manifest hash skip the prompt; version bumps re-prompt. Sideloads (local --dir installs) skip trust gating because the operator already has filesystem access.

${CALIBAN_PLUGIN_ROOT} expansion

Inside plugin-bundled hook commands and MCP server configs, ${CALIBAN_PLUGIN_ROOT} expands to the plugin's absolute root directory. ${CLAUDE_PLUGIN_ROOT} is a supported alias so existing Claude Code plugins port verbatim.

Note

The --no-plugins flag (or CALIBAN_NO_PLUGINS=1) disables plugin discovery entirely for a single run, treating all plugin roots as empty. This is useful for debugging or for CI environments that should not pick up locally installed plugins.

Output Styles

Output styles nudge the model toward a particular response shape — more explanatory prose, learning-paced prompts with TODO(human) markers, or a proactive fill-in approach — by splicing a block into the system prompt. They are orthogonal to tools, hooks, and permissions: switching styles changes only the system prompt.

Built-in styles

Caliban ships four built-in styles compiled into the binary:

| Name | Description |
| --- | --- |
| default | No-op: identical to having no style configured (zero prompt-cache impact) |
| proactive | Encourages the model to fill in gaps and make decisions rather than pausing to ask |
| explanatory | Requests detailed commentary explaining each decision and code change |
| learning | Instructs the model to emit TODO(human): <prompt> markers on non-trivial decisions; the TUI highlights them |

The default style emits no block at all, so switching to it produces the exact same system prompt as having no style — prompt-cache hits are preserved.

Selecting the active style

Via settings (preferred): set output_style in your settings file.

# ~/.config/caliban/settings.toml
output_style = "explanatory"

Via environment variable (until the settings hierarchy is fully wired):

CALIBAN_OUTPUT_STYLE=learning caliban

Via the TUI: use /output-style to open the picker. The new selection is remembered for the session but takes effect only after /clear or a restart, because providers cache the system prompt and a mid-session change would silently invalidate that cache.

Style activation requires /clear or restart

System prompts are cached by every major provider. Selecting a new style mid-session does not change what the provider sees until the next session begins. The /config output-style overlay surfaces an "applies after /clear or restart" hint.

How styles splice into the system prompt

OutputStylePrefix::splice_into wraps the active style's body in an <output-style> XML element and prepends it to the base system prompt. Memory tier content goes first, then the style block, then the base body:

[memory tiers]

<output-style name="explanatory">
... style body ...
</output-style>

[base system prompt]

If the active style has an empty body (the default style), splice_into returns the base prompt unchanged — no extra tokens, no cache miss.
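The splice can be approximated in a few lines. This sketch covers only the style block and the base prompt (memory tier content is left out), and `splice_style` is a stand-in name rather than the real OutputStylePrefix::splice_into.

```python
def splice_style(base_prompt, style_name, style_body):
    """Prepend an <output-style> block to the base prompt; no-op for an empty body."""
    if not style_body.strip():
        # e.g. the default style: the prompt is byte-identical, so caches still hit
        return base_prompt
    block = f'<output-style name="{style_name}">\n{style_body.strip()}\n</output-style>'
    return f"{block}\n\n{base_prompt}"
```

Returning the original string untouched for an empty body is what guarantees that switching to the default style costs no extra tokens.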

The frontmatter field keep_coding_instructions: false (default true) lets a style suppress the default coding-assistant guidance block. Use this for documentation-only or writing-only modes where coding instructions are irrelevant.

Custom styles

Drop a .md file with YAML frontmatter into the appropriate directory. The file stem must match the name: field.

.caliban/output-styles/
  brief.md

Example brief.md:

---
name: brief
description: "Terse responses — one sentence per point, no preamble."
keep_coding_instructions: true
---

Keep all responses as brief as possible. One sentence per point.
No greetings, no summaries, no padding. Respond with the minimum necessary.

Required fields: name and description. Both snake_case (keep_coding_instructions) and kebab-case (keep-coding-instructions) are accepted.

Discovery roots

| Priority | Location | Scope |
| --- | --- | --- |
| 1 (highest) | <workspace>/.caliban/output-styles/<name>.md | Project |
| 2 | $XDG_CONFIG_HOME/caliban/output-styles/<name>.md | User |
| 3 | $XDG_DATA_HOME/caliban/plugins/<plugin>/output-styles/<name>.md | Plugin (namespaced <plugin>:<name>) |
| 4 (lowest) | Built-ins (compiled in) | Built-in |

A project style with the same name shadows user, plugin, and built-in styles.

Plugin-supplied styles and force_for_plugin

A plugin-supplied style with force_for_plugin: true in its frontmatter overrides the operator's output_style setting while the plugin is enabled. The /config picker shows a "locked by plugin: X" badge. Disabling the plugin releases the lock.

force_for_plugin: true is silently ignored on non-plugin styles (project, user, built-in).

Sub-agents

Caliban can spawn a nested agent — a sub-agent — to handle a focused subtask without polluting the parent's transcript. The parent's turn loop pauses while the sub-agent runs, then resumes with the sub-agent's condensed result as a single tool-result block.

The AgentTool

Sub-agents are exposed to the model as a built-in tool named AgentTool. When the model invokes it, caliban spins up a fresh Agent instance in the same process and drives it to completion.

Key properties of an AgentTool invocation:

| Property | Value |
| --- | --- |
| Process boundary | None (in-process, same tokio runtime) |
| Max turns | 20 (hard limit) |
| Output returned to parent | Final assistant text, truncated to 5 000 chars |
| Intermediate turns | Not recorded in the parent session; visible in debug logs |
| Cancellation | Inherits the parent's cancellation token |
| Provider / model | Inherits parent's provider; model input overrides the model |
| Hooks | Inherited by default; opt out with inherit_hooks: false |

Tool allowlist

The tool_allowlist input controls which tools the sub-agent may call:

  • Omitted or null — inherits every tool the parent has, except AgentTool itself.
  • Explicit list — sub-agent gets exactly those tools. Unknown names are silently dropped.

No recursion

AgentTool is always stripped from the sub-agent's registry. Sub-agents cannot spawn further sub-agents. Nested fan-out is planned for a future release.
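The allowlist and no-recursion rules combine into a small filter. The following is a sketch under one assumption: "unknown" is taken to mean "not among the parent's tools". The function name `resolve_sub_agent_tools` is hypothetical.

```python
def resolve_sub_agent_tools(parent_tools, tool_allowlist=None):
    """Compute the sub-agent's tool set from the parent's tools (illustrative)."""
    # AgentTool is always stripped, so sub-agents cannot recurse.
    available = [t for t in parent_tools if t != "AgentTool"]
    if tool_allowlist is None:
        # Omitted or null: inherit everything except AgentTool.
        return available
    # Explicit list: keep exactly the requested tools; unknown names are dropped.
    return [t for t in tool_allowlist if t in available]


parent = ["Read", "Grep", "Glob", "Write", "AgentTool"]
print(resolve_sub_agent_tools(parent))
print(resolve_sub_agent_tools(parent, ["Read", "Grep", "NoSuchTool"]))
```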

Isolation mode

Each AgentTool invocation carries an isolation field (none or worktree):

  • none (default) — sub-agent shares the parent's working directory. Suitable for read-only work (investigation, summarization).
  • worktree — sub-agent runs in a dedicated git worktree materialized at .caliban/worktrees/<name>. Suitable for tasks that write files. See Worktree Isolation for details.

Background mode

Setting background: true in the AgentTool input detaches the sub-agent from the parent and hands it off to the caliband supervisor daemon. The parent's call returns immediately with the new agent's id. See The Background Fleet.

Hook inheritance and background mode

Closure-based hooks cannot cross the process boundary. When background: true is set and the parent has closure hooks installed, caliban drops those hooks with a warning and continues. Only config-expressible hooks survive the handoff. Pass inherit_hooks: false to suppress the warning if you know the sub-agent does not need the parent's hooks.

The --no-sub-agent flag

Pass --no-sub-agent (or set CALIBAN_NO_SUB_AGENT=1) to remove AgentTool from the tool registry entirely. The model will never see the tool and cannot spawn sub-agents.

caliban --no-sub-agent "review this codebase"

This is useful when you want a strict single-agent session, or when operating in an environment where spawning child work is undesirable (CI cost budgets, audit requirements).

When to use sub-agents

| Use case | Recommended approach |
| --- | --- |
| Read-only research (grep, read, glob) without context bloat | AgentTool with tool_allowlist: ["Read","Grep","Glob"] |
| File-writing subtask that must not mix diffs | AgentTool with isolation: worktree |
| Long-running task that should survive the parent session | AgentTool with background: true, or --bg <task> |
| Strict single-agent run | --no-sub-agent |

For the full set of built-in tools the sub-agent can draw on, see Built-in Tools.

The Background Fleet

Caliban can run sub-agents in the background — detached from your current session — and let you monitor, attach to, or stop them at will. A per-repo supervisor daemon (caliband) owns the fleet and keeps agents alive even after the parent caliban process exits.

Spawning a background agent

From the command line

The quickest way to fire off a background task is the --bg flag:

caliban --bg "refactor the auth module to use the new token type"

This is shorthand for caliban agents spawn --prompt <task>. Caliban auto-starts caliband if it is not already running, then returns immediately with the new agent's id.

From inside a session

The model can request a background sub-agent by setting background: true in an AgentTool call. The parent session receives the id and a note to check back via caliban attach <id>.

The caliband daemon

caliband is a separate binary shipped alongside caliban. It runs as a per-repo daemon, meaning each git repository gets its own daemon instance.

Socket path (resolution order):

  1. $CALIBAN_DAEMON_RUNTIME_DIR/<hash>.sock if CALIBAN_DAEMON_RUNTIME_DIR is set.
  2. $XDG_RUNTIME_DIR/caliban/<hash>.sock if $XDG_RUNTIME_DIR is set.
  3. $TMPDIR/caliban-daemon/<hash>.sock (fallback; typical on macOS).

The <hash> is a 16-hex-char SHA-256 prefix of the absolute repo root path, so each repo gets a stable, unique socket without naming collisions.
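If you need to locate the socket from a script, the resolution order can be approximated as below. This is a sketch only: it assumes the hash is taken over the UTF-8 bytes of the absolute path string exactly as given, which the guide does not specify, and the function names are hypothetical. Prefer caliban daemon status when you need the authoritative path.

```python
import hashlib
import tempfile
from pathlib import Path


def repo_hash(repo_root: str) -> str:
    # Assumption: SHA-256 over the UTF-8 path string, first 16 hex characters.
    return hashlib.sha256(repo_root.encode("utf-8")).hexdigest()[:16]


def socket_path(repo_root: str, env: dict) -> Path:
    """Apply the three-step resolution order to an environment mapping."""
    name = f"{repo_hash(repo_root)}.sock"
    if env.get("CALIBAN_DAEMON_RUNTIME_DIR"):
        return Path(env["CALIBAN_DAEMON_RUNTIME_DIR"]) / name
    if env.get("XDG_RUNTIME_DIR"):
        return Path(env["XDG_RUNTIME_DIR"]) / "caliban" / name
    # Fallback: $TMPDIR, or the platform temp dir when TMPDIR is unset.
    tmp = env.get("TMPDIR") or tempfile.gettempdir()
    return Path(tmp) / "caliban-daemon" / name


print(socket_path("/home/me/dev/project", {"XDG_RUNTIME_DIR": "/run/user/1000"}))
```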

caliband auto-starts when any caliban agents command or --bg flag needs it. You should rarely need to launch it directly.

Installing caliband

cargo install caliban installs only the caliban binary. To also install the daemon, run:

cargo install caliban-supervisor --bin caliband

Both binaries must be on your $PATH for background fleet features to work.

Agent lifecycle states

| State | Meaning |
| --- | --- |
| spawning | Registered, not yet executing |
| running | Actively processing turns |
| idle | Waiting for input; no compute pending |
| killed | Stopped via kill |
| done | Finished successfully |
| failed | Finished with an error |
| crashed | Daemon restarted while agent was active; needs recovery |

caliban agents subcommands

caliban agents list

Print all registered agents and their status.

caliban agents list

caliban agents spawn

Spawn a new background agent with an explicit prompt.

caliban agents spawn --prompt "audit all SQL queries for injection risks"
caliban agents spawn --prompt "write tests for crates/caliban-tools-builtin" --label my-test-agent

Options:

| Flag | Description |
| --- | --- |
| --prompt <TEXT> | Initial prompt for the agent (required) |
| --label <NAME> | Human-readable label shown in list and logs |

caliban agents attach <id>

Stream a running agent's transcript live. Press Ctrl+D to detach without stopping the agent.

caliban agents attach a3f8b2c1

caliban agents logs <id>

Print the agent's session log (session.json).

caliban agents logs a3f8b2c1

caliban agents kill <id>

Terminate an agent (SIGTERM, escalating to SIGKILL after a grace period).

caliban agents kill a3f8b2c1

caliban agents respawn <id>

Kill the agent and restart it with the same original spawn spec (same prompt, model, isolation settings).

caliban agents respawn a3f8b2c1

Note that respawn assigns a new id; the old id is removed from the registry.

caliban agents rm <id>

Remove an agent from the registry. The agent must be stopped first, unless --force is passed.

caliban agents rm a3f8b2c1
caliban agents rm a3f8b2c1 --force   # remove even if still running

Top-level shorthands

Five common operations have top-level shorthands to save typing (stop and kill are aliases for the same operation):

| Shorthand | Equivalent |
| --- | --- |
| caliban attach <id> | caliban agents attach <id> |
| caliban logs <id> | caliban agents logs <id> |
| caliban stop <id> | caliban agents kill <id> |
| caliban kill <id> | caliban agents kill <id> |
| caliban respawn <id> | caliban agents respawn <id> |
| caliban rm <id> | caliban agents rm <id> |

caliban daemon subcommands

caliban daemon status

Print daemon health, PID, uptime, agent count, and the socket path.

caliban daemon status

caliban daemon stop

Ask the daemon to shut down gracefully after finishing in-flight requests. Running agents are not automatically killed; stop them first if you want a clean shutdown.

caliban daemon stop

Session storage

Each background agent's transcript is stored as a regular caliban session file at <base>/agents/<id>/session.json. This means all session tooling (compaction, replay, audit) works on background agents out of the box. Attaching to an agent is conceptually the same as resuming its session over the agent's per-agent socket.

Diagram: agent lifecycle

flowchart LR
    A([caliban --bg task]) -->|spawn request| D[caliband daemon]
    D -->|registers| R[(Registry)]
    D -->|starts| W[Agent worker]
    W -->|streams turns| S[(session.json)]
    W -->|per-agent socket| T([caliban attach id])
    W -->|done/failed| R
    T2([caliban agents kill id]) -->|kill request| D
    D -->|SIGTERM→SIGKILL| W

For how background agents use git worktree isolation, see Worktree Isolation.

Worktree Isolation

When a sub-agent writes files, those writes land in the parent's working tree by default. That is fine for read-only investigation, but it mixes the sub-agent's diff into yours and gives you no clean way to discard it. Worktree isolation solves this: caliban materializes a dedicated git worktree for the sub-agent so its file operations are completely separate from the parent's tree.

How it works

When isolation: worktree is requested, caliban uses the caliban-worktrees crate to:

  1. Create a new git branch named caliban/<name> off the chosen base ref.
  2. Materialize a worktree at .caliban/worktrees/<name>/ in the repo root.
  3. Optionally apply sparse-checkout patterns to limit which paths are checked out.
  4. Optionally symlink heavy directories (e.g. target/, node_modules/) from the parent repo into the worktree so they are shared rather than duplicated.
  5. Run the sub-agent with its working directory set to the worktree root.

The sub-agent's git history (commits, diffs) lives on the caliban/<name> branch. You can inspect, cherry-pick, or discard it with standard git commands after the run.
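For orientation, steps 1 and 2 correspond roughly to a single git worktree add invocation, and manual cleanup to git worktree remove plus a branch deletion. The sketch below only builds the command lines rather than running them; `worktree_commands` is a hypothetical helper, and the sparse-checkout and symlink steps are omitted. Whether caliban shells out to git or uses a library is not stated in this guide.

```python
from pathlib import Path


def worktree_commands(repo_root: str, name: str, base_ref: str = "HEAD") -> dict:
    """Build git argv lists mirroring worktree creation and manual cleanup."""
    branch = f"caliban/{name}"
    path = str(Path(repo_root) / ".caliban" / "worktrees" / name)
    git = ["git", "-C", repo_root]
    return {
        # Creates the branch off base_ref and materializes the worktree in one step.
        "create": [git + ["worktree", "add", "-b", branch, path, base_ref]],
        # Manual cleanup: remove the worktree, then force-delete its branch.
        "cleanup": [
            git + ["worktree", "remove", path],
            git + ["branch", "-D", branch],
        ],
    }


for argv in worktree_commands("/home/me/dev/project", "fix-auth")["create"]:
    print(" ".join(argv))
```

Passing the resulting lists to subprocess.run would execute them; they are printed here so the example has no side effects.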

Base ref options

The worktree.base_ref field controls what the new branch is rooted on:

| Value | Effect |
| --- | --- |
| head (default) | Branch off the current HEAD commit |
| fresh | Branch off HEAD, but start with a near-empty sparse checkout (only a sentinel pattern is checked out) |
| Any rev-parse-able string | Branch off that specific commit, tag, or branch name |

Sparse checkout

Set worktree.sparse_paths to a list of path patterns to limit which files are materialized in the worktree. Patterns follow git's sparse-checkout cone format. An empty list (the default) checks out all files.

{
  "prompt": "refactor crates/caliban-tools-builtin",
  "isolation": "worktree",
  "worktree": {
    "base_ref": "head",
    "sparse_paths": ["crates/caliban-tools-builtin/", "Cargo.toml"]
  }
}

Symlinked directories

Large directories that should be shared — not copied — go in worktree.symlink_directories. Each path is relative to the parent repo root. The directory must exist in the parent at creation time.

{
  "prompt": "run the test suite and summarize failures",
  "isolation": "worktree",
  "worktree": {
    "symlink_directories": ["target", "node_modules"]
  }
}

Cleanup behavior

| Context | When the worktree is removed |
| --- | --- |
| Foreground sub-agent | When the sub-agent's task completes (the handle drops) |
| Background sub-agent | When caliban agents rm <id> is run |
| Daemon restart with orphans | On next daemon startup (configurable) |

Set CALIBAN_KEEP_WORKTREES=1 to disable automatic removal for debugging. The worktree (and its caliban/<name> branch) will then persist until you remove it manually with git worktree remove and git branch -d (git refuses -d for a branch with unmerged commits; use -D to force deletion).

Operator notes

  • Disk usage. Each worktree is a full checkout of the matched paths. Use sparse_paths and symlink_directories to keep sizes manageable. The default head base ref shares git objects with the parent repo, so only working-tree files consume extra disk.
  • One worktree per sub-agent. Two concurrent sub-agents with the same name will conflict. Background fleet agents receive auto-generated names based on their id, so fleet-level collisions are not a concern. For foreground parallel agents (a future feature), use distinct names.
  • Branch visibility. git branch --list 'caliban/*' shows all active sub-agent branches. You can merge, rebase, or delete them like any other branch.

For how worktree isolation relates to background agents and the caliband daemon, see The Background Fleet.

Memory Tiers

Caliban carries three on-disk memory tiers that are spliced into every system prompt before the session starts. All three are plain Markdown files you can read and edit with any text editor. A fourth tier — MCP-mediated long-form memory — is planned for a future release.

flowchart LR
    G["Global CLAUDE.md\n~/.config/caliban/CLAUDE.md"]
    P["Project tier\n&lt;workspace&gt;/CLAUDE.md\n(+ ancestor walk + @-imports + rules)"]
    A["Auto-memory index\n~/.local/share/caliban/projects/&lt;slug&gt;/memory/MEMORY.md"]
    SP["System prompt"]
    G --> SP
    P --> SP
    A --> SP

The splice order is always global → project → auto-memory, each tier wrapped in an XML-tagged block so the model can distinguish them:

<global-claude-md path="…/CLAUDE.md">
…
</global-claude-md>

<project-claude-md path="…/CLAUDE.md">
…
</project-claude-md>

<auto-memory-index path="…/MEMORY.md">
…
</auto-memory-index>

<default system prompt…>

Missing tiers are silently omitted — no empty tag block is emitted.

Tier 1 — Global

Path: ~/.config/caliban/CLAUDE.md (XDG $XDG_CONFIG_HOME honored)

Owned by the operator. Caliban never writes here. Use it for cross-project preferences: tool choices, tone, coding style, personas. Read once at startup; missing file is fine.

Tier 2 — Project

Path: <workspace_root>/CLAUDE.md — plus the ancestor walk described in CLAUDE.md & Imports.

Owned by the project / repository — commit it like any other file. Contains repo-specific conventions, build commands, and taboos. Caliban never writes here.

Tier 3 — Auto-memory

Directory: ~/.local/share/caliban/projects/<sanitized-cwd>/memory/ (XDG $XDG_DATA_HOME honored; override with CALIBAN_AUTO_MEMORY_DIRECTORY or CALIBAN_MEMORY_DIR).

Owned by the agent. The agent uses ReadMemoryTopic and WriteMemoryTopic — two built-in tools — to maintain a per-project knowledge base across sessions. See Auto-Memory for the full format and write protocol.

Only MEMORY.md (the index, capped at 200 lines / 25 KB) is loaded eagerly each session. Topic files are read on demand.

Token budget

The combined memory prefix defaults to 32 000 tokens (estimated as bytes / 4, provider-agnostic). If the combined size exceeds the cap, the auto-memory tier is truncated first (a [truncated: N bytes] notice is appended to its block), then the project tier, then the global tier.
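To make the budget arithmetic concrete, the sketch below estimates tokens as bytes / 4 and works out how many bytes would have to be cut from each tier, in the stated order, to fit the cap. The function names are hypothetical, and how caliban chooses the exact cut point within a tier is not specified here.

```python
def estimate_tokens(text: str) -> int:
    # Provider-agnostic estimate from the guide: bytes / 4.
    return len(text.encode("utf-8")) // 4


def plan_truncation(tier_bytes: dict, cap_tokens: int = 32_000) -> dict:
    """Return bytes to cut per tier, trimming auto, then project, then global."""
    excess = sum(tier_bytes.values()) - cap_tokens * 4
    cuts = {}
    for tier in ("auto", "project", "global"):
        if excess <= 0:
            break
        cut = min(excess, tier_bytes.get(tier, 0))
        if cut:
            cuts[tier] = cut
            excess -= cut
    return cuts


sizes = {"global": 40_000, "project": 80_000, "auto": 20_000}  # bytes per tier
print(plan_truncation(sizes))
```

With these sizes the prefix is 140 000 bytes against a 128 000-byte budget, so only the auto-memory tier is trimmed.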

Per-tier caps can be set in the [memory] block of settings.toml:

[memory]
cap_tokens_auto      = 8000   # cap the auto tier independently
cap_tokens_claude_md = 16000  # cap the combined CLAUDE.md tier
cap_tokens_combined  = 28000  # override the combined ceiling

The same values can be set via environment variables: CALIBAN_MEMORY_BUDGET_TOKENS, CALIBAN_MEMORY_CAP_TOKENS_AUTO, and CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD.

When the sum of both per-tier caps would exceed the combined ceiling, each is scaled down proportionally so the sum fits.
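Proportional scaling can be illustrated with integer arithmetic. This is a sketch assuming simple floor rounding; `scale_caps` is a hypothetical name and caliban's rounding behavior may differ.

```python
def scale_caps(cap_auto: int, cap_claude_md: int, cap_combined: int) -> tuple:
    """Scale both per-tier caps down proportionally when their sum is too large."""
    total = cap_auto + cap_claude_md
    if total <= cap_combined:
        return cap_auto, cap_claude_md  # already fits, nothing to scale
    # Multiply before dividing to keep the ratio exact with integers.
    return (
        cap_auto * cap_combined // total,
        cap_claude_md * cap_combined // total,
    )


print(scale_caps(8_000, 16_000, 28_000))   # sum fits under the ceiling
print(scale_caps(20_000, 40_000, 30_000))  # sum exceeds the ceiling
```

In the settings example above, 8000 + 16000 is below 28000, so neither cap would be scaled.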

The memory tools and /memory

The built-in memory tools, ReadMemoryTopic and WriteMemoryTopic, are the agent-facing interface for reading and writing the auto-memory tier. See Built-in Tools for the full tool reference.

The /memory slash command shows the active tiers, their paths, and their estimated token counts:

/memory
  global   ~/.config/caliban/CLAUDE.md (412 tokens)
  project  /Users/me/dev/myproject/CLAUDE.md (880 tokens)
  auto     ~/.local/share/caliban/projects/…/memory/MEMORY.md (256 tokens)
    walk     /Users/me/dev/myproject/CLAUDE.md (880 tokens)

Disable auto-memory for CI

Set CALIBAN_DISABLE_AUTO_MEMORY=1 to drop the auto-memory tier entirely and prevent the auto-memory skill from loading. This guarantees identical system prompts across headless and CI runs regardless of on-disk memory state. --bare sets the same flag automatically.

CLAUDE.md & Imports

The project memory tier is richer than a single file. At session start, caliban walks up the directory tree from the current working directory, concatenating every CLAUDE.md, AGENTS.md, and .caliban.md it finds, then resolves any @-imports inside them and activates path-scoped rules from .caliban/rules/.

Ancestor walk

Starting at cwd, caliban walks toward the filesystem root. The walk stops at the first git root it finds, or the filesystem root, whichever comes first (WalkStop::Both, the default).

Within each directory, files are loaded in most-specific → most-general order: .caliban.md → CLAUDE.md → AGENTS.md. All three are concatenated; they don't override each other.

The resulting files are spliced in broad → narrow order (root-first) so that narrower, more-specific instructions appear later and take precedence in the model's reading.
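The walk and splice order can be pictured with the sketch below. It assumes a git root is detected by the presence of a .git entry and that the per-directory file order is preserved inside the root-first ordering; `ancestor_walk` is a hypothetical name, and exclusions and @-imports are ignored.

```python
from pathlib import Path

# Per-directory load order, most specific first (as described above).
MEMORY_FILES = (".caliban.md", "CLAUDE.md", "AGENTS.md")


def ancestor_walk(cwd: Path) -> list[Path]:
    """Collect memory files from cwd up to the git root, returned root-first."""
    per_dir = []
    current = cwd.resolve()
    while True:
        per_dir.append([current / n for n in MEMORY_FILES if (current / n).is_file()])
        # Stop at the first git root or at the filesystem root.
        if (current / ".git").exists() or current.parent == current:
            break
        current = current.parent
    # Broad -> narrow: reverse the directory order, keep per-directory file order.
    return [path for files in reversed(per_dir) for path in files]


for found in ancestor_walk(Path.cwd()):
    print(found)
```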

Regression escape

If the ancestor walk misbehaves in a repo you don't have CI coverage for, set CALIBAN_DISABLE_CLAUDE_MD_WALK=1 to revert to the legacy single-file project tier (<workspace_root>/CLAUDE.md only).

@-imports

Any of the discovered files may contain @-import directives on their own line:

@./shared/conventions.md
@~/notes/api-style.md
@/abs/path/to/team-guide.md

Import resolution is:

  • Depth-bounded to 5 levels of recursion.
  • Cycle-detected by canonical path — circular imports are ignored.
  • Local paths only. HTTP/HTTPS URLs (@https://…) are rejected outright to keep the prompt-assembly path auditable.
  • External imports (paths outside the workspace root and outside ~/.config/caliban/) require one-time approval. The approval decision is persisted to ~/.caliban/imports-allowlist.json. In non-interactive mode (--print, --bare, CI), external imports are denied unless CALIBAN_APPROVE_IMPORTS=1 is set.

Imported content is inlined at the import site with an <!-- imported from … --> marker so the model can trace provenance.
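The resolution rules above can be sketched as a small recursive function. This is an illustration under several assumptions: relative imports resolve against the importing file's directory, missing targets are dropped, and the external-import approval step is left out entirely. `resolve_imports` and `MAX_DEPTH` are hypothetical names.

```python
from pathlib import Path

MAX_DEPTH = 5  # depth bound from the list above


def resolve_imports(path: Path, depth: int = 0, chain: frozenset = frozenset()) -> str:
    """Inline @-imports with cycle detection and a depth bound (illustrative)."""
    canonical = path.expanduser().resolve()
    chain = chain | {canonical}
    out = []
    for line in canonical.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if not stripped.startswith("@") or len(stripped) == 1:
            out.append(line)  # ordinary content line
            continue
        target = stripped[1:]
        if target.startswith(("http://", "https://")):
            continue  # remote imports are rejected outright
        target_path = Path(target).expanduser()
        if not target_path.is_absolute():
            # Assumption: relative paths resolve against the importing file.
            target_path = canonical.parent / target_path
        target_path = target_path.resolve()
        if target_path in chain or depth + 1 > MAX_DEPTH or not target_path.is_file():
            continue  # circular, too deep, or missing: dropped in this sketch
        out.append(f"<!-- imported from {target_path} -->")
        out.append(resolve_imports(target_path, depth + 1, chain))
    return "\n".join(out)
```

Cycle detection compares canonical paths, so two different spellings of the same file are still recognized as one.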

Nested on-demand

When the model reads or edits a file in a subdirectory that has its own CLAUDE.md, that file is appended to the system prompt for the rest of the session. This happens once per (path, session) pair — caliban does not reload on file changes or unload when the model leaves the subtree.

The system prompt grows monotonically during a session. This is intentional: operators reason about it as "everything the model has been told", not as a sliding window.

Path-scoped rules

Files under .caliban/rules/<topic>.md are loaded with optional paths: glob frontmatter:

---
paths:
  - "src/**/*.ts"
  - "tests/**/*.ts"
---

Always use `strict` TypeScript. Prefer `unknown` over `any`.

Rules without a paths: frontmatter are always-active and loaded at startup. Rules with paths: frontmatter are activated lazily on the first file touch matching any pattern in the set. Once activated, a rule stays in the prompt for the rest of the session.

claude_md_excludes for monorepos

Large monorepos often have directories whose CLAUDE.md should not be spliced into every session. Add gitignore-style patterns to settings.toml to skip them during the ancestor walk:

claude_md_excludes = [
  "node_modules/**",
  "vendor/**",
  "third_party/**/CLAUDE.md",
]

Patterns are evaluated relative to the workspace root (the cwd at startup), not the absolute filesystem path. Last-match wins for a given path; ! negation is supported.

The same patterns can be supplied at runtime via the colon- or newline-separated CALIBAN_CLAUDE_MD_EXCLUDES environment variable.

Additional directories

--additional-directories (or additional_directories in settings.toml) extends the set of paths the file tools can access. These directories do not contribute CLAUDE.md content by default. Set CALIBAN_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 to opt in — each added path then performs its own ancestor walk, concatenated after the cwd walk in declaration order.

Note

Tier content is spliced via the <project-claude-md> and <project-rule> XML tags in the system prompt. Use /memory to inspect which files were loaded and their token counts.

Auto-Memory

Auto-memory is the agent-writable third tier of caliban's memory model. At the start of each session, caliban splices a per-project index file (MEMORY.md) into the system prompt. During the session, the agent uses two built-in tools — ReadMemoryTopic and WriteMemoryTopic — to read and write Markdown topic files that persist knowledge across sessions.

Directory layout

~/.local/share/caliban/projects/<sanitized-cwd>/memory/
  MEMORY.md              ← index, spliced into every session (≤ 200 lines / 25 KB)
  build-commands.md      ← topic file
  api-conventions.md     ← topic file
  deploy-checklist.md    ← topic file
  …

The <sanitized-cwd> slug is derived from the canonical workspace path (e.g. /Users/jf/dev/caliban → Users-jf-dev-caliban). Override the directory with CALIBAN_AUTO_MEMORY_DIRECTORY or CALIBAN_MEMORY_DIR.

The index file (MEMORY.md)

MEMORY.md is the only file loaded eagerly each session. It must stay under 200 lines / 25 KB so it fits comfortably inside the splice budget. Caliban bootstraps an empty MEMORY.md with a conventions block the first time the memory directory is accessed.

A typical index looks like:

# Memory index

- [build-commands](build-commands.md) — project: `cargo build --release`; binary lands in `target/release/`
- [api-conventions](api-conventions.md) — feedback: prefer the built-in HTTP helper over shelling out to curl
- [deploy-checklist](deploy-checklist.md) — project: run migrations before flipping the feature flag

HTML comments (<!-- … -->) in MEMORY.md are stripped from the spliced prompt but kept on disk. The auto-injected conventions block is wrapped in HTML comments for this reason — it stays on disk for authoring guidance but does not consume token budget.
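Comment stripping of this kind is typically a single regular-expression pass. The sketch below shows one way to do it; `strip_html_comments` is a hypothetical helper and caliban's parser may treat edge cases (such as unterminated comments) differently.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL spans newlines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(markdown: str) -> str:
    """Remove HTML comments from the text that gets spliced into the prompt."""
    return HTML_COMMENT.sub("", markdown)


index = "# Memory index\n<!-- conventions:\nkeep lines short -->\n- [a](a.md)"
print(strip_html_comments(index))
```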

Topic file format

Each topic file is a Markdown file with YAML frontmatter:

---
name: sprint-mode
description: "user prefers consolidated design proposals + spec + plan + implementation in one pass"
metadata:
  node_type: memory
  type: feedback
---

User prefers a single-pass workflow: design proposal, spec, plan, and
implementation delivered together without a human review checkpoint in
between.

| Frontmatter field | Required | Description |
| --- | --- | --- |
| name | yes | Kebab-case slug matching the filename stem |
| description | yes | One-line summary (≤ 120 chars); appears in the index |
| metadata.type | yes | One of user, feedback, project, reference |
| metadata.node_type | no | Always memory when written by the agent |

Slug rules: non-empty, no path separators, no "..", and no leading ".".
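The slug rules translate directly into a validation function. This is a minimal sketch; `is_valid_slug` is a hypothetical name, and treating both "/" and "\" as path separators is an assumption.

```python
def is_valid_slug(slug: str) -> bool:
    """Check a topic slug against the rules listed above (illustrative)."""
    if not slug:
        return False  # must be non-empty
    if "/" in slug or "\\" in slug:
        return False  # no path separators
    if ".." in slug:
        return False  # no parent-directory components
    if slug.startswith("."):
        return False  # no leading dot
    return True


for candidate in ("build-commands", "../etc/passwd", ".hidden", ""):
    print(repr(candidate), is_valid_slug(candidate))
```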

Memory types

| Type | Use for |
| --- | --- |
| user | Durable facts about the user (role, timezone, preferences) |
| feedback | Corrections or workflow preferences issued by the user |
| project | Durable project facts not already captured in the repo |
| reference | Stable external context (account IDs, API endpoints, quotas) |

The agent classifies each topic at write time. There is no automated classifier — the model is best positioned to judge what to save.

Built-in tools

| Tool | Permission category | Description |
| --- | --- | --- |
| ReadMemoryTopic | memory.* (allow) | Read a topic file by slug |
| WriteMemoryTopic | memory.* (allow) | Write/update a topic file and update the index atomically |

Both tools are sandboxed to the memory directory — path traversal attempts are rejected at the tool level.

WriteMemoryTopic performs an atomic write:

  1. Write topic body + frontmatter to <slug>.md.tmp.
  2. Rename to <slug>.md (atomic on the same filesystem).
  3. Rewrite MEMORY.md with an updated index line for the slug (same tmp-then-rename approach).

A crash between steps 2 and 3 leaves an orphan topic file. Run /memory rebuild-index to repair it.
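The tmp-then-rename pattern in the steps above looks roughly like this. It is a sketch, not caliban's code: `atomic_write` and `write_topic` are hypothetical helpers, and the index-line matching (by the "- [slug](" prefix) is an assumption based on the example index shown earlier.

```python
import os
from pathlib import Path


def atomic_write(path: Path, content: str) -> None:
    """Write to <name>.tmp, then rename over the target."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(content)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes hit disk before renaming
    os.replace(tmp, path)  # atomic when tmp and target share a filesystem


def write_topic(memory_dir: Path, slug: str, body: str, index_line: str) -> None:
    # Steps 1 and 2: write the topic file atomically.
    atomic_write(memory_dir / f"{slug}.md", body)
    # Step 3: rewrite MEMORY.md with a fresh index line for this slug.
    index = memory_dir / "MEMORY.md"
    lines = index.read_text(encoding="utf-8").splitlines() if index.exists() else []
    lines = [line for line in lines if not line.startswith(f"- [{slug}](")]
    lines.append(index_line)
    atomic_write(index, "\n".join(lines) + "\n")
```

Because each file is replaced with a single rename, a reader never observes a half-written topic or index; the only inconsistent state is the orphan-topic window between the two renames.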

Managing memory

| Command | Effect |
| --- | --- |
| /memory | Show active tiers, paths, and token counts |
| /memory rm <slug> | Delete a topic file and remove its index line |
| /memory rebuild-index | Rebuild MEMORY.md from the topic files on disk |

There is no automatic pruning. Memories persist until manually removed.

MEMORY.md growth

The index grows without bound on long-running projects. Periodically review it with /memory and remove stale topics with /memory rm <slug> to keep it under the 200-line / 25 KB splice limit.

Cross-references between topics

Topic bodies may contain [[slug]] cross-references, for example [[parity-gap-matrix]]. These are informational breadcrumbs — caliban does not auto-follow them. The agent can follow a reference by calling ReadMemoryTopic with the referenced slug.

Disable for CI

Set CALIBAN_DISABLE_AUTO_MEMORY=1 to drop the auto-memory tier from the splice and suppress the auto-memory skill. This guarantees identical system prompts regardless of on-disk memory state. --bare sets the same flag automatically.

Checkpoints & Rewind

Caliban takes a per-prompt snapshot of every file that a file-writing tool touched during that prompt's turns. If you don't like the result, /rewind lets you pick any prior prompt and restore the files, the conversation, or both — without losing the history of what happened in between.

What gets snapshotted

The checkpoint recorder fires on Write, Edit, MultiEdit, and NotebookEdit. Before any of these tools mutates a file for the first time within a prompt, caliban reads the pre-image and stores it content-addressed under the per-prompt blob directory.

Bash mutations are not tracked

Commands run via Bash (including rm, mv, cp, and arbitrary subprocess writes) are not captured in the checkpoint. The /rewind overlay surfaces this in its footer. Bash-created files that a Write/Edit later touches are recorded from that point forward.

Plan-mode prompts (which reject mutating tools) emit an empty manifest so they are still selectable as conversation-rewind targets.

Disk layout

~/.caliban/projects/<cwd-hash>/checkpoints/<session>/
  prompt-001/
    manifest.json
    blobs/<sha256>.bin
  prompt-002/
    manifest.json
    blobs/<sha256>.bin
  …

<cwd-hash> is the first 16 hex characters of sha256(canonical_cwd). Override the root with CALIBAN_CHECKPOINT_ROOT. Disable recording entirely with CALIBAN_CHECKPOINT_DISABLED=1.

Each manifest.json records:

| Field | Description |
| --- | --- |
| prompt_index | Monotonic prompt counter within the session (1-based) |
| kind | files (normal), plan (plan-mode, no blobs), cleared (pruned) |
| title | First ~80 chars of the user message |
| created_at | UTC timestamp |
| entries | Array of file entries (path, sha256, mode, size, exists_pre, tool) |
| partial | true if some blob writes failed |

For each entry, exists_pre: false means the file was created by the prompt (restore will delete it). Blobs are content-addressed — the same pre-image across two prompts is stored once.
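Content addressing and the exists_pre rule can be sketched together. The helpers below are hypothetical and use a single blob directory for simplicity; entry paths are assumed to be relative to the workspace, and file modes are ignored.

```python
import hashlib
from pathlib import Path


def store_pre_image(blob_dir: Path, data: bytes) -> str:
    """Store a pre-image under its SHA-256; identical content is written once."""
    digest = hashlib.sha256(data).hexdigest()
    blob = blob_dir / f"{digest}.bin"
    if not blob.exists():
        blob_dir.mkdir(parents=True, exist_ok=True)
        blob.write_bytes(data)
    return digest


def restore_entries(entries: list, blob_dir: Path, workspace: Path) -> None:
    """Apply manifest entries: delete created files, rewrite modified ones."""
    for entry in entries:
        target = workspace / entry["path"]
        if not entry["exists_pre"]:
            # The prompt created this file, so restoring means deleting it.
            target.unlink(missing_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes((blob_dir / f"{entry['sha256']}.bin").read_bytes())
```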

Triggering /rewind

Open the rewind overlay from the TUI in two ways:

  • Type /rewind at the prompt.
  • Press Esc Esc (two Esc presses within 400 ms) when the input buffer is empty.

The overlay lists prompts newest-first. Navigate with arrow keys, confirm with Enter.

Restore options

| Option | Default | Effect |
| --- | --- | --- |
| Restore both | Enter | Overwrite tracked files and truncate conversation |
| Restore code only | | Overwrite tracked files; leave conversation intact |
| Restore conversation only | | Truncate messages; leave files intact |
| Summarize from here | | Run the compactor on the messages after the checkpoint |
| Summarize up to here | | Run the compactor on the messages up to the checkpoint |

"Truncate conversation" removes all messages after the selected prompt's last assistant message, so the conversation ends at that point in time.

Tip

The two summarize options feed the same SummarizingCompactor used by /compact. They're useful when you want to keep the context clean after rolling back — for example, summarize everything before the rewind point so the model retains the overall arc without the failed detour.

Storage limits and pruning

CALIBAN_CHECKPOINT_MAX_BYTES caps total blob storage per project (default 5 GiB). When the cap is exceeded, oldest prompt blobs are dropped first; the manifest is kept as a cleared marker so the prompt remains selectable for conversation rewind (but file restore is no longer possible).

A checkpoint directory is removed only when cleanupPeriodDays (default 30) has elapsed since its last update and the corresponding session is being pruned by the session store. Checkpoints are never orphaned while a session is still resumable.

Context & Compaction

Every provider has a finite context window. Caliban tracks utilization in real time and provides several tools — automatic and manual — to keep long sessions healthy without losing important history.

Context tracking

Caliban maintains a ContextWindow counter that accumulates token usage from every provider response. This is independent of the telemetry subsystem: the /context command and the TUI status-bar percentage work for all users regardless of whether CALIBAN_ENABLE_TELEMETRY is set.

/context
  input tokens used : 162 430 / 200 000  (81%)
  output tokens used: 4 812
  ⚠ approaching limit (warn threshold: 80%)

/context shows a per-message-kind breakdown and warns when utilization reaches 80%. See Telemetry & Cost for OTLP export of context metrics.

Auto-compaction

When the context-window utilization reaches auto_compact_threshold, caliban automatically runs the configured compactor before the next turn. The default threshold is 0.75 (75% utilization).

Configure in settings.toml:

auto_compact_threshold = 0.75   # 0.0–1.0; unset or null disables autocompact

Set auto_compact_threshold to null (or omit it) to disable autocompact entirely and rely on manual /compact invocations.

Micro-compaction

Micro-compaction is an LLM-free per-turn pass that supersedes stale ToolResult blocks in the conversation history without making any API calls.

The logic is per-tool:

| Tool | Supersession key |
| --- | --- |
| Read | File path |
| Grep, Glob | Exact argument string |
| WebFetch | URL |
| Bash | Never superseded |

When a newer result for the same key exists, the older result block is replaced with a [superseded: <tool>(<key>)] placeholder, keeping message structure intact but recovering tokens.
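The supersession pass can be modeled as a scan over tool results. This sketch assumes each result has already been reduced to a tool name, a supersession key, and its content; the record shape and the name `micro_compact` are hypothetical.

```python
NEVER_SUPERSEDED = {"Bash"}


def micro_compact(results: list) -> list:
    """Replace older results that share (tool, key) with a placeholder."""
    latest = {}
    for index, result in enumerate(results):
        if result["tool"] not in NEVER_SUPERSEDED:
            latest[(result["tool"], result["key"])] = index  # last occurrence wins
    compacted = []
    for index, result in enumerate(results):
        ident = (result["tool"], result["key"])
        if result["tool"] not in NEVER_SUPERSEDED and latest[ident] != index:
            placeholder = f"[superseded: {result['tool']}({result['key']})]"
            compacted.append({**result, "content": placeholder})
        else:
            compacted.append(result)
    return compacted


history = [
    {"tool": "Read", "key": "src/main.rs", "content": "fn main() {}"},
    {"tool": "Bash", "key": "ls", "content": "a b"},
    {"tool": "Read", "key": "src/main.rs", "content": "fn main() { run(); }"},
]
for item in micro_compact(history):
    print(item["tool"], "->", item["content"])
```

The list keeps its length and order, which is what preserves message structure while the stale content is dropped.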

Enable or disable in settings.toml:

micro_compact_enabled = true    # default: true

Manual /compact

/compact triggers an immediate compaction of the current conversation through the configured Compactor (the same path used by autocompact). A compact.event log entry is emitted and a compact.event metric is recorded if telemetry is enabled.

/compact

No flags. The compactor strategy (summarizing vs. micro) is determined by the active configuration — see Hooks for the PreCompact / PostCompact hook events that fire around each compaction.

/clear

/clear resets the conversation to an empty state and zeroes the ContextWindow counter. The session file is updated. Use it to start a fresh sub-task without opening a new session.

/clear

PreCompact and PostCompact hooks

Caliban fires PreCompact before compaction begins and PostCompact after it completes. These hook events are available to external scripts and MCP handlers.

# In settings.toml [hooks]
[hooks]
PreCompact  = [{ type = "command", command = "echo compacting…" }]
PostCompact = [{ type = "command", command = "notify-send 'compact done'" }]

See Hooks for the full hook configuration reference.

Prompt caching

Caliban uses Anthropic-style prompt caching by default to reduce cost on repeated turns. A cache marker is placed on the last user message when its estimated token count meets the minimum threshold.

| Setting / flag | Default | Description |
| --- | --- | --- |
| --no-prompt-cache | off | Disable prompt caching for this run |
| CALIBAN_NO_PROMPT_CACHE | unset | Same as --no-prompt-cache via environment variable |
| min_cache_block_tokens | | Minimum tokens on the last user message to merit a cache marker |

Configure min_cache_block_tokens in settings.toml:

min_cache_block_tokens = 1024   # omit to use the upstream default

When to disable prompt caching

Use --no-prompt-cache during development when you want to measure raw latency without cache effects, or when debugging unexpected responses that might be served from a stale cache hit.

Tool result size cap

Caliban can cap the character length of individual tool results before they are appended to the conversation. This prevents a single large Read or Bash output from consuming a disproportionate share of the context window.

tool_result_cap_chars = 65536   # 0 disables the cap (default)

Summary of relevant settings

| Setting key | Type | Default | Description |
| --- | --- | --- | --- |
| auto_compact_threshold | float | 0.75 | Utilization (0–1) that triggers autocompact; null disables |
| micro_compact_enabled | bool | true | Enable the LLM-free per-turn supersession pass |
| min_cache_block_tokens | integer | | Minimum tokens to place the prompt cache marker |
| tool_result_cap_chars | integer | 0 | Per-result character cap; 0 disables |

Print Mode

Print mode is caliban's non-interactive entry point. Instead of launching the TUI, it drives the agent to completion and writes results to stdout — making caliban scriptable from a shell, a CI job, or any program that can invoke a subprocess.

Activating print mode

| Method | Example |
| --- | --- |
| -p / --print flag | caliban -p "summarize this repo" |
| --output-format flag | caliban --output-format json "fix the bug" |
| Auto-headless | caliban detects a piped stdout or non-TTY stdin and enters print mode automatically |

Auto-headless fires when a prompt is given and stdout is piped or stdin is not a TTY. Pass --no-auto-print to suppress this inference and keep the TUI even in piped contexts.
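
The inference rule reads as a small predicate. The Python sketch below restates it under stated assumptions: the function and parameter names are hypothetical, and it models only the automatic path (an explicit --print or --output-format enters print mode regardless).

```python
import sys

def should_auto_print(has_prompt: bool, stdout_is_tty: bool,
                      stdin_is_tty: bool, no_auto_print: bool = False) -> bool:
    """Sketch of the auto-headless rule described above.

    Fires only when a prompt was given and either stdout is piped or
    stdin is not a TTY; --no-auto-print suppresses it.
    """
    if no_auto_print or not has_prompt:
        return False
    return (not stdout_is_tty) or (not stdin_is_tty)

if __name__ == "__main__":
    # Evaluate the rule for the current process's own streams.
    print(should_auto_print(True, sys.stdout.isatty(), sys.stdin.isatty()))
```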

Choosing an output format

--output-format text|json|stream-json

| Format | Output |
| --- | --- |
| text | The assistant's final reply, streamed to stdout as plain text. Default. |
| json | A single JSON object identical to the result frame in stream-json. Useful for jq consumers that only need the final answer and cost totals. |
| stream-json | Newline-delimited JSON (NDJSON). One frame per event: system/init first, per-turn tool and message frames, result last. The full automation contract; see The stream-json Protocol. |

Supplying input

By default caliban reads the prompt from the positional argument or --prompt. To pipe multi-line input, pass - as the prompt value and write to stdin:

git diff HEAD | caliban -p - "review these changes"

For multi-turn scripted sessions, use --input-format stream-json to send NDJSON user frames on stdin instead. When this flag is active, any inline prompt other than - is rejected at startup (exit 64) to prevent accidentally bypassing the frame parser. See The stream-json Protocol for details.

Budget guard

--max-budget-usd <USD>

Caps the cumulative spend for a run. Cost is tracked against the vendored rate card in caliban-telemetry. When the budget is exceeded after a turn completes, caliban emits a result frame with subtype: "budget_exceeded" and exits 137. Unknown (provider, model) pairs contribute $0.00 and emit a single warning — the run is not blocked.
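
The timing matters: the check runs after a turn completes, so the turn that crosses the cap is still paid for. A minimal Python sketch of that behavior, assuming per-turn costs are already known (the function name and inputs are hypothetical; caliban derives costs from its rate card):

```python
from decimal import Decimal

def run_with_budget(turn_costs, max_budget_usd):
    """Sketch of the budget guard: check the cap after each completed turn.

    Returns (turns_completed, total_spend, exit_code). Exit code 137
    stands for the documented budget_exceeded outcome.
    """
    budget = Decimal(str(max_budget_usd))
    total = Decimal("0")
    turns = 0
    for cost in turn_costs:
        total += Decimal(str(cost))  # the turn has already run and been billed
        turns += 1
        if total > budget:
            return turns, total, 137
    return turns, total, 0
```

For example, three turns at $0.20 each under a $0.50 cap stop after the third turn with $0.60 spent, so set the cap with some headroom below your real limit.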

Deterministic runs with --bare

caliban --bare -p "count lines of code"

--bare skips hooks, skills, plugins, MCP server discovery, auto-memory, and CLAUDE.md walk-up. The agent runs with only its built-in tools and the flags you supply. Use it when you need a fully reproducible run that ignores user and project settings.

--bare vs. --no-auto-print

--bare controls what the agent loads; --no-auto-print controls whether headless mode fires automatically. They are independent.

Exit codes

| Code | Condition |
| --- | --- |
| 0 | Success |
| 1 | Generic runtime error (provider error, hook denial, tool crash) |
| 2 | Schema validation failed (--json-schema) |
| 64 | Bad flags / malformed stream-json input (EX_USAGE) |
| 66 | --resume <name> not found, or empty stream-json stdin (EX_NOINPUT) |
| 75 | --max-turns exceeded (EX_TEMPFAIL) |
| 78 | Config error: settings parse failure, stdin > 10 MB (EX_CONFIG) |
| 124 | Cancelled (Ctrl-C / SIGTERM from the agent loop) |
| 130 | Real SIGINT: second Ctrl-C reaching the harness |
| 137 | --max-budget-usd exceeded |

CI scripts can distinguish budget exhaustion from genuine failures without parsing stdout: $? carries the signal.
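
A wrapper written in Python might encode the table as a lookup. The sketch below is illustrative: the label strings and helper names are made up for this example; only the numeric codes come from the table above.

```python
# Exit codes from the table above; the labels are this sketch's own.
EXIT_MEANINGS = {
    0: "success",
    1: "runtime error",
    2: "schema validation failed",
    64: "bad flags or malformed stream-json input",
    66: "session not found or empty stream-json stdin",
    75: "max turns exceeded",
    78: "config error",
    124: "cancelled",
    130: "SIGINT",
    137: "budget exceeded",
}

# Codes that mean the run stopped at a configured limit rather than failing.
LIMIT_CODES = {75, 137}

def describe_exit(code: int) -> str:
    """Map a caliban exit code to a short human-readable label."""
    return EXIT_MEANINGS.get(code, f"unrecognized exit code {code}")

def hit_a_limit(code: int) -> bool:
    """True when the run ended because of --max-turns or --max-budget-usd."""
    return code in LIMIT_CODES
```

A CI job could, for instance, mark limit hits as warnings and everything else non-zero as failures.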

Session persistence

Print-mode runs honour --session <NAME>, --continue (-c), and --resume the same way as interactive sessions. Pass --no-save to skip writing the session back to disk after the run.

The stream-json Protocol

--output-format stream-json is caliban's full automation contract. It emits newline-delimited JSON (NDJSON) to stdout, one frame per line, in a well-defined order. Downstream programs parse the stream with any JSON library and route on the type (and subtype) fields.

The protocol mirrors Claude Code's stream-json shape closely enough that most existing consumers work with minimal changes, while remaining provider-agnostic — token field names and cost breakdowns differ by provider and are not byte-identical to Claude Code.

Output frame types

system/init — first frame of every run

Emitted before any agent activity begins.

{
  "type": "system",
  "subtype": "init",
  "session_id": "a3f7c2d1-...",
  "model": "anthropic/claude-sonnet-4-6",
  "tools": ["Bash", "Edit", "Glob", "Grep", "Read", "Write"],
  "plugins": [],
  "settingSources": ["managed", "user", "project"],
  "mcp_servers": [],
  "bare_mode": false,
  "cwd": "/home/ci/repo",
  "permission_mode": "acceptEdits"
}

settingSources uses camelCase for Claude Code parity. permission_mode values are default, acceptEdits, plan, auto, dontAsk, bypassPermissions, or "disabled" (when --no-permissions is in effect).

system/api_retry

Emitted when the provider triggers a retry (rate-limit, overload, transient network error).

{
  "type": "system",
  "subtype": "api_retry",
  "attempt": 2,
  "max_retries": 5,
  "retry_delay_ms": 1500,
  "error_status": 529,
  "error_category": "overloaded"
}

error_category values: overloaded, rate_limit, timeout, network, server_error, other.

user — echo of the user prompt

Only emitted when --replay-user-messages is set.

{
  "type": "user",
  "content": [{"type": "text", "text": "fix the failing tests"}]
}

text — incremental assistant text delta

Only emitted when --include-partial-messages is set.

{"type": "text", "delta": "Here is the fix: "}

thinking — incremental reasoning delta

Emitted under --include-partial-messages when the model streams reasoning content (extended thinking models).

{"type": "thinking", "delta": "Let me check the test output…"}

tool_use and tool_result — progress frames

Each tool invocation produces a tool_use frame (emitted once the model finishes streaming the tool's input JSON) immediately followed by a tool_result frame (emitted once the tool completes).

{"type": "tool_use", "id": "toolu_01ABC", "name": "Bash", "input": {"command": "cargo test"}}
{"type": "tool_result", "tool_use_id": "toolu_01ABC", "is_error": false, "content": [{"type": "text", "text": "test result: ok. 42 passed"}]}

message — full assistant message (authoritative)

Emitted at the end of each turn when --include-partial-messages is not set. When --include-partial-messages is set, text deltas stream via text frames instead and no message frame is emitted.

{
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "All tests pass now."},
    {"type": "tool_use", "id": "toolu_01ABC", "name": "Bash", "input": {"command": "cargo test"}}
  ]
}

Tool call duplication is intentional

Each tool call appears twice: once as a short tool_use/tool_result pair and again inside the subsequent message frame's content array. The short pair is a progress indicator; the message frame is the authoritative record for transcript reconstruction. Keep both copies when reconstructing a transcript, but do not double-count: tally one tool call per top-level tool_use frame and ignore the copy inside message content.
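
The counting rule can be sketched in Python. This is a minimal illustration rather than part of caliban: count_tool_calls is a hypothetical helper, and the sample frames are modeled on the examples in this section.

```python
import json

def count_tool_calls(ndjson_lines):
    """Count tool calls from top-level tool_use frames only.

    Copies embedded in message frames are ignored, so each call is
    counted exactly once.
    """
    count = 0
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        frame = json.loads(line)
        if frame.get("type") == "tool_use":
            count += 1
    return count

# Sample stream: one tool call, which also appears inside the message frame.
call = {"type": "tool_use", "id": "toolu_01ABC", "name": "Bash",
        "input": {"command": "cargo test"}}
sample = [json.dumps(frame) for frame in (
    call,
    {"type": "tool_result", "tool_use_id": "toolu_01ABC", "is_error": False,
     "content": [{"type": "text", "text": "test result: ok. 42 passed"}]},
    {"type": "message", "role": "assistant",
     "content": [{"type": "text", "text": "All tests pass now."}, call]},
)]

print(count_tool_calls(sample))  # prints 1
```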

hook_event

Only emitted when --include-hook-events is set.

{
  "type": "hook_event",
  "hookEventName": "PreToolUse",
  "hookSpecificOutput": {"matcher": "Bash", "decision": "allow"}
}

hookEventName and hookSpecificOutput are camelCase (ADR 0024 parity).

warning

Non-fatal informational frames that do not terminate the run. Currently emitted for model substitution detected at the provider level.

{
  "type": "warning",
  "subtype": "model_mismatch",
  "message": "model mismatch: requested \"llama3.1\" but provider responded with \"llama3.2\"",
  "details": {"requested": "llama3.1", "actual": "llama3.2"}
}

result — always the last frame

{
  "type": "result",
  "subtype": "success",
  "result": "All 42 tests pass.",
  "session_id": "a3f7c2d1-...",
  "total_cost_usd": 0.0034,
  "turns": 3,
  "total_input_tokens": 8210,
  "total_output_tokens": 621
}

subtype values:

| subtype | Meaning | Key fields |
| --- | --- | --- |
| success | Run completed normally | result (assistant reply) |
| error | Provider error, hook denial, tool crash, or schema validation failure | error, last_assistant_text, tool_calls_seen |
| max_turns | --max-turns was reached (exit 75) | last_assistant_text, tool_calls_seen |
| budget_exceeded | --max-budget-usd was reached (exit 137) | last_assistant_text, tool_calls_seen |
| cancelled | Run was cancelled by Ctrl-C / SIGTERM (exit 124) | last_assistant_text, tool_calls_seen |
| max_tokens | Per-turn output token budget exhausted | last_assistant_text, tool_calls_seen |

For non-success subtypes, result is absent. Read last_assistant_text for the most recent assistant reply and tool_calls_seen to distinguish an actively-looping agent (many tool calls, no clean finish) from one that stalled silently.
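
A consumer can normalize both cases into one shape. The Python sketch below is one way to do it; summarize_result and its output keys are hypothetical, and tool_calls_seen is passed through untouched because this guide does not specify its type.

```python
def summarize_result(frame: dict) -> dict:
    """Pull the useful fields out of a result frame, whatever its subtype."""
    if frame.get("type") != "result":
        raise ValueError("not a result frame")
    subtype = frame.get("subtype")
    ok = subtype == "success"
    return {
        "ok": ok,
        "subtype": subtype,
        # result exists only on success; otherwise use the last reply seen.
        "text": frame.get("result") if ok else frame.get("last_assistant_text"),
        "error": frame.get("error"),
        "tool_calls_seen": frame.get("tool_calls_seen"),
    }
```

Callers can then branch on ok and subtype without caring which text field the frame happened to carry.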

Stream-json input (--input-format stream-json)

Pass --input-format stream-json to make caliban read NDJSON user frames from stdin instead of a single prompt. This lets you drive multi-turn conversations from any language without a pseudo-TTY.

{"type": "user", "content": "fix the lint warnings"}
{"type": "user", "content": [{"type": "text", "text": "now run the tests"}]}

content can be a plain string or an array of {"type":"text","text":"…"} blocks. Unknown fields on user frames, unknown type values, and malformed JSON are hard parse errors: the run aborts with exit 64 and a result frame with subtype: "error". This is intentional. Silently ignoring an unknown field would let a wrong envelope shape run the agent with a blank prompt.

A control/interrupt frame is accepted on stdin but the interrupt is not yet honored; caliban emits a stderr warning and continues.

When --input-format stream-json is active, an inline prompt is incompatible and is rejected at startup. Pass - (or omit the prompt entirely) to read from stdin.
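
Because the parser is strict, it is safest to generate user frames with a JSON library rather than by string concatenation. A minimal Python sketch (the helper names are hypothetical; only the type and content fields shown above are emitted):

```python
import json

def user_frame(text: str, as_blocks: bool = False) -> str:
    """Serialize one user frame as a single NDJSON line (no trailing newline).

    Only the two documented fields are written, since unknown fields are
    hard parse errors.
    """
    content = [{"type": "text", "text": text}] if as_blocks else text
    return json.dumps({"type": "user", "content": content})

def to_ndjson(prompts) -> str:
    """Join several prompts into an NDJSON payload, one frame per line."""
    return "".join(user_frame(prompt) + "\n" for prompt in prompts)

if __name__ == "__main__":
    print(to_ndjson(["fix the lint warnings", "now run the tests"]), end="")
```

The resulting text can be written to the stdin of a caliban process started with --input-format stream-json.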

Example NDJSON exchange

printf '{"type":"user","content":"how many Rust source files are here?"}\n' \
  | caliban --output-format stream-json \
            --input-format stream-json \
            --replay-user-messages \
            --bare

Sample output:

{"type":"system","subtype":"init","session_id":"b1c2...","model":"anthropic/claude-sonnet-4-6","tools":["Bash","Glob","Grep","Read"],"plugins":[],"settingSources":[],"mcp_servers":[],"bare_mode":true,"cwd":"/repo","permission_mode":"default"}
{"type":"user","content":[{"type":"text","text":"how many Rust source files are here?"}]}
{"type":"tool_use","id":"toolu_01","name":"Bash","input":{"command":"find . -name '*.rs' | wc -l"}}
{"type":"tool_result","tool_use_id":"toolu_01","is_error":false,"content":[{"type":"text","text":"142"}]}
{"type":"message","role":"assistant","content":[{"type":"text","text":"There are 142 Rust source files."},{"type":"tool_use","id":"toolu_01","name":"Bash","input":{"command":"find . -name '*.rs' | wc -l"}}]}
{"type":"result","subtype":"success","result":"There are 142 Rust source files.","session_id":"b1c2...","total_cost_usd":0.0012,"turns":1,"total_input_tokens":3100,"total_output_tokens":48}

Optional frame flags

| Flag | Effect |
| --- | --- |
| --include-partial-messages | Emit text and thinking delta frames as the model streams |
| --include-hook-events | Emit a hook_event frame for each fired hook |
| --replay-user-messages | Echo each user prompt back as a user frame |

See also

  • Print Mode: activating headless mode and output formats
  • CI Patterns: parsing stream-json in scripts and Actions

Structured Output

--json-schema tells caliban to force the assistant's final reply into a JSON shape that matches a given schema. This is useful when a downstream script needs a machine-readable payload rather than freeform prose — a CI gate that needs a structured pass/fail verdict, a code-generation pipeline that expects a specific object shape, or any tool that would otherwise parse the reply with fragile string matching.

Supplying a schema

--json-schema <FILE_OR_JSON>

The argument is either:

  • A path to a .json file: --json-schema ./schema.json
  • Inline JSON (detected when the value starts with { or [): --json-schema '{"type":"object","required":["ok","message"]}'

What caliban does

  1. Runs the agent loop normally.
  2. After the final assistant turn, scans the reply for a balanced {...} JSON object. If the whole reply is valid JSON it is used as-is; otherwise the first balanced {...} block is extracted.
  3. Validates the extracted object against the schema (required fields present, top-level and per-property types match).
  4. On success: the validated object appears in the structured_output field of the result frame, and the process exits 0.
  5. On failure: the result frame has subtype: "error" and the validation message appears in error. The process exits 2.

Validation scope

The built-in validator checks required fields and top-level type / per-property type constraints. It does not implement the full JSON Schema specification (no $ref, oneOf, pattern, etc.). Native provider-level structured output via the model router is planned and will extend coverage when available.
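
The extraction and validation steps can be approximated in Python. This is a sketch of the described behavior, not caliban's code: the function names are hypothetical, and the type-mismatch error strings are invented for this example (only the missing-field wording follows the sample error shown on this page).

```python
import json

# JSON Schema type names mapped to Python types.
_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool,
          "number": (int, float), "integer": int, "null": type(None)}

def _matches(value, type_name) -> bool:
    if not isinstance(type_name, str) or type_name not in _TYPES:
        return True  # anything this sketch does not understand is not checked
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False  # bool is a subclass of int in Python
    return isinstance(value, _TYPES[type_name])

def extract_json_object(reply: str):
    """Whole reply if it is a JSON object, else the first balanced {...} block."""
    try:
        parsed = json.loads(reply)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    start = reply.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(reply)):
        ch = reply[i]
        if in_string:  # braces inside string literals do not count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(reply[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None

def validate(obj, schema: dict):
    """Return an error message, or None when obj passes the checks."""
    if not _matches(obj, schema.get("type")):
        return f"expected top-level type {schema['type']}"
    for name in schema.get("required", []):
        if not isinstance(obj, dict) or name not in obj:
            return f"missing required field `{name}`"
    for name, prop in schema.get("properties", {}).items():
        if isinstance(obj, dict) and name in obj and not _matches(obj[name], prop.get("type")):
            return f"field `{name}` is not of type {prop['type']}"
    return None
```

Keywords such as $ref, oneOf, and pattern are simply ignored here, which matches the limited scope described above.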

Worked example

Suppose you want caliban to report whether a repository's tests pass, in a structured format.

schema.json

{
  "type": "object",
  "required": ["passed", "summary"],
  "properties": {
    "passed": {"type": "boolean"},
    "summary": {"type": "string"},
    "failure_count": {"type": "integer"}
  }
}

Invocation

caliban \
  --output-format json \
  --json-schema ./schema.json \
  --bare \
  -p "Run the test suite and tell me whether it passed. Reply only with JSON."

Successful result frame (stdout)

{
  "type": "result",
  "subtype": "success",
  "result": "{\"passed\": true, \"summary\": \"42 tests passed, 0 failed\", \"failure_count\": 0}",
  "session_id": "...",
  "total_cost_usd": 0.0021,
  "turns": 2,
  "total_input_tokens": 5400,
  "total_output_tokens": 310,
  "structured_output": {
    "passed": true,
    "summary": "42 tests passed, 0 failed",
    "failure_count": 0
  }
}

Read structured_output in your script:

result=$(caliban --output-format json --json-schema schema.json --bare \
           -p "Run tests and reply with JSON.")
passed=$(echo "$result" | jq '.structured_output.passed')
if [ "$passed" != "true" ]; then
  echo "Tests failed"
  exit 1
fi

Failed validation (exit 2)

{
  "type": "result",
  "subtype": "error",
  "error": "missing required field `passed`",
  "session_id": "...",
  "total_cost_usd": 0.0018,
  "turns": 1,
  "total_input_tokens": 4800,
  "total_output_tokens": 95,
  "last_assistant_text": "All tests passed."
}

Tips

  • Instruct the model to reply only with JSON in your prompt. Models that wrap their answer in prose (e.g. "Here is the result: {...}") are handled — caliban scans for the first balanced {...} — but pure JSON replies validate more reliably.
  • Combine with --bare to skip skills and hooks that might inject extra text into the reply.
  • In stream-json mode, the structured_output field appears in the final result frame the same as in json mode.

See also

  • Print Mode: output formats and exit codes
  • CI Patterns: complete pipeline recipes using structured output

CI Patterns

This page puts the headless flags together into complete, copyable recipes for GitHub Actions and other CI environments. Before reading further, familiarise yourself with Print Mode, The stream-json Protocol, and Headless & Audit.

Key flags for CI

| Flag | Purpose |
| --- | --- |
| --bare | Skip hooks, skills, plugins, MCP, auto-memory, CLAUDE.md. Deterministic: output depends only on what you pass. |
| --max-budget-usd <USD> | Hard spend cap; exit 137 if exceeded. Prevents runaway costs in long jobs. |
| --permission-mode acceptEdits | Allow file edits without prompting; still denies shell commands the rules don't cover. |
| --allow <PAT> | Add an Allow rule at top priority for this invocation only (repeatable). |
| --no-save | Don't write the session to disk; keeps CI agents stateless. |
| --output-format stream-json | Full NDJSON output for structured parsing. |
| --output-format json | Single JSON result object; simpler for scripts that only need the answer and exit code. |

Exit codes are the primary success signal. See Print Mode — Exit codes for the full table. $? == 0 means success; $? == 137 means the budget cap fired.

Recipe 1 — Simple text answer in GitHub Actions

Suitable for jobs that just need a freeform answer and check the exit code.

# .github/workflows/caliban-check.yml
name: caliban check

on: [push]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run caliban review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          caliban \
            --bare \
            --max-budget-usd 0.50 \
            --permission-mode acceptEdits \
            --no-save \
            -p "Review the diff for obvious bugs and print a one-sentence verdict."

The job fails if caliban exits non-zero (runtime error, budget exceeded, etc.).

Recipe 2 — Structured output with jq parsing

Use --output-format json and --json-schema when you need machine-readable output — for example, a gate that checks whether a review verdict is "pass" or "fail".

#!/usr/bin/env bash
# ci/review-gate.sh
set -euo pipefail

RESULT=$(caliban \
  --output-format json \
  --json-schema '{"type":"object","required":["verdict","reason"]}' \
  --bare \
  --max-budget-usd 1.00 \
  --permission-mode acceptEdits \
  --allow "Bash:git diff*" \
  --allow "Read" \
  --no-save \
  -p "Review the staged changes. Reply ONLY with JSON: {\"verdict\": \"pass\"|\"fail\", \"reason\": \"<one sentence>\"}")

echo "Raw result: $RESULT"
VERDICT=$(echo "$RESULT" | jq -r '.structured_output.verdict')

if [ "$VERDICT" = "pass" ]; then
  echo "Review passed: $(echo "$RESULT" | jq -r '.structured_output.reason')"
  exit 0
else
  echo "Review failed: $(echo "$RESULT" | jq -r '.structured_output.reason')"
  exit 1
fi

Exit code vs. structured output

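Check the exit code first. If caliban exits non-zero before emitting a result frame (bad flags, budget blown before any turn, etc.), stdout is empty and jq has nothing to parse; under set -e the script also aborts at the assignment, before any handling can run. Capture the status explicitly, for example RESULT=$(caliban …) || CODE=$?, and branch on CODE before calling jq. Avoid RESULT=$(caliban … || true): the || true inside the substitution replaces caliban's exit status with 0.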

Recipe 3 — Multi-turn stream-json pipeline

For jobs that drive several agent turns or need to observe tool calls in real time, parse the NDJSON stream line by line.

#!/usr/bin/env bash
# ci/stream-pipeline.sh
set -euo pipefail

TASKS=$(cat <<'EOF'
{"type":"user","content":"Run the test suite and report any failures."}
{"type":"user","content":"If any tests failed, suggest a fix."}
EOF
)

LAST_RESULT=""
TOOL_CALLS=0

while IFS= read -r line; do
  [ -z "$line" ] && continue
  TYPE=$(echo "$line" | jq -r '.type')
  case "$TYPE" in
    system)
      SUBTYPE=$(echo "$line" | jq -r '.subtype')
      if [ "$SUBTYPE" = "init" ]; then
        echo "[init] model=$(echo "$line" | jq -r '.model')"
      fi
      ;;
    tool_use)
      TOOL_CALLS=$((TOOL_CALLS + 1))
      echo "[tool] $(echo "$line" | jq -r '.name')"
      ;;
    result)
      LAST_RESULT="$line"
      SUBTYPE=$(echo "$line" | jq -r '.subtype')
      COST=$(echo "$line" | jq -r '.total_cost_usd')
      echo "[result] subtype=$SUBTYPE cost=\$$COST tool_calls=$TOOL_CALLS"
      ;;
  esac
done < <(echo "$TASKS" | caliban \
    --output-format stream-json \
    --input-format stream-json \
    --bare \
    --max-budget-usd 2.00 \
    --permission-mode acceptEdits \
    --allow "Bash:cargo test*" \
    --no-save)

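# Final check. $? here would report the while loop's status, not caliban's,
# because caliban runs inside the process substitution. Wait on it instead
# (requires bash 4.4 or newer, where $! refers to the process substitution).
EXIT_CODE=0
wait $! || EXIT_CODE=$?
SUBTYPE=$(echo "$LAST_RESULT" | jq -r '.subtype')
if [ "$EXIT_CODE" -ne 0 ] || [ "$SUBTYPE" != "success" ]; then
  echo "Run did not succeed (exit=$EXIT_CODE subtype=$SUBTYPE)"
  exit 1
fi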

Permissions in headless mode

By default, caliban inherits all user and project permission rules in headless mode, just as in interactive sessions. For CI you typically want tighter control:

  • --permission-mode acceptEdits — auto-allows file edits; still asks (or denies) for shell commands not covered by a rule.
  • --allow "Bash:git *" — add a top-priority Allow rule for specific shell patterns.
  • --deny "Bash:rm -rf*" — add a top-priority Deny rule.
  • --bare — skip all settings-derived rules (only built-in defaults apply).

For a full discussion of permission modes and how they interact with headless runs, see Headless & Audit.

Never use --allow-dangerously-skip-permissions in CI

bypassPermissions mode disables all permission gating. In a CI context this means an adversarially-crafted prompt or tool output could instruct caliban to delete files, exfiltrate secrets, or make network calls. Use acceptEdits instead and add explicit --allow rules for the shell patterns your job actually needs.

Parsing exit codes in shell

caliban --bare --max-budget-usd 0.20 -p "summarize the diff" || {
  CODE=$?
  case $CODE in
    75)  echo "Max turns exceeded";;
    137) echo "Budget cap hit";;
    124) echo "Cancelled";;
    *)   echo "Error: exit $CODE";;
  esac
  exit $CODE
}

Telemetry & Cost

caliban tracks token usage and USD cost for every session using caliban-telemetry (ADR 0033). Cost accounting and context-window tracking work for all users regardless of whether OTLP export is enabled. OTLP emission to an external collector is opt-in.

Cost accounting

After each provider response, caliban-telemetry multiplies token counts by per-model rates from a vendored YAML rate card. The card ships with known rates for Anthropic, OpenAI, Google, Bedrock, Vertex, and Ollama (Ollama rows are $0.00).

Unknown (provider, model) pairs contribute $0.00 and emit a single debounced warning per session. Rates are updated in-tree; operators can override the card with CALIBAN_RATES_YAML=/path/to/rates.yaml.

USD arithmetic uses rust_decimal internally to avoid floating-point drift. Values are converted to f64 only at OTLP emit boundaries.
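
The accounting step can be sketched in Python, with decimal.Decimal standing in for rust_decimal. Everything specific here is an assumption made for illustration: the rate values, the per-million-token unit, the card layout, the class name, and the choice to warn once per unknown pair.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Hypothetical rate card: USD per million tokens, keyed by (provider, model).
RATES = {
    ("example-provider", "example-model"): {"input": Decimal("3.00"),
                                            "output": Decimal("15.00")},
    ("ollama", "llama3.1"): {"input": Decimal("0"), "output": Decimal("0")},
}

class CostTally:
    """Illustrative per-session cost tally (not caliban's CostAccumulator)."""

    def __init__(self, rates):
        self.rates = rates
        self.total = Decimal("0")
        self.warnings = []
        self._warned = set()

    def record(self, provider, model, input_tokens, output_tokens):
        rate = self.rates.get((provider, model))
        if rate is None:
            # Unknown pairs cost $0.00 and warn only once (debounced).
            if (provider, model) not in self._warned:
                self._warned.add((provider, model))
                self.warnings.append(f"no rate for {provider}/{model}")
            return Decimal("0")
        cost = (rate["input"] * input_tokens
                + rate["output"] * output_tokens) / MILLION
        self.total += cost
        return cost

    def total_as_float(self) -> float:
        # Convert to a float only at the export boundary.
        return float(self.total)
```

Keeping the running total in decimal form means that thousands of small additions do not accumulate binary rounding error.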

Slash commands

These commands work in the TUI regardless of whether OTLP export is on.

| Command | Description |
| --- | --- |
| /cost | Cumulative USD spend with a per-model breakdown |
| /usage | Cumulative token counts (input and output) with per-model breakdown |
| /context | Context-window utilization: per-message-kind token breakdown and percentage of the model's context window used |

The /cost and /usage overlays share the same underlying CostAccumulator; /cost leads with dollar amounts, /usage leads with token counts. /context draws on ContextWindow, which is updated independently of OTLP emission.

Enabling OTLP export

OTLP export is off by default. Turn it on with the CALIBAN_ENABLE_TELEMETRY environment variable or the enable_telemetry setting:

Environment variable (any session)

CALIBAN_ENABLE_TELEMETRY=1 caliban

settings.toml / settings.json (persistent)

enable_telemetry = true

Privacy opt-outs DISABLE_TELEMETRY=1 and DO_NOT_TRACK=1 force-disable OTLP emission even when the master switch is on.

OTLP configuration

caliban adopts the standard OTEL_* env-var contract verbatim:

| Variable | Default | Purpose |
| --- | --- | --- |
| OTEL_EXPORTER_OTLP_ENDPOINT | | Collector endpoint (required for OTLP) |
| OTEL_EXPORTER_OTLP_PROTOCOL | grpc | grpc, http/protobuf, or http/json |
| OTEL_EXPORTER_OTLP_HEADERS | | Static auth / routing headers (k=v,k2=v2) |
| OTEL_METRIC_EXPORT_INTERVAL | 60s | How often metrics are flushed |
| OTEL_LOGS_EXPORTER | otlp | otlp, console, or none |
| OTEL_METRICS_EXPORTER | otlp | Same options |
| OTEL_TRACES_EXPORTER | otlp | Same options |
| OTEL_LOG_USER_PROMPTS | 0 | Include user prompt text in log spans |
| OTEL_LOG_TOOL_DETAILS | 0 | Include tool name/args in spans |
| OTEL_LOG_TOOL_CONTENT | 0 | Include full tool output in spans |
| OTEL_LOG_RAW_API_BODIES | 0 | Log raw provider request/response bodies (0, 1, or file:<dir>) |

mTLS is configured via OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE, OTEL_EXPORTER_OTLP_CLIENT_KEY, and OTEL_EXPORTER_OTLP_CERTIFICATE.

Content logging is a privacy footgun

OTEL_LOG_USER_PROMPTS, OTEL_LOG_TOOL_CONTENT, and OTEL_LOG_RAW_API_BODIES send potentially sensitive content to your collector. Ensure your collector pipeline is appropriately access-controlled before enabling these.

Dynamic OTLP headers

Short-lived bearer tokens (e.g. from a secrets manager) can be injected without restarting caliban. Set telemetry.otel_headers_helper in your settings to the path of a helper program. caliban spawns it at startup and again periodically (telemetry.otel_headers_refresh, default 5m), parses its stdout as key=value lines, and merges the result with OTEL_EXPORTER_OTLP_HEADERS (the helper wins on collision).

Alternatively, the env-var escape hatch CALIBAN_OTEL_HEADERS_HELPER=/path/to/script achieves the same effect without a settings file.
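
The merge rule (helper output overrides static headers) can be sketched in Python. The function names are hypothetical, and skipping blank or malformed lines is an assumption; this guide does not say how caliban treats them.

```python
def parse_header_lines(text: str) -> dict:
    """Parse helper stdout: one key=value pair per line."""
    headers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # assumption: ignore blank or malformed lines
        key, value = line.split("=", 1)
        headers[key.strip()] = value.strip()
    return headers

def parse_static_headers(value: str) -> dict:
    """Parse the OTEL_EXPORTER_OTLP_HEADERS form: k=v,k2=v2."""
    return parse_header_lines("\n".join(value.split(",")))

def merge_headers(static_value: str, helper_stdout: str) -> dict:
    """Merge both sources; the helper wins on key collisions."""
    merged = parse_static_headers(static_value)
    merged.update(parse_header_lines(helper_stdout))
    return merged
```

For example, merging the static value "a=1,b=2" with helper output containing the lines b=3 and c=4 yields a=1, b=3, c=4.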

Metric names

OTLP metrics use the caliban. prefix (mirroring Claude Code's claude_code. names):

| Metric | Kind | Description |
| --- | --- | --- |
| caliban.session.count | Counter | Session start/end lifecycle events |
| caliban.cost.usage | Counter (USD) | Cumulative cost per session |
| caliban.token.usage | Counter | Input and output tokens |
| caliban.lines_of_code.count | Counter | Lines touched by file-edit tools |
| caliban.code_edit_tool.decision | Counter | Permission decisions on edit tools |
| caliban.active_time.total | Gauge (seconds) | Wall time the agent loop ran |

Health Checks

caliban doctor runs a suite of local checks and reports whether your installation is healthy. It exits 0 when all checks pass or warn, and exits 1 if any check fails. The same checks are available as the /doctor slash command in the TUI.

Running doctor

# Standard checks (a provider is probed only if its base-URL env var is set)
caliban doctor

# Deep checks — pings every configured provider (costs one API call per provider)
caliban doctor --deep

Sample output:

caliban doctor — 11 check(s):
  ✓ settings — 2 scope file(s) loaded
  ✓ sandbox — tool dispatch goes via caliban-sandbox::SandboxedShim
  ✓ checkpoint_store — /home/user/.local/share/caliban/checkpoints
  ✓ session_store — /home/user/.local/share/caliban/sessions (writable)
  ✓ skills — 3 skill(s) loaded (scanned: /home/user/.claude/skills, ./.claude/skills)
  ✓ claudemd — 2 CLAUDE.md ancestor(s) found
  ✓ workspace — /home/user/repo (writable)
  ! ollama — OLLAMA_BASE_URL unset (no probe attempted; use --deep to ping localhost)
  ✓ openai — OPENAI_BASE_URL unset (no probe attempted; use --deep to ping api.openai.com)
  ✓ anthropic — https://api.anthropic.com reachable (45 model(s))
  ✓ google — GEMINI_BASE_URL unset (no probe attempted; use --deep to ping generativelanguage.googleapis.com)

What each check covers

| Check | What it verifies |
| --- | --- |
| settings | Layered settings files load without parse errors; at least one scope file is present |
| sandbox | Tool dispatch is wired through the OS sandbox shim |
| checkpoint_store | The checkpoint store path is accessible |
| session_store | The session store path exists and is writable |
| skills | Skill roots are scanned and skills load without errors |
| claudemd | At least one CLAUDE.md file is found in the workspace ancestry |
| workspace | The current working directory is accessible and writable |
| ollama | Ollama endpoint reachability (see below) |
| openai | OpenAI / OpenAI-compatible endpoint reachability |
| anthropic | Anthropic endpoint reachability |
| google | Google Gemini endpoint reachability |

Provider reachability checks

Provider rows always appear in the output so you can see at a glance which providers are configured. The behavior depends on whether --deep is passed:

Without --deep:

  • If the provider's base-URL env var is set, caliban probes the endpoint (Ollama: /api/tags; others: /v1/models).
  • If the env var is unset, the row passes with a note that no probe was attempted. Use --deep to ping the default endpoint.

With --deep:

  • Caliban pings the configured (or default) endpoint unconditionally. This costs one real API call per provider that has an API key configured.
  • If --model <MODEL> was passed on the same invocation and a provider's model listing is available, the requested model is verified to be present. A missing model is reported as a Fail row.

Ollama without an API key

Ollama does not require an API key. With --deep, caliban always probes http://localhost:11434 (or OLLAMA_BASE_URL if set) regardless of key configuration.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | All checks passed or warned |
| 1 | At least one check failed |

CI scripts can gate on caliban doctor to catch misconfigured installations before running a long job:

caliban doctor || { echo "caliban health check failed"; exit 1; }

/doctor in the TUI

The /doctor slash command runs the same checks inside an interactive session and prints the results to the transcript. Provider pings are always deep when invoked via /doctor (the session is already running and API keys are confirmed reachable). The /status command shows a brief one-line summary of the daemon and active session state.

CLI Reference

caliban is the main binary. Run it with no arguments to enter the interactive TUI; supply a prompt or flags to drive it headlessly or invoke a subcommand.

caliban [FLAGS/OPTIONS] [PROMPT]
caliban [FLAGS/OPTIONS] <SUBCOMMAND>

Prompts

| Flag | Default | Description |
| --- | --- | --- |
| PROMPT (positional) | | User prompt text. Use - to read from stdin. |
| --prompt <TEXT> | | Alternative way to pass the prompt (same effect as positional). |

Headless / Print Mode

These flags activate and configure non-interactive (-p) mode. See Print Mode and The stream-json Protocol.

| Flag | Default | Description |
| --- | --- | --- |
| -p, --print [PROMPT] | | Headless mode. Drives the agent non-interactively. Accepts an optional prompt; otherwise reads from --prompt, the positional PROMPT, or stdin (capped at 10 MiB). |
| --output-format <FMT> | text | Stream output format. Values: text, json, stream-json. |
| --input-format <FMT> | text | Stdin format. Values: text, stream-json. |
| --no-auto-print | false | Suppress the automatic headless dispatch when stdout is piped or stdin is non-TTY. Explicit --print / --output-format always override this. |
| --max-budget-usd <USD> | | Abort the run (exit 137) once cumulative cost exceeds this value in USD. Unknown model/provider pairs contribute $0 and emit a warning. |
| --bare | false | CI-deterministic mode: skips hooks, skills, plugins, MCP, auto-memory, and CLAUDE.md discovery. |
| --json-schema <FILE_OR_JSON> | | Force structured final output matching the given JSON Schema. Value can be inline JSON or a path to a .json file. |
| --include-partial-messages | false | Emit assistant text deltas as separate text frames in stream-json mode (default: aggregate into one message frame). |
| --include-hook-events | false | Emit a hook_event frame per fired hook event in stream-json mode. |
| --replay-user-messages | false | Echo each user prompt as a user frame in stream-json mode. |

Session

| Flag | Default | Description |
| --- | --- | --- |
| -c, --continue | false | Resume the most recently updated session. |
| -r, --resume <NAME> | | Resume a named session. |
| --session <NAME> | | Load or create a named session; persists to the configured sessions directory. |
| --no-save | false | Don't write the session back to disk after the run. |
| --sessions-dir <DIR> | platform default | Override the sessions directory. |

Model & Provider

| Flag | Default | Description |
| --- | --- | --- |
| --provider <PROVIDER> | Resolved from settings, then anthropic | Provider to use. Values: anthropic, openai, ollama, google. |
| --model <MODEL> | Provider default (see table below) | Model name. |
| --fallback-model <MODEL> | From settings | Fallback model when the primary errors (ADR 0038). |
| --max-tokens <N> | 8192 | Per-turn output token limit (must be ≥ 1). |
| --max-turns <N> | 50 | Maximum agent loop iterations. |
| --temperature <F> | | Sampling temperature in [0.0, 2.0]. |

Provider defaults:

| Provider | Default model |
| --- | --- |
| anthropic | claude-sonnet-4-6 |
| openai | gpt-5.5 |
| ollama | llama3.1 |
| google | gemini-2.0-flash |

Workspace & Tools

| Flag | Default | Description |
| --- | --- | --- |
| --workspace <DIR> | Current working directory | Workspace root for file and shell tools. Must be an existing directory. |
| --no-tools | false | Disable all tools (chat-only mode). |
| --restrict-paths | false | Reject tool paths outside the workspace root. |
| --quiet | false | Suppress tool-execution announcements. |

System Prompt

These flags are mutually exclusive.

| Flag | Default | Description |
| --- | --- | --- |
| --system <STRING> | | Override system prompt with the given text. |
| --system-file <PATH> | | Override system prompt with the contents of a file. |
| --no-system | false | Run with no system prompt (disables the default). |

Permissions

See Permission Modes and Managing Rules.

| Flag | Default | Description |
| --- | --- | --- |
| `--allow <PAT>` | | Add an Allow rule at top priority. Repeatable. Pattern: Tool or Tool:first-arg-glob. |
| `--deny <PAT>` | | Add a Deny rule at top priority. Repeatable. |
| `--ask <PAT>` | | Add an Ask rule at top priority. Repeatable. |
| `--permission-mode <MODE>` | From settings or default | Initial permission mode. Valid values (camelCase): default, acceptEdits, plan, auto, dontAsk, bypassPermissions. Env: CALIBAN_DEFAULT_PERMISSION_MODE. |
| `--no-permissions` | false | Disable permission gating entirely (all tool calls allowed). Env: CALIBAN_NO_PERMISSIONS. Conflicts with --allow, --deny, --ask, --auto-allow. |
| `--auto-allow` | false | Dangerous. Allow the model to run any Ask-rule tool without prompting in non-interactive mode. Env: CALIBAN_AUTO_ALLOW. |
| `--allow-dangerously-skip-permissions` | false | Dangerous. Required to enter bypassPermissions mode. Without this flag the binary refuses to start in bypass mode. |
| `--disable-auto-mode` | false | Disable the auto-mode classifier; every call falls through to the Ask handler (ADR 0029). Env: CALIBAN_DISABLE_AUTO_MODE. |
| `--permission-prompt-tool <MCP_TOOL>` | | Route permission Ask events to the named MCP tool via the MCP elicitation channel (ADR 0023 Phase C). |
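
As an illustration of how these flags combine, the sketch below wraps a read-mostly invocation in a shell function. The function name `caliban_review` is hypothetical; the flags and the two patterns are taken from this reference, and any extra arguments are passed through unchanged.

```shell
# Hypothetical wrapper: start caliban in plan mode with two explicit rules.
# Rules given on the command line are added at top priority.
caliban_review() {
  caliban \
    --permission-mode plan \
    --allow 'Bash:git *' \
    --deny 'Bash:rm *' \
    "$@"
}
```

Quoting each pattern in single quotes keeps the shell from expanding the `*` before caliban sees it.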

Hooks, Skills, MCP & Plugins

| Flag | Default | Description |
| --- | --- | --- |
| `--no-hooks` | false | Bypass every external hook handler. In-process hooks (PermissionsHook, audit) still run. Env: CALIBAN_NO_HOOKS. |
| `--no-skills` | false | Disable the Skill tool (no skill discovery at startup). Env: CALIBAN_NO_SKILLS. |
| `--no-mcp` | false | Disable MCP server discovery (skips settings.json mcp_servers and the legacy mcp.toml shim). Env: CALIBAN_NO_MCP. |
| `--no-plugins` | false | Disable plugin discovery (ADR 0030). Env: CALIBAN_NO_PLUGINS. |
| `--mcp-oauth-port <PORT>` | 0 (ephemeral) | Override the loopback port for the OAuth callback server (ADR 0023 Phase C). Env: CALIBAN_MCP_OAUTH_PORT. |
| `--no-sub-agent` | false | Disable the built-in AgentTool (the sub-agent primitive). Env: CALIBAN_NO_SUB_AGENT. |

Config & Settings

| Flag | Default | Description |
| --- | --- | --- |
| `--config <PATH>` | Walk-up discovery | Explicit path to caliban.toml. When the file declares [router], a model router is wired (ADR 0038). Env: CALIBAN_ROUTER_CONFIG. |
| `--settings <FILE_OR_JSON>` | | Inject a virtual settings scope above local (ADR 0026). Accepts inline JSON or a path to .json / .toml. |
| `--setting-sources <CSV>` | All scopes | Restrict which settings.json scopes are read. CSV of managed,user,project,local. |
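
To see what a one-off overlay does before relying on it, the two settings flags can be combined with `config print`. The sketch below is hypothetical: the function name is made up, the `effort` key comes from the Settings Schema, and it assumes global flags are accepted before the subcommand.

```shell
# Hypothetical helper: print the merged settings with an inline JSON overlay,
# reading only the user and project scopes.
# Assumption: global flags may precede the `config print` subcommand.
caliban_preview_settings() {
  caliban \
    --settings '{"effort": "low"}' \
    --setting-sources user,project \
    config print
}
```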

Caching & Performance

| Flag | Default | Description |
| --- | --- | --- |
| `--max-attach-bytes <N>` | 262144 (256 KB) | Maximum size of a single @-attachment in bytes. Env: CALIBAN_MAX_ATTACH_BYTES. |
| `--attach-budget-bytes <N>` | 1048576 (1 MB) | Aggregate size cap across all @-attachments in one message. Env: CALIBAN_ATTACH_BUDGET_BYTES. |
| `--no-prompt-cache` | false | Disable Anthropic-style prompt caching. Env: CALIBAN_NO_PROMPT_CACHE. |
| `--no-parallel-tools` | false | Disable parallel tool execution (run tool_use blocks serially). Env: CALIBAN_NO_PARALLEL_TOOLS. |
| `--parallel-tool-limit <N>` | CPU cores − 1 (min 1) | Max concurrent tool invocations per turn. Env: CALIBAN_PARALLEL_TOOL_LIMIT. |

Diagnostics

| Flag | Default | Description |
| --- | --- | --- |
| `--debug` | false | Append-log events and draws to the platform debug log. CALIBAN_DEBUG (any non-empty value) also enables this. |

Background Agents

| Flag | Default | Description |
| --- | --- | --- |
| `--bg <TASK>` | | Spawn a background sub-agent with the given task and return immediately. Equivalent to `caliban agents spawn --bg --prompt <TASK>` (ADR 0037). |

Subcommands

caliban doctor [--deep]

Run health checks against the local caliban install (settings, MCP, sandbox, stores, providers). Exit 0 on pass, 1 on failure.

| Option | Description |
| --- | --- |
| `--deep` | Include deep checks (provider auth pings, which cost one API call per configured provider). |

caliban config

Inspect and migrate settings (ADR 0026).

| Sub-subcommand | Description |
| --- | --- |
| `config print` | Print the merged effective settings as JSON, including the per-key scope chain. Honors --settings / --setting-sources. |
| `config migrate [--dry-run]` | Round-trip legacy per-feature TOMLs (permissions.toml, mcp.toml, hooks.toml) into a single project-scope settings.json under `<workspace>/.caliban/`. |

caliban settings

Import and print settings files.

| Sub-subcommand | Description |
| --- | --- |
| `settings import --from <PATH> [--scope <SCOPE>] [--dry-run]` | Import a settings JSON (Claude Code / Codex / legacy caliban) into canonical caliban TOML. Default scope: project. |
| `settings print [--scope <SCOPE>]` | Print the settings for a scope (or the merged effective settings). Default scope: project. |

caliban perms

Manage permission rules across all config scopes. See Managing Rules.

| Sub-subcommand | Description |
| --- | --- |
| `perms list [--scope <SCOPE>] [--effective] [--json]` | List permission rules. --effective shows the merged rule list across all scopes. |
| `perms test <TOOL> [INPUT_JSON]` | Test whether a tool call would be allowed, denied, or asked. |
| `perms explain <TOOL> [INPUT_JSON]` | Show which rule first matches a tool call. |
| `perms add <PATTERN> <ACTION> [--scope <SCOPE>] [--comment <TEXT>] [--reason <TEXT>]` | Add a permission rule. Action: allow, ask, or deny. Default scope: project. |
| `perms remove [--index <N>] [--pattern <PAT>] [--scope <SCOPE>]` | Remove a permission rule by ordinal or pattern. Default scope: project. |
| `perms import --from <PATH> [--scope <SCOPE>] [--dry-run]` | Import rules from a foreign config (Claude Code JSON, legacy caliban TOML). Default scope: user. |
| `perms export [--scope <SCOPE>] [--format toml\|json]` | Export permission rules to stdout. Default format: toml. |
| `perms audit [--since <ISO>] [--tool <NAME>] [--action <ACTION>] [--head <N>]` | Show the permission-decision audit log. |
| `perms lint [--scope <SCOPE>]` | Check for duplicate or conflicting rules. Default scope: project. |
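
A common sequence is to add a rule and immediately lint the scope it landed in. The sketch below chains the two subcommands; the function name and the default comment text are hypothetical, while the argument order follows the `perms add` and `perms lint` entries above.

```shell
# Hypothetical helper: add a project-scope allow rule, then lint that scope.
# $1 is the pattern (for example 'Bash:git *'); $2 is an optional comment.
caliban_allow_rule() {
  pattern=$1
  comment=${2:-"added from shell"}
  caliban perms add "$pattern" allow --scope project --comment "$comment" &&
    caliban perms lint --scope project
}
```

Because the two commands are joined with `&&`, the lint step runs only if the rule was added successfully.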

caliban agents

List, attach, and manage background sub-agents (ADR 0037).

| Sub-subcommand | Description |
| --- | --- |
| `agents list` | List registered background agents. |
| `agents spawn --prompt <TEXT> [--label <LABEL>]` | Spawn a new background agent. |
| `agents attach <ID>` | Stream a running agent's transcript live (Ctrl+D detaches). |
| `agents logs <ID>` | Print the agent's session log. |
| `agents kill <ID>` | Terminate an agent (SIGTERM → SIGKILL after grace period). |
| `agents respawn <ID>` | Restart an agent with the same spawn spec. |
| `agents rm <ID> [--force]` | Remove an agent from the registry (must be stopped unless --force). |

Shortcut aliases (top-level sugar):

| Command | Equivalent to |
| --- | --- |
| `caliban attach <ID>` | `caliban agents attach <ID>` |
| `caliban logs <ID>` | `caliban agents logs <ID>` |
| `caliban stop <ID>` | `caliban agents kill <ID>` |
| `caliban kill <ID>` | `caliban agents kill <ID>` |
| `caliban respawn <ID>` | `caliban agents respawn <ID>` |
| `caliban rm <ID> [--force]` | `caliban agents rm <ID>` |

caliban daemon

Supervisor daemon management (ADR 0037).

| Sub-subcommand | Description |
| --- | --- |
| `daemon status` | Print daemon health and the socket path. |
| `daemon stop` | Ask the daemon to shut down gracefully. |

caliban router debug

Router diagnostics (ADR 0038).

| Sub-subcommand | Description |
| --- | --- |
| `router debug` | Print the candidate list the router would resolve for a synthetic request, plus breaker state and effort knobs. |

caliban plugin <VERB> [ARGS…]

Manage plugin packages (ADR 0030). The plugin CLI parses its own verbs directly:

| Verb | Description |
| --- | --- |
| `plugin list` | List all discovered plugins with enable/disable status. |
| `plugin info <NAME>` | Show manifest details for a plugin. |
| `plugin install <NAME>@<MARKETPLACE> [--yes]` | Install a plugin from a marketplace. |
| `plugin install --dir <PATH>` | Install a plugin from a local directory. |
| `plugin update <NAME> [--yes]` | Update an installed plugin. |
| `plugin remove <NAME>` | Remove an installed plugin. |
| `plugin enable <NAME>` | Enable a disabled plugin. |
| `plugin disable <NAME>` | Disable an enabled plugin. |

Run caliban plugin help for the full plugin CLI reference.


Exit codes

Caliban follows ADR 0025 exit-code conventions: 0 = success, 1 = check/health failure, 64 = usage error (EX_USAGE), 78 = configuration error (EX_CONFIG), 130 = Ctrl+C, 137 = budget exceeded.
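
In scripts it can help to turn these codes back into words. The following is a minimal sketch; the function name `describe_caliban_exit` is hypothetical, and the mapping simply restates the conventions listed above.

```shell
# Map a caliban exit status to the meaning given in the exit-code conventions.
describe_caliban_exit() {
  case $1 in
    0)   echo "success" ;;
    1)   echo "check/health failure" ;;
    64)  echo "usage error (EX_USAGE)" ;;
    78)  echo "configuration error (EX_CONFIG)" ;;
    130) echo "interrupted (Ctrl+C)" ;;
    137) echo "budget exceeded" ;;
    *)   echo "unrecognized exit code $1" ;;
  esac
}
```

A typical use is `caliban doctor; describe_caliban_exit $?` at the end of a CI step.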

Settings Schema

This page is a typed, structured listing of every key in the caliban settings file. For a narrative explanation of how scopes interact, how to locate each file, and how to edit settings interactively, see Settings Reference and Settings Layering.

Settings files are TOML by primary convention (settings.toml / settings.local.toml); JSON is accepted on import only. Unknown top-level keys are tolerated for forward-compat.


Model / Agent

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | string | | Agent profile name (sub-agent dispatch hint). |
| `model` | string \| { provider, name } | | Primary model. Bare string (e.g. "claude-sonnet-4-6") or qualified object { provider = "anthropic", name = "..." }. |
| `fallback_model` | string \| { provider, name } | | Fallback model when the primary errors. Same shapes as model. |
| `model_overrides` | { string → string } | {} | Per-route model overrides. Keys are router route names (e.g. "fast-classifier"); values are model ids. |
| `effort` | "low" \| "medium" \| "high" \| "max" \| "auto" | | Default reasoning effort level. |

Permissions

Nested under the [permissions] table.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `permissions.allow` | string[] | [] | Patterns that auto-allow (legacy bucket form). |
| `permissions.ask` | string[] | [] | Patterns that prompt the user (legacy bucket form). |
| `permissions.deny` | string[] | [] | Patterns that hard-deny (legacy bucket form). |
| `permissions.rules` | RuleSpec[] | [] | Ordered v2 rule array. When non-empty, takes precedence over the three buckets above. Source order is preserved (first match wins). |
| `permissions.enforce` | boolean | | When true, refuse --no-permissions / bypass mode at startup. |
| `permissions.default_mode` | string | | Initial permission mode at session start. Values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions. |
| `permissions.audit_log` | boolean | true | Append-only permission-decision log toggle. |

RuleSpec fields (used in permissions.rules entries):

| Field | Type | Description |
| --- | --- | --- |
| `pattern` | string | Glob matching Tool or Tool:first-arg-glob (e.g. "Bash:git *"). |
| `action` | "allow" \| "ask" \| "deny" | Decision for matching calls. |
| `comment` | string (optional) | Human-readable comment shown in /permissions. |
| `reason` | string (optional) | Deny reason shown to the operator and logged. |
| `expires_at` | ISO 8601 timestamp (optional) | Rule is skipped after this time. |

```toml
[permissions]
# v2 ordered rules (preferred)
[[permissions.rules]]
pattern = "Bash:git *"
action  = "allow"
comment = "git commands OK"

[[permissions.rules]]
pattern = "Bash:rm *"
action  = "deny"
reason  = "use git revert"

[[permissions.rules]]
pattern = "*"
action  = "ask"
```
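
To make "first match wins" concrete, the sketch below walks the same three rules in source order and prints the action of the first pattern that matches. It is a simplification, not caliban's matcher: it assumes a call can be written as a single `Tool:first-arg` string and uses plain shell globbing, and the function name `decide` is hypothetical.

```shell
# Simplified first-match-wins evaluation over an ordered rule list.
# Each heredoc line is "pattern|action"; the first matching pattern decides.
decide() {
  call=$1
  while IFS='|' read -r pattern action; do
    case $call in
      $pattern) echo "$action"; return 0 ;;
    esac
  done <<'EOF'
Bash:git *|allow
Bash:rm *|deny
*|ask
EOF
}

decide 'Bash:git status'   # matches the first rule
decide 'Bash:rm -rf build' # skips rule 1, matches rule 2
decide 'Edit:src/main.rs'  # falls through to the catch-all
```

Moving the catch-all `*` rule to the top would make every call resolve to `ask`, which is why rule order matters in `permissions.rules`.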

Hooks

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `hooks` | { string → … } | {} | Raw hook event → handler list map (passed to caliban_agent_core::HooksConfig). |
| `disable_all_hooks` | boolean | false | Kill-switch: disable every external hook handler. |
| `allow_managed_hooks_only` | boolean | false | When true, only managed-scope hooks fire. |
| `allowed_http_hook_urls` | string[] | [] | HTTP-hook URL allowlist (glob patterns). |
| `http_hook_allowed_env_vars` | string[] | [] | Environment variable names that HTTP hooks are permitted to read. |

MCP Servers

Under [mcp_servers.<name>]. Each entry configures one MCP server.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `type` | "stdio" \| "http" \| "sse" | "stdio" | Transport selector. Also accepted as transport (TOML alias). |
| `command` | string | "" | Executable command (stdio only). |
| `args` | string[] | [] | Argv after the command (stdio only). |
| `env` | { string → string } | {} | Environment variables injected for the server process (stdio only). |
| `cwd` | string | | Working directory override (stdio only). |
| `url` | string | | Absolute http:// or https:// URL (http/sse transports). |
| `headers` | { string → string } | {} | Static request headers (http/sse only). |
| `oauth` | "off" \| "auto" \| "manual" | "off" | OAuth mode (http/sse only). |
| `disabled` | boolean | false | Mark this server disabled without removing the entry. |
| `permissions` | object | | Per-server permission scoping (composes with global rules). |

```toml
[mcp_servers.linear]
command = "npx"
args    = ["-y", "@linear/mcp-server"]
```
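
The stdio entry above has an http counterpart. As a sketch, the snippet below appends a remote server to the project settings file using only keys from the table; the server name `docs`, the URL, and the header value are placeholders rather than a real endpoint.

```shell
# Append a hypothetical remote MCP server to the project-scope settings file.
# "docs", the URL, and the X-Client header are illustrative placeholders.
mkdir -p .caliban
cat >> .caliban/settings.toml <<'EOF'

[mcp_servers.docs]
type  = "http"
url   = "https://mcp.example.com/mcp"
oauth = "auto"

[mcp_servers.docs.headers]
X-Client = "caliban"
EOF
```

Run it from the workspace root so the file lands in `<workspace>/.caliban/`. The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the TOML.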

Router

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `router` | object | | Router config (opaque; schema owned by caliban-model-router). Use caliban.toml [router] for the primary router config. |

Memory

Nested under [memory].

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `memory.auto_memory_enabled` | boolean | | Enable / disable auto-memory topic files. |
| `memory.auto_memory_directory` | string | Platform default | Directory for auto-memory topic files. |
| `memory.cap_tokens_auto` | integer | | Token budget cap for the auto-memory tier. |
| `memory.cap_tokens_claude_md` | integer | | Token budget cap for the CLAUDE.md tier. |
| `memory.cap_tokens_combined` | integer | | Combined token budget cap across all tiers. |

Plugins

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `plugins` | object | | Plugin manager knobs (schema owned by caliban-plugins). |

UI

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `output_style` | string | | Active output-style name (see Output Styles). |
| `editor_mode` | string | | Input editing mode: "vim" or "emacs". |
| `view_mode` | string | | TUI layout mode: "compact" or "expanded". |
| `statusLine.command` | string | | Required when statusLine is set. Shell command whose stdout prefixes the status bar. |
| `statusLine.timeout_ms` | integer (50–5000) | | Maximum ms to wait for the status-line script. |
| `statusLine.padding` | integer (0–8) | | Spaces of padding around the custom segment. |
| `tui` | object | | TUI knobs. Known sub-key: showCostInStatusline (boolean). |

statusLine casing

statusLine uses camelCase on disk for Claude Code compatibility. The TOML alias status_line is also accepted.


Auth

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key_helper` | string \| object \| object[] | | Provider API-key supplier(s). Bare string = command path; object = { command, provider?, refreshIntervalMs?, slowHelperWarningMs? }; array = per-provider list. |

Observability

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `enable_telemetry` | boolean | | OTel / cost emitter toggle. |

Context-Window Management

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `auto_compact_threshold` | number (0–1) or null | 0.75 | Pre-turn autocompaction threshold (context utilization fraction). null disables autocompact. |
| `micro_compact_enabled` | boolean | true | Enable the per-turn microcompact (LLM-free supersession) pass. |
| `tool_result_cap_chars` | integer (≥ 0) | 50000 | Global per-tool-result cap in characters. 0 disables. |
| `min_cache_block_tokens` | integer (≥ 0) | 1024 | Minimum estimated tokens on the last user message to merit the conversation-level cache marker. |

Enterprise (Managed Scope)

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `parent_settings_behavior` | "block" \| "augment" | "augment" | When "block" in the managed scope, the managed layer flips to the top of the merge chain (enterprise lockdown). |

Miscellaneous

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `additional_directories` | string[] | [] | Extra workspace roots to consult for CLAUDE.md and skills. |
| `claude_md_excludes` | string[] | [] | Glob patterns to exclude from CLAUDE.md discovery (claudeMdExcludes). |
| `env` | { string → string } | {} | Environment-variable overrides applied to child processes spawned by caliban. |

Slash Command Index

Type / in the interactive TUI to open the command picker, or type a command name directly. Commands marked hidden are accessible by name but do not appear in /help.

For a narrative introduction, see Slash Commands and Custom Slash Commands.


Session

| Command | Args | Description |
| --- | --- | --- |
| `/clear` | | Clear the transcript and conversation history. Keeps system prompt, todos, plan-mode, and skills cache. |
| `/init` | `[--force]` | Generate a CLAUDE.draft.md from available context sources (AGENTS.md, .cursorrules, .windsurfrules, README.md, git status). Refuses to overwrite an existing CLAUDE.md without --force. |
| `/resume` | `[query]` | List persisted sessions sorted by most-recently-updated, with an optional case-insensitive substring filter. |
| `/recap` | | Summarize the conversation so far without mutating history. |
| `/export` | `[path] [--format json]` | Export the session transcript to a file. Default format: Markdown. Default filename: `caliban-session-<date>.md` in the CWD. Pass --format json for JSON output. |
| `/btw` | `<question>` | One-shot ephemeral question to a fast model (routed as FastClassifier); result inlined to transcript without touching the main session. |

Model & Auth

| Command | Args | Description |
| --- | --- | --- |
| `/model` | `[id]` | With no args: list the active provider's known model ids and the currently-selected one. With an id: switch the active model at runtime (same-provider in v1). |
| `/effort` | `<level>` | Set reasoning effort for the next turn. Values: low, medium, high, max, auto. |
| `/status` | | Show provider / auth / subscription status. |
| `/login` | | Run the active provider's auth flow (full browser OAuth implementation pending the Auth spec). |
| `/logout` | | Clear cached credentials for the active provider (pending the Auth spec). |
| `/setup-token` | | Generate a long-lived Anthropic OAuth token for CI use (pending the Auth spec). |

Permissions

| Command | Args | Description |
| --- | --- | --- |
| `/permissions` | | Open the permissions overlay. Shows current mode, bypass-latch state, and runtime rules. Tab cycles mode; d deletes the selected rule. |

Observability

| Command | Args | Description |
| --- | --- | --- |
| `/usage` | | Show cumulative token and cost usage for this session, per model. |
| `/cost` | | Show cumulative cost and a per-(provider, model) breakdown with cache savings. |
| `/context` | | Show context window utilization and the top-N largest content blocks (by character count). |
| `/compact` | | Trigger the configured compactor; reports dropped/summarized message count. |
| `/doctor` | `[--deep]` | Run startup-time health checks (settings, MCP, skills, hooks, auth). --deep adds provider auth pings. |

Memory

| Command | Args | Description |
| --- | --- | --- |
| `/memory` | `[list\|show <slug>\|edit <slug>\|delete <slug>]` | View or edit memory tiers and auto-memory topic files. No args: show tier summary. |

Configuration & Extensibility

| Command | Args | Description |
| --- | --- | --- |
| `/config` | | Open the tabbed settings editor overlay. |
| `/hooks` | | List configured hooks per event type with handler counts. |
| `/mcp` | | Open the MCP server status overlay. |
| `/plugins` | | List installed plugins with enable/disable status. |
| `/agents` | | List sub-agents. (Full fleet overlay arrives with the sub-agent isolation spec; use caliban agents list from a shell for now.) |
| `/skills` | | List skills loaded from .caliban/skills/ and other configured roots. |

Plan Mode

| Command | Args | Description |
| --- | --- | --- |
| `/plan` | | Toggle plan mode. When ON, mutating tools are blocked. Reflected in the active session and statusline. |

Output

| Command | Args | Description |
| --- | --- | --- |
| `/output-style` | | Show the active output style and the available list. Change the style via CALIBAN_OUTPUT_STYLE or output_style in settings. |

Diagnostics

| Command | Args | Description |
| --- | --- | --- |
| `/rewind` | | Open the checkpoint/rewind picker overlay (ADR 0028). Also opened by pressing Esc Esc. |
| `/statusline` | | Show the active status-line command configuration (or instructions to set one). |
| `/loop` | `[--n=<count>] [--interval=<seconds>]` | Re-run the last assistant turn N times (bounded by --max-turns). Default: 3 repeats, 15-second interval. |
| `/feedback` | | Submit feedback to the configured endpoint. Requires feedback_url in settings. |
| `/heapdump` | | Capture a heap profile (requires caliban to be rebuilt with --features=jemalloc-prof). |
| `/tui` | | Toggle fullscreen vs. default TUI mode (pending TUI ergonomics spec). |

General

| Command | Args | Description |
| --- | --- | --- |
| `/help` | | List all visible registered slash commands. |
| `/quit` | | Exit caliban. |
| `/exit` | | Alias for /quit (hidden). |

Hidden commands

/exit, /plugin (alias for /plugins), and /system (view active system prompt) are registered but do not appear in /help output.

Custom slash commands

You can add your own slash commands by placing skill files under .caliban/skills/<name>/SKILL.md. See Custom Slash Commands.

Environment Variables

Caliban reads environment variables in two groups: CALIBAN_* variables that control the harness itself, and per-provider API-key and endpoint variables. Most CALIBAN_* variables mirror a corresponding CLI flag, and the CLI flag wins when both are set. The one documented exception is CALIBAN_PROVIDER, which overrides --provider (see Provider precedence).


Provider API Keys

| Variable | Provider | Purpose |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | Anthropic | Required. API key for the Anthropic provider. |
| `ANTHROPIC_BASE_URL` | Anthropic | Optional. Override the Anthropic API base URL (useful for proxies or Bedrock-compatible endpoints). |
| `OPENAI_API_KEY` | OpenAI | Required when using OpenAI. |
| `OPENAI_BASE_URL` | OpenAI | Optional. Override the OpenAI API base URL (for LM Studio, Mistral, and other OpenAI-compatible endpoints). |
| `OPENAI_ORG_ID` | OpenAI | Optional. OpenAI organization ID. |
| `OPENAI_PROJECT` | OpenAI | Optional. OpenAI project ID. |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI | Required when using Azure OpenAI. |
| `AZURE_OPENAI_RESOURCE` | Azure OpenAI | Required when using Azure OpenAI. Azure resource name. |
| `AZURE_OPENAI_API_VERSION` | Azure OpenAI | Optional. API version string. Default: 2024-10-21. |
| `GEMINI_API_KEY` | Google | Required when using the Google provider. GOOGLE_GEMINI_API_KEY is checked as a fallback. |
| `GOOGLE_GEMINI_API_KEY` | Google | Fallback for GEMINI_API_KEY. |
| `OLLAMA_BASE_URL` | Ollama | Optional. Base URL for the Ollama server. Default: http://localhost:11434. |

Headless & Print Mode

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_MAX_ATTACH_BYTES` | 262144 (256 KB) | Maximum size of a single @-attachment. Also settable via --max-attach-bytes. |
| `CALIBAN_ATTACH_BUDGET_BYTES` | 1048576 (1 MB) | Aggregate size cap across all @-attachments in one message. Also settable via --attach-budget-bytes. |

Permissions & Security

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_DEFAULT_PERMISSION_MODE` | default | Initial permission mode. Values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions. CLI --permission-mode wins when set. |
| `CALIBAN_NO_PERMISSIONS` | | Any non-empty value disables permission gating (all tool calls allowed). Conflicts with --allow, --deny, --ask, --auto-allow. |
| `CALIBAN_AUTO_ALLOW` | | Dangerous. Any non-empty value allows Ask-rule tools without prompting in non-interactive mode. |
| `CALIBAN_DISABLE_AUTO_MODE` | | Any non-empty value disables the auto-mode classifier; all calls fall through to Ask. |

Caching & Performance

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_NO_PROMPT_CACHE` | | Any non-empty value disables Anthropic-style prompt caching. |
| `CALIBAN_NO_PARALLEL_TOOLS` | | Any non-empty value forces serial tool execution. |
| `CALIBAN_PARALLEL_TOOL_LIMIT` | CPU cores − 1 (min 1) | Maximum concurrent tool invocations per turn. |

Hooks, Skills, MCP & Plugins

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_NO_HOOKS` | | Any non-empty value bypasses every external hook handler. In-process hooks still run. |
| `CALIBAN_NO_SKILLS` | | Any non-empty value disables skill discovery at startup. |
| `CALIBAN_NO_MCP` | | Any non-empty value disables MCP server discovery. |
| `CALIBAN_MCP_OAUTH_PORT` | 0 (ephemeral) | Loopback port for the MCP OAuth callback server (ADR 0023 Phase C). |
| `CALIBAN_MCP_TIMEOUT` | | Timeout (ms) for MCP server startup/connection. |
| `CALIBAN_MCP_TOOL_TIMEOUT` | | Per-tool-call timeout (ms) for MCP tools. |
| `CALIBAN_NO_PLUGINS` | | Any non-empty value disables plugin discovery. |
| `CALIBAN_ENABLED_PLUGINS` | | Comma-separated list of plugin names to enable (all others disabled). |
| `CALIBAN_PLUGIN_ROOT` | | Override the plugin install root directory. |

Sub-agents

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_NO_SUB_AGENT` | | Any non-empty value disables the built-in AgentTool. |
| `CALIBAN_DAEMON_RUNTIME_DIR` | Platform default | Override the runtime socket directory for the supervisor daemon. |

Memory

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_DISABLE_AUTO_MEMORY` | | Any non-empty value disables auto-memory topic-file writing. |
| `CALIBAN_MEMORY_DIR` | Platform default | Override the auto-memory topic files directory. |
| `CALIBAN_MEMORY_BUDGET_TOKENS` | | Total token budget across all memory tiers. |
| `CALIBAN_MEMORY_CAP_TOKENS_AUTO` | | Token budget cap for the auto-memory tier. |
| `CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD` | | Token budget cap for the CLAUDE.md tier. |
| `CALIBAN_AUTO_MEMORY_DIRECTORY` | | Override the auto-memory directory (alias form). |
| `CALIBAN_DISABLE_CLAUDE_MD_WALK` | | Any non-empty value disables the CLAUDE.md walk-up discovery. |
| `CALIBAN_ADDITIONAL_DIRECTORIES_CLAUDE_MD` | | Colon-separated list of extra directories to search for CLAUDE.md. |
| `CALIBAN_CLAUDE_MD_EXCLUDES` | | Colon-separated glob patterns to exclude from CLAUDE.md discovery. |
| `CALIBAN_APPROVE_IMPORTS` | | Any non-empty value auto-approves CLAUDE.md @import statements. |

Checkpoints

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_CHECKPOINT_ROOT` | `~/.caliban/projects` | Override the checkpoint root directory. |
| `CALIBAN_CHECKPOINT_DISABLED` | | Any non-empty value disables checkpoint recording and pruning. |
| `CALIBAN_CHECKPOINT_MAX_FILE_BYTES` | | Maximum checkpoint file size before rotation. |
| `CALIBAN_CLEANUP_PERIOD_DAYS` | | Number of days after which old checkpoint files are pruned. |

Configuration & Router

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_ROUTER_CONFIG` | Walk-up discovery | Explicit path to caliban.toml. Also settable via --config. |
| `CALIBAN_STRICT_ROUTING` | | Any non-empty value enables strict routing (no fallback to default route on unknown purpose). |
| `CALIBAN_API_KEY_HELPER_TTL_MS` | | TTL in milliseconds for API key helper subprocess cache. |

Output

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_OUTPUT_STYLE` | | Name of the active output style (see Output Styles). |
| `CALIBAN_GRAPHICS` | | Graphics capability hint (e.g. kitty, sixel). |

Observability & Telemetry

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_ENABLE_TELEMETRY` | | Any non-empty value enables OTel telemetry (settings enable_telemetry is also checked). |
| `CALIBAN_OTEL_HEADERS_HELPER` | | Command to supply dynamic OTel export headers. |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | | OTel OTLP exporter endpoint URL. |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | grpc | OTel OTLP transport protocol. |
| `OTEL_EXPORTER_OTLP_HEADERS` | | Additional headers for the OTLP exporter. |
| `OTEL_METRIC_EXPORT_INTERVAL` | 60s | OTel metric export interval. |
| `OTEL_LOGS_EXPORTER` | otlp | OTel logs exporter type. |
| `OTEL_METRICS_EXPORTER` | otlp | OTel metrics exporter type. |
| `OTEL_TRACES_EXPORTER` | otlp | OTel traces exporter type. |
| `CALIBAN_RATES_YAML` | | Path to a YAML file overriding the built-in provider pricing rate card. |

Debug

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_DEBUG` | | Any non-empty value enables the file-backed tracing subscriber (appends to the platform debug log). Also settable via --debug. |

Plugin Trust & Marketplace

| Variable | Default | Description |
| --- | --- | --- |
| `CALIBAN_BLOCKED_MARKETPLACES` | | Comma-separated list of marketplace names to block. |
| `CALIBAN_STRICT_KNOWN_MARKETPLACES` | | Any non-empty value blocks installs from unrecognized marketplaces. |
| `CALIBAN_STRICT_PLUGIN_ONLY_CUSTOMIZATION` | | Any non-empty value restricts customization to plugins only (no user-level skills/hooks). |

Provider precedence

When CALIBAN_PROVIDER is set, it overrides the --provider flag and settings-derived provider. This is the escape hatch for scripting scenarios where injecting a flag is inconvenient.

Files & Directories

Caliban follows platform conventions for each OS via the dirs crate. The tables below show the resolved path for each category on macOS, Linux (with XDG defaults), and Windows.

Override with environment variables

Many paths can be overridden with environment variables — see Environment Variables. The CALIBAN_CHECKPOINT_ROOT, CALIBAN_MEMORY_DIR, CALIBAN_DAEMON_RUNTIME_DIR, and CALIBAN_DEBUG variables are the most commonly needed.


Settings Files

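Caliban loads settings from up to five scopes. The table lists them from lowest to highest precedence: the CLI overlay sits above local, and the managed scope moves to the top of the merge chain only when parent_settings_behavior is "block". See Settings Layering for merge semantics.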

| Scope | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Managed (enterprise) | `/Library/Application Support/Caliban/managed-settings.{toml,json}` | `/etc/caliban/managed-settings.{toml,json}` | `C:\ProgramData\Caliban\managed-settings.{toml,json}` |
| User | `~/Library/Application Support/caliban/settings.{toml,json}` | `$XDG_CONFIG_HOME/caliban/settings.{toml,json}` (default: `~/.config/caliban/`) | `%APPDATA%\caliban\settings.{toml,json}` |
| Project | `<workspace>/.caliban/settings.{toml,json}` | `<workspace>/.caliban/settings.{toml,json}` | `<workspace>\.caliban\settings.{toml,json}` |
| Local (gitignored) | `<workspace>/.caliban/settings.local.{toml,json}` | `<workspace>/.caliban/settings.local.{toml,json}` | `<workspace>\.caliban\settings.local.{toml,json}` |
| CLI overlay | Supplied via `--settings <FILE_OR_JSON>` (all platforms) | | |

Both .toml and .json are accepted at each scope. TOML is preferred; JSON is accepted for Claude Code import compatibility.


Sessions

Named sessions are stored as JSON files in the sessions directory.

| macOS | Linux | Windows |
| --- | --- | --- |
| `~/Library/Application Support/caliban/sessions/<name>.json` | `$XDG_DATA_HOME/caliban/sessions/<name>.json` (default: `~/.local/share/caliban/sessions/`) | `%LOCALAPPDATA%\caliban\sessions\<name>.json` |

Override with --sessions-dir <DIR>.


Checkpoints

Checkpoints use a content-addressed layout keyed on a SHA-256 hash of the canonicalized workspace path.

| macOS | Linux | Windows |
| --- | --- | --- |
| `~/.caliban/projects/<cwd-hash>/checkpoints/<session>/prompt-NNN/` | `~/.caliban/projects/<cwd-hash>/checkpoints/<session>/prompt-NNN/` | `%USERPROFILE%\.caliban\projects\<cwd-hash>\checkpoints\<session>\prompt-NNN\` |

The <cwd-hash> is the first 16 hex characters of SHA-256(canonicalized_cwd).

Override the root with CALIBAN_CHECKPOINT_ROOT. Disable recording entirely with CALIBAN_CHECKPOINT_DISABLED.
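
To locate a project's checkpoint directory by hand, you can approximate the `<cwd-hash>` from a shell. This is a sketch under stated assumptions: it treats "canonicalized" as the symlink-resolved absolute path and hashes it without a trailing newline, which may not match caliban's exact input. The function name `cwd_hash` is hypothetical.

```shell
# Sketch: derive a <cwd-hash>-style identifier (first 16 hex chars of SHA-256).
# Assumptions: canonicalized path = `pwd -P` output, hashed with no newline.
cwd_hash() {
  dir=$(cd "${1:-.}" && pwd -P) || return 1
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$dir" | sha256sum | cut -c1-16
  else
    # macOS ships shasum rather than sha256sum.
    printf '%s' "$dir" | shasum -a 256 | cut -c1-16
  fi
}
```

If the result does not match a directory under `~/.caliban/projects/`, the canonicalization assumption is the first thing to revisit.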


Debug Log

Enabled by --debug or CALIBAN_DEBUG (any non-empty value). Append-only; rotated automatically.

| macOS | Linux | Windows |
| --- | --- | --- |
| `~/Library/Caches/caliban/debug.log` | `$XDG_CACHE_HOME/caliban/debug.log` (default: `~/.cache/caliban/`) | `%LOCALAPPDATA%\caliban\cache\caliban\debug.log` |

Audit / Permission-Decision Log

Append-only JSONL log of every permission decision (allow/ask/deny) with tool name, matched rule, and session context. Enabled by default; disable via permissions.audit_log = false in settings.

| macOS | Linux | Windows |
| --- | --- | --- |
| `~/Library/Application Support/caliban/permission-decisions.jsonl` ¹ | `$XDG_STATE_HOME/caliban/permission-decisions.jsonl` (default: `~/.local/state/caliban/`) | `%LOCALAPPDATA%\caliban\permission-decisions.jsonl` ¹ |

¹ macOS and Windows lack a state_dir equivalent; caliban falls back to data_local_dir (~/Library/Application Support/ / %LOCALAPPDATA%).

View with caliban perms audit [--since <ISO>] [--tool <NAME>] [--action <ACTION>] [--head <N>].


Skills

Skills are loaded from several roots, checked in this order:

| Root | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Project | `<workspace>/.caliban/skills/` | `<workspace>/.caliban/skills/` | `<workspace>\.caliban\skills\` |
| User | `~/Library/Application Support/caliban/skills/` | `$XDG_CONFIG_HOME/caliban/skills/` | `%APPDATA%\caliban\skills\` |
| Local data | `~/Library/Application Support/caliban/skills/` | `$XDG_DATA_HOME/caliban/skills/` | `%LOCALAPPDATA%\caliban\skills\` |
| Plugin-contributed | Varies per plugin install | | |

Each skill lives in a subdirectory with a SKILL.md file: <root>/<name>/SKILL.md.


Plugins

| Location | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Project plugins | `<workspace>/.caliban/plugins/` | `<workspace>/.caliban/plugins/` | `<workspace>\.caliban\plugins\` |
| User plugins | `~/Library/Application Support/caliban/plugins/` | `$XDG_DATA_HOME/caliban/plugins/` (default: `~/.local/share/caliban/plugins/`) | `%LOCALAPPDATA%\caliban\plugins\` |
| Plugin trust store | `~/Library/Application Support/caliban/plugin-trust.json` | `~/.local/share/caliban/plugin-trust.json` | `%LOCALAPPDATA%\caliban\plugin-trust.json` |
| Marketplace allowlist | `~/.caliban/marketplaces-allowlist.json` | `~/.caliban/marketplaces-allowlist.json` | `%USERPROFILE%\.caliban\marketplaces-allowlist.json` |

MCP Configuration (Legacy)

The legacy mcp.toml is still loaded during the back-compat window:

| Location | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Project | `<workspace>/.caliban/mcp.toml` | `<workspace>/.caliban/mcp.toml` | `<workspace>\.caliban\mcp.toml` |
| User | `~/Library/Application Support/caliban/mcp.toml` | `$XDG_CONFIG_HOME/caliban/mcp.toml` | `%APPDATA%\caliban\mcp.toml` |

MCP servers are now configured in settings.toml under [mcp_servers]. See MCP Servers.


Hooks Configuration (Legacy)

Legacy hooks.toml files are still loaded during the back-compat window:

| Location | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Project | `<workspace>/.caliban/hooks.toml` | `<workspace>/.caliban/hooks.toml` | `<workspace>\.caliban\hooks.toml` |
| User | `~/Library/Application Support/caliban/hooks.toml` | `$XDG_CONFIG_HOME/caliban/hooks.toml` | `%APPDATA%\caliban\hooks.toml` |

Hooks are now configured in settings.toml under [hooks]. See Hooks.


Permissions Configuration (Legacy)

| Location | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Project | `<workspace>/.caliban/permissions.toml` | `<workspace>/.caliban/permissions.toml` | `<workspace>\.caliban\permissions.toml` |
| User | `~/Library/Application Support/caliban/permissions.toml` | `$XDG_CONFIG_HOME/caliban/permissions.toml` | `%APPDATA%\caliban\permissions.toml` |

Permissions are now configured in settings.toml under [permissions]. See Managing Rules.


Model Router Config

| Location | macOS / Linux / Windows |
| --- | --- |
| Project | `<workspace>/caliban.toml` (walk-up discovery) |
| User | `~/Library/Application Support/caliban/caliban.toml` (macOS) / `$XDG_CONFIG_HOME/caliban/caliban.toml` (Linux) |

Override with --config <PATH> or CALIBAN_ROUTER_CONFIG.


Output Styles

| Location | macOS | Linux | Windows |
| --- | --- | --- | --- |
| Project | `<workspace>/.caliban/output-styles/` | `<workspace>/.caliban/output-styles/` | `<workspace>\.caliban\output-styles\` |
| User | `~/Library/Application Support/caliban/output-styles/` | `$XDG_CONFIG_HOME/caliban/output-styles/` | `%APPDATA%\caliban\output-styles\` |
| Plugin-contributed | Via plugin data root | | |

Tool-Result Overflow Spill

When a tool result exceeds tool_result_cap_chars, the full result is spilled to disk and the inline message contains a truncated excerpt with a pointer.

| macOS | Linux | Windows |
| --- | --- | --- |
| `~/Library/Caches/caliban/tool-overflows/<session-id>/<tool-use-id>.txt` | `$XDG_CACHE_HOME/caliban/tool-overflows/<session-id>/<tool-use-id>.txt` | `%LOCALAPPDATA%\caliban\cache\caliban\tool-overflows\<session-id>\<tool-use-id>.txt` |

Falls back to /tmp/caliban-tool-overflows/ when the cache directory cannot be determined.


Input History

Per-project input history is stored alongside the checkpoint tree:

All platforms
~/.caliban/projects/<cwd-hash>/input-history.txt

All project histories are accessible via ~/.caliban/projects/ (used by the Ctrl+R all-projects search scope).


Worktrees

Git worktrees managed by caliban are kept inside the repository:

All platforms
<repo-root>/.caliban/worktrees/<name>/

Supervisor / Daemon State

| macOS | Linux | Windows |
| --- | --- | --- |
| ~/Library/Application Support/caliban/ (daemon data) | $XDG_DATA_HOME/caliban/ | %LOCALAPPDATA%\caliban\ |
| $XDG_RUNTIME_DIR/caliban/ or ~/Library/Application Support/caliban/run/ (sockets) | $XDG_RUNTIME_DIR/caliban/ (sockets) | %LOCALAPPDATA%\caliban\run\ (sockets) |

Override with CALIBAN_DAEMON_RUNTIME_DIR.


XDG environment variable overrides

On Linux, all $XDG_* variables are honored when set. If unset, the defaults shown above apply. macOS and Windows do not use XDG paths; the dirs crate maps to the platform-native locations shown.
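For readers unfamiliar with the XDG fallback rule, here is a small sketch of how a Linux config directory resolves. It follows the XDG Base Directory specification (an unset, empty, or relative value falls back to $HOME/.config). The function name is hypothetical, and this is not a copy of what caliban or the dirs crate does internally.

```rust
use std::path::PathBuf;

/// Resolves the Linux config directory for caliban. Per the XDG Base
/// Directory specification, an unset, empty, or relative $XDG_CONFIG_HOME
/// is ignored and $HOME/.config is used instead.
fn linux_config_dir(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    let base = match xdg_config_home {
        // Only absolute paths are valid values.
        Some(value) if value.starts_with('/') => PathBuf::from(value),
        _ => PathBuf::from(home).join(".config"),
    };
    base.join("caliban")
}

fn main() {
    let xdg = std::env::var("XDG_CONFIG_HOME").ok();
    let home = std::env::var("HOME").unwrap_or_else(|_| "/".to_string());
    println!("{}", linux_config_dir(xdg.as_deref(), &home).display());
}
```

The same pattern applies to $XDG_CACHE_HOME (default $HOME/.cache) and $XDG_DATA_HOME (default $HOME/.local/share).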

Troubleshooting

This page covers the most common problems operators encounter and how to fix them. Start with caliban doctor — it checks the most likely failure points in one command.


Running caliban doctor

caliban doctor          # quick sanity checks
caliban doctor --deep   # adds provider auth pings (costs one API call per provider)

Each check in the output carries a pass, warning (!), or fail prefix. Warnings such as "no CLAUDE.md found in ancestry" or "no scope files found" are informational; failures indicate something caliban cannot proceed without.

Deep checks cost an inference

--deep issues a real model request to confirm provider auth. Run it when you suspect a key or endpoint problem, not on every invocation.


Provider authentication failures

Symptoms: Error: ANTHROPIC_API_KEY is not set, OPENAI_API_KEY is not set, or similar on startup.

Fixes:

  1. Export the relevant key in your shell:

    export ANTHROPIC_API_KEY=sk-ant-...
    export OPENAI_API_KEY=sk-...
    
  2. Or configure apiKeyHelper in your settings file to fetch credentials dynamically. See Configuring Providers & API Keys.

  3. Run caliban doctor --deep to confirm the key reaches the provider.

Malformed base URL: If you set OPENAI_BASE_URL to a URL that cannot be parsed (e.g. not://a:url), caliban may report a misleading "API key not set" error. Verify the URL is a valid HTTP/HTTPS address before exporting it.


Qwen3 on LM Studio: tool calls leak into reasoning

When running a Qwen3 reasoning model via LM Studio (MLX engine), you may see tool calls appear inside the model's thinking/reasoning channel rather than as structured tool_use blocks. The practical effects:

  • 2-step tool chains (e.g. Glob → Read) usually complete correctly.
  • Chains of 3 or more steps stall: the model re-emits the first tool call across multiple turns and hits --max-turns without progressing.

This is an LM Studio MLX engine limitation, not a caliban defect. The same Qwen3 model on Ollama (GGUF) parses tool calls correctly — the leak does not reproduce there.

LM Studio + Qwen3 reasoning models

Multi-step agentic tasks (3+ tool calls) are unreliable when using Qwen3 reasoning models through LM Studio's MLX path. For agentic work, switch to Ollama or another server that handles Qwen-native <tool_call> XML parsing server-side.

Workarounds:

| Situation | Workaround |
| --- | --- |
| Need Qwen3 specifically | Switch to Ollama: --provider ollama --model qwen3.5:9b |
| Must use LM Studio | Limit chains to at most 2 tool calls; use --max-turns to prevent runaway loops |
| Reasoning is optional | Use a non-reasoning Qwen model (e.g. qwen2.5-coder-7b-instruct) |

Ollama: tool_call_id not round-tripped

Caliban's Ollama provider does not correlate tool_call_id across the request/response boundary — it is set on the outgoing tool result but is not echoed back by the Ollama server. This is a known limitation of the Ollama API and does not affect tool dispatch correctness in practice.

Note

If you are building a custom consumer of the stream-json output and need to correlate tool_use and tool_result frames, use the id field on the tool_use frame and the tool_use_id field on tool_result as emitted by caliban — they match correctly on the client side regardless of provider.
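A consumer-side correlation can be sketched like this. The `Frame` enum is a simplified, hypothetical stand-in for decoded stream-json frames: only the `id` and `tool_use_id` fields come from the text above, while `name` and `output` are assumptions added for illustration. JSON decoding is left out to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Simplified stand-in for decoded stream-json frames (hypothetical shape).
enum Frame {
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, output: String },
}

/// Pairs each tool_result with the name of the tool_use it answers.
/// Results whose tool_use_id was never seen are skipped.
fn correlate(frames: &[Frame]) -> Vec<(String, String)> {
    let mut pending: HashMap<&str, &str> = HashMap::new();
    let mut pairs = Vec::new();
    for frame in frames {
        match frame {
            Frame::ToolUse { id, name } => {
                pending.insert(id.as_str(), name.as_str());
            }
            Frame::ToolResult { tool_use_id, output } => {
                if let Some(name) = pending.remove(tool_use_id.as_str()) {
                    pairs.push((name.to_string(), output.clone()));
                }
            }
        }
    }
    pairs
}

fn main() {
    let frames = vec![
        Frame::ToolUse { id: "t1".into(), name: "Glob".into() },
        Frame::ToolUse { id: "t2".into(), name: "Read".into() },
        // Results may arrive out of order when tools run in parallel.
        Frame::ToolResult { tool_use_id: "t2".into(), output: "file body".into() },
        Frame::ToolResult { tool_use_id: "t1".into(), output: "a.rs".into() },
    ];
    for (tool, output) in correlate(&frames) {
        println!("{tool}: {output}");
    }
}
```

Keying on the identifier rather than on arrival order keeps the pairing correct even when results interleave.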


Parallel sub-agents slow on self-hosted Ollama

If you run parallel sub-agents (AgentTool) against a self-hosted Ollama instance and they are slower than expected, the backend may be serialising requests due to OLLAMA_NUM_PARALLEL=1 (the default on most hardware).

On a NUM_PARALLEL=1 backend, parallel sub-agents do not increase throughput — every inference still queues at the single model slot, and the per-sub-agent overhead (a full reasoning + summary loop per agent) makes total wall time significantly longer than the parent doing the same work inline.

Options:

  • Raise OLLAMA_NUM_PARALLEL on the server if your GPU has enough VRAM for multiple KV-cache allocations.
  • Use --no-sub-agent and let the parent model read files inline.
  • Switch to a hosted provider (Anthropic, OpenAI) where each sub-agent gets independent fleet capacity.
  • Cap dispatch with --parallel-tool-limit N to limit concurrent sub-agent calls.

Sub-agents on a serialising backend

Parallel sub-agents still provide context isolation (each sub-agent gets a fresh context window) even when NUM_PARALLEL=1. That can be worth the wall-time cost for long independent tasks, but not for latency-sensitive pipelines.
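A back-of-the-envelope model makes the queueing argument concrete. It is a deliberate simplification: jobs are assumed equal-sized and dispatched in waves, and the numbers in `main` are illustrative, not measurements. Both function names are hypothetical.

```rust
/// Estimated wall time in seconds when `agents` sub-agents each need
/// `work_secs` of inference plus `overhead_secs` of per-agent reasoning and
/// summary, on a backend with `slots` parallel inference slots.
fn fleet_wall_secs(agents: u32, work_secs: u32, overhead_secs: u32, slots: u32) -> u32 {
    let slots = slots.max(1);
    // Number of sequential waves needed to run every agent (ceiling division).
    let waves = (agents + slots - 1) / slots;
    waves * (work_secs + overhead_secs)
}

/// Wall time when the parent does the same work inline, one task after
/// another, with no per-agent overhead.
fn inline_wall_secs(tasks: u32, work_secs: u32) -> u32 {
    tasks * work_secs
}

fn main() {
    // Four tasks, 30 s of work each, 20 s of assumed per-agent overhead.
    println!("sub-agents, 1 slot:  {} s", fleet_wall_secs(4, 30, 20, 1));
    println!("parent inline:       {} s", inline_wall_secs(4, 30));
    println!("sub-agents, 4 slots: {} s", fleet_wall_secs(4, 30, 20, 4));
}
```

Under these assumptions a single slot makes the fleet slower than inline work (200 s against 120 s), while four slots bring it down to 50 s.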


Headless Ask→deny remediation

In headless (-p) mode, tools that require user confirmation (the default "Ask" rule) are auto-denied because there is no TTY to prompt on. If a headless run silently fails to write a file or run a command, this is the likely cause.

Fix: add an explicit --allow rule or switch to --auto-allow for unattended runs:

# Allow a specific tool pattern
caliban -p "..." --allow "Write:**"

# Allow all tool calls (use with care)
caliban -p "..." --auto-allow

See Headless & Audit for the full headless permission model and how to configure durable rules.
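The auto-deny behavior reduces to a small decision table. The sketch below models it with hypothetical `Rule` and `Outcome` types; it illustrates the rule described above and is not caliban's permission engine.

```rust
/// Disposition a permission rule assigns to a tool call.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Allow,
    Ask,
    Deny,
}

/// What the harness does with the tool call.
#[derive(Debug, PartialEq)]
enum Outcome {
    Run,
    Prompt,
    Refuse,
}

/// Resolves a rule to an outcome. In headless mode there is no TTY to
/// prompt on, so Ask collapses to a refusal.
fn resolve(rule: Rule, headless: bool) -> Outcome {
    match (rule, headless) {
        (Rule::Allow, _) => Outcome::Run,
        (Rule::Deny, _) => Outcome::Refuse,
        (Rule::Ask, false) => Outcome::Prompt,
        (Rule::Ask, true) => Outcome::Refuse,
    }
}

fn main() {
    println!("interactive Ask -> {:?}", resolve(Rule::Ask, false));
    println!("headless Ask    -> {:?}", resolve(Rule::Ask, true));
    println!("headless Allow  -> {:?}", resolve(Rule::Allow, true));
}
```

An explicit allow rule changes the headless outcome because it replaces the Ask disposition before the headless check applies.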


--debug file logging

Pass --debug (or set CALIBAN_DEBUG=1) to write a detailed event + render log to disk. This is useful when diagnosing silent failures, unexpected tool behaviour, or TUI rendering issues.

Log file locations:

| OS | Path |
| --- | --- |
| macOS | ~/Library/Caches/caliban/debug.log |
| Linux / WSL | ~/.cache/caliban/debug.log |

Debug log can be large

The debug log grows quickly under active use. Delete or rotate it after capturing the relevant session. It contains full message content, tool inputs/outputs, and provider requests — do not share it if your prompts contain sensitive information.

The log appends across runs; it is not rotated automatically.

Glossary

Concise definitions for terms used throughout this guide. Each links to the chapter where the concept is covered in depth.


agent harness The runtime that drives the model → tool → model loop: reads user input, calls the provider, dispatches tool calls, feeds results back, and repeats until a terminal condition. Caliban is an agent harness. See What Is Caliban?.

auto-memory Per-project notes written by the model itself into a designated memory file. Injected into the system prompt on subsequent sessions. See Auto-Memory.

checkpoint A snapshot of the conversation state (messages + file-tool pre-images) taken before each prompt. Used by /rewind to restore a prior state. See Checkpoints & Rewind.

compaction The process of summarising or truncating conversation history when the context window approaches its limit, allowing the session to continue. See Context & Compaction.

headless / print mode Non-interactive operation via -p / --print. Caliban drives the agent without a TUI and emits text or structured JSON output to stdout. See Print Mode and The stream-json Protocol.

hook An event-driven callback executed by an external command, HTTP endpoint, MCP tool, or in-process handler at defined points in the agent lifecycle (e.g. before_tool, SessionStart). See Hooks.

MCP server A Model Context Protocol server that exposes additional tools to caliban over stdio, HTTP/SSE, or streamable-HTTP transports. Caliban discovers and manages MCP servers via its settings. See MCP Servers.

memory tier One of the three layers of context prepended to the system prompt: global (~/.claude/CLAUDE.md), project (<workspace>/CLAUDE.md), and auto-memory (model-written notes). See Memory Tiers.

message IR The provider-neutral internal representation of conversation messages used by caliban-common. All providers translate to and from this IR so the agent core stays provider-agnostic. See Architecture & ADRs (ADR 0006).

output style A named instruction set (Default, Proactive, Explanatory, Learning, or custom) that shapes how the model formats and explains its responses. See Output Styles.

permission mode A named preset that sets the default disposition for tool-call permission checks. Modes include default, acceptEdits, plan, auto, dontAsk, and bypassPermissions. See Permission Modes.

plugin A self-contained bundle of skills, hooks, agents, MCP server configs, and output styles distributed as a directory with a plugin.json manifest. See Plugins.

provider An adapter that translates caliban's message IR to and from a specific model API (Anthropic, OpenAI, Ollama, Google, Bedrock, Vertex). See Supported Providers.

router The caliban-model-router layer that selects a provider+model for each request based on configured rules, purpose keys, fallback chains, circuit breakers, and capability requirements. See The Model Router.

sandbox An OS-level confinement layer (macOS Seatbelt or Linux bubblewrap) applied to shell and file tools to restrict what they can access on the host. See The OS Sandbox.

session A persisted conversation: a named JSON file on disk containing the full message history for a continuous exchange. See Sessions & Persistence.

skill A markdown file with YAML frontmatter that the model can invoke as a tool. Skills encapsulate reusable workflows without requiring code. See Skills.

sub-agent A nested caliban instance spawned by the parent agent to execute a delegated task, optionally in an isolated git worktree. See Sub-agents.

tool A capability the model can invoke during a turn — built-in tools include Read, Write, Bash, Glob, Grep, Edit, WebSearch, and AgentTool. See Built-in Tools and Tool Execution.

Parity vs Claude Code

Caliban tracks feature parity with Claude Code in a living matrix. This page summarises the current state by theme. The full matrix — including per-row notes and ADR cross-references — lives at docs/parity-gap-matrix.md in the repository.

Legend: ✅ parity · 🟡 partial · 🔴 not yet


Theme summary

A — Permissions & safety ✅

Rule grammar (allow/ask/deny + globs), all six permission modes, the auto-mode classifier, the TUI Ask modal, OS-level sandbox (macOS Seatbelt + Linux bubblewrap), and the full caliban perms CLI with TOML writeback and audit log are all shipped. See ADRs 0020, 0029, 0032, and 0045.

B — Hooks & extensibility ✅

All hook event types (tool, session, compact, config, cwd, file, subagent, permission), hook decision protocol, and plugin packaging are shipped. The mcp/prompt/agent handler types are v1 stubs; per-subagent hook inheritance lands with the fleet spec.

C — Memory & checkpointing ✅

Three-tier prompt prefix, CLAUDE.md ancestor walk + @-imports, auto-memory, claudeMdExcludes, auto-checkpoint per prompt, /rewind, MicroCompact janitor, and tool-result size cap with overflow persistence are all shipped.

D — Configuration / settings ✅

Layered settings (managed > user > project > local), /config interactive editor, live reload, apiKeyHelper pool, and schema validation are shipped (ADR 0026 + 0045). TOML is the primary write format; JSON is accepted on read.

E — TUI ergonomics 🟡

Status bar, mouse scroll, transcript viewer, @file attach, ! shell escape, external editor (Ctrl+G), Ctrl+O transcript dump, background bash (Ctrl+B), image/vision input, permission Ask modal, and reverse history search are shipped. Notable gaps: vim editing mode (🔴), slash-menu typeahead (🟡 partial), multi-line input (🟡 partial), and voice dictation (🔴).

F — Built-in tools ✅

Bash, Edit, Glob, Grep, Read, Write, WebFetch, TodoWrite, Skill, AgentTool, NotebookEdit, MultiEdit, WebSearch, and background-bash are shipped. PowerShell tool and ToolSearch / WaitForMcpServers (relevant once MCP is fully real) are 🔴.

G — Sub-agents ✅

In-process AgentTool, git worktree isolation, background agent fleet (caliband daemon), per-agent memory dir, hook inheritance, and supervisor daemon are all shipped (ADR 0037).

H — MCP ✅

Config validation, real spawn/handshake, stdio + HTTP/SSE + streamable-HTTP transports, per-server permission scoping, /mcp slash, OAuth PKCE flow, elicitation, and resource references are shipped (ADR 0023).

I — Model router & providers ✅

Purpose-keyed routing, fallback chains, hedging, circuit breakers, capability filtering, Anthropic/OpenAI/Ollama/Google/Bedrock/Vertex providers, and effort levels are shipped. Azure Foundry is 🔴; extended-thinking toggle is 🟡 partial.

J — Headless / CI ✅

-p / --print mode, all output formats (text/json/stream-json), input formats, --max-turns, --max-budget-usd, --bare, --json-schema, --include-partial-messages, and --include-hook-events are shipped. GitHub Actions workflow and devcontainer feature are 🔴 (separate sub-projects).

K — Observability / cost ✅

tracing instrumentation, /context, /usage, /compact, proactive autocompact, prompt cache markers, cost tracking, OpenTelemetry export, and the custom status line are shipped. --debug / --debug-file is 🟡 partial. The feedback survey is 🔴.

L — Output styles ✅

All four built-in output styles (Default, Proactive, Explanatory, Learning) and custom output-style files are shipped (ADR 0031).

M — Slash command coverage 🟡

Core commands (/plan, /memory, /skills, /quit, /clear, /help, /init, /context, /usage, /compact, /config, /hooks, /mcp, /model, /effort, /resume, /cost, /export, /rewind, /doctor, /login, /logout, /status) are shipped. Theme customisation and skill-dependent commands (/code-review, /run, /verify, /batch) are 🔴.

N — Long-tail surfaces 🔴

IDE extensions (VS Code / Cursor / JetBrains), GitHub App, claude.ai/code web, iOS app, Slack, Remote Control, Channels, Routines, Deep links, and Teleport are all 🔴. These are parked until terminal/CLI parity is reached.


Notable gaps

| Gap | Status | Notes |
| --- | --- | --- |
| Vim editing mode | 🔴 | TUI input layer |
| Azure Foundry provider | 🔴 | Provider adapter not yet written |
| GitHub Actions workflow | 🔴 | Separate sub-project |
| Devcontainer feature | 🔴 | Separate sub-project |
| ToolSearch / WaitForMcpServers | 🔴 | Only relevant once MCP is fully real |
| Skill-dependent slash commands | 🔴 | /code-review, /run, /verify, /batch |
| Cloud / IDE / mobile surfaces (N) | 🔴 | All large investments; deferred |

Note

The parity matrix is refreshed in the same PR that ships each feature. If a row above contradicts what you see in the matrix file, the matrix file is authoritative.

Crate Map

The caliban workspace is organised into ~24 crates across four main layers. This page gives an operator-facing orientation — enough to know which crate to look at when reading a log line, error message, or ADR. For architecture rationale, see Architecture & ADRs.

Note

This map is for the curious. You do not need to know these crates to use caliban — they are implementation details that surface only in debug logs, error messages, and ADR references.


Layer 1 — Foundation

Shared types, abstractions, and utilities that every other layer depends on.

| Crate | Purpose |
| --- | --- |
| caliban-common | Provider-neutral message IR, shared error types, and cross-crate utilities |
| caliban-settings | Unified settings hierarchy (managed > user > project > local); file loading, schema validation, live reload, apiKeyHelper pool |

Layer 2 — Providers

One adapter per model API. Each translates caliban's message IR to the provider's wire format and back.

| Crate | Purpose |
| --- | --- |
| caliban-provider | Provider trait definition and shared provider types |
| caliban-provider-anthropic | Anthropic (Claude) adapter via Anthropic Messages API |
| caliban-provider-openai | OpenAI adapter; also used for LM Studio, vLLM, and other OpenAI-compatible servers |
| caliban-provider-ollama | Ollama adapter (native /api/chat endpoint, GGUF tool-call parsing) |
| caliban-provider-google | Google AI Studio / Gemini adapter |
| caliban-provider-bedrock | AWS Bedrock adapter (ADR 0034) |
| caliban-provider-vertex | Google Cloud Vertex AI adapter (ADR 0034) |
| caliban-model-router | Purpose-keyed routing, fallback chains, hedging, circuit breakers, capability filtering (ADR 0022, 0038) |

Layer 3 — Agent Core

The runtime that drives the model → tool → model loop.

| Crate | Purpose |
| --- | --- |
| caliban-agent-core | Agent loop, turn handling, compaction strategies, permission dispatch, sub-agent orchestration |
| caliban-tools-builtin | Built-in tools: Read, Write, Edit, Bash, Glob, Grep, WebFetch, TodoWrite, AgentTool, NotebookEdit, and others |
| caliban-sandbox | OS-level tool confinement (macOS Seatbelt, Linux bubblewrap) (ADR 0032) |
| caliban-skills | Skill discovery, frontmatter parsing, and SkillTool invocation (ADR 0019) |
| caliban-mcp-client | MCP server lifecycle: spawn, handshake, list_tools, transports, OAuth (ADR 0017, 0023) |
| caliban-plugins | Plugin package management: manifest parsing, trust gating, namespace expansion (ADR 0030) |
| caliban-images | Image / vision input: clipboard, @path, drag-and-drop, provider wire shapes (ADR 0039) |

Layer 4 — Sessions, State & Infrastructure

Persistence, memory, observability, and the background fleet.

| Crate | Purpose |
| --- | --- |
| caliban-sessions | Session persistence (JSON on disk), load/save, session directory management |
| caliban-checkpoint | Per-prompt checkpoint snapshots and /rewind restoration (ADR 0028) |
| caliban-memory | Three-tier memory (global/project/auto-memory), CLAUDE.md ancestor walk and @-imports (ADR 0018, 0035, 0036) |
| caliban-output-styles | Built-in and custom output style loading and activation (ADR 0031) |
| caliban-telemetry | OpenTelemetry export, cost accounting, metric emission (ADR 0033) |
| caliban-worktrees | Git worktree creation and lifecycle management for sub-agent isolation (ADR 0037) |
| caliban-supervisor | Background agent fleet and caliband supervisor daemon (ADR 0037, 0042) |

The binary

| Crate | Purpose |
| --- | --- |
| caliban | The caliban binary: CLI parsing (args.rs), startup pipeline, TUI (ratatui), headless dispatch, and subcommand handlers |

Architecture & ADRs

Caliban captures every significant architectural decision in an Architecture Decision Record (ADR). Each ADR states the context, the decision, and its consequences — giving contributors (and curious operators) the rationale behind the design, not just the outcome.

ADRs live in the docs/adr/ directory of the repository. They use a lightweight MADR-lite format and carry a status:

  • accepted — currently in effect
  • superseded — replaced by a later ADR; kept for history
  • proposed — under discussion, not yet in effect
  • rejected — considered and explicitly declined

This is the contributor/internals layer

You do not need to read ADRs to use caliban. They exist for contributors and operators who want to understand why something works the way it does. For crate orientation, see Crate Map.


ADR index

Foundation

| # | Title | Status |
| --- | --- | --- |
| 0000 | Record architecture decisions (MADR-lite under docs/adr/) | accepted |
| 0001 | Async runtime → tokio | accepted |
| 0002 | Error model → thiserror for libs, anyhow for binary | accepted |
| 0003 | License → AGPL-3.0-only | accepted |
| 0004 | Naming → caliban-* libraries, caliban binary | accepted |
| 0005 | Workspace layout → crates/ for libs, binaries at root | accepted |

Provider & message model

| # | Title | Status |
| --- | --- | --- |
| 0006 | Message schema → provider-neutral IR | accepted |
| 0007 | Schema/transport factoring via Transport trait | accepted |
| 0008 | Role::System is positional (leading-only) | accepted |

Agent core

| # | Title | Status |
| --- | --- | --- |
| 0009 | Agent-core design (stream-as-primitive, sequential tools, opt-in compaction) | accepted (sequential-tools clause superseded by 0016) |
| 0010 | WorkspaceRoot path resolution + opt-in restricted mode | accepted |
| 0016 | Parallel tool dispatch (semaphore-bounded; supersedes 0009 sequential clause) | accepted |
| 0021 | Sub-agent primitive (AgentTool; synchronous in-process; allowlist-filtered registry) | accepted |

TUI & sessions

| # | Title | Status |
| --- | --- | --- |
| 0011 | Sessions persisted to disk + interactive REPL | accepted |
| 0012 | TUI via ratatui (replacing the rustyline REPL) | accepted |
| 0013 | TUI overlays + layout v2 | accepted |
| 0014 | Default system prompt + TUI stall fixes + debug logging | accepted |
| 0015 | Context preservation + path conventions (~ expansion) | accepted |
| 0027 | TUI ergonomics (@file, !, Ctrl+G, Ask modal, transcript viewer) | accepted |
| 0041 | TUI redraw tick — close-out (resolves 0014 open question) | accepted |

Memory & checkpointing

| # | Title | Status |
| --- | --- | --- |
| 0018 | Memory tier model (global / project / auto-memory; spliced into system prompt) | accepted |
| 0028 | Auto-checkpointing + /rewind | accepted |
| 0035 | Auto-memory (model-written notes per project) | accepted |
| 0036 | CLAUDE.md ancestor walk + @-imports | accepted |

Permissions & safety

| # | Title | Status |
| --- | --- | --- |
| 0020 | Permission rules layered on Hooks (TOML rule sources; interactive Ask) | accepted |
| 0029 | Permission modes (acceptEdits / auto / dontAsk / bypassPermissions) + auto-mode classifier | accepted |
| 0032 | OS-level sandbox (macOS Seatbelt + Linux bubblewrap) | accepted |
| 0045 | Permissions v2 — TOML-primary config + richer rule schema | accepted |

Configuration & settings

| # | Title | Status |
| --- | --- | --- |
| 0026 | Unified settings hierarchy (managed > user > project > local) | accepted |
| 0043 | arc-swap as the read-mostly shared-state primitive | accepted |

Extensibility: hooks, skills, plugins, output styles

| # | Title | Status |
| --- | --- | --- |
| 0019 | Skills loading & invocation (frontmatter + body; SkillTool on-demand load) | accepted |
| 0024 | Hook event taxonomy (expanded events + handler types) | accepted |
| 0030 | Plugin packaging (skills + hooks + agents + MCP + output-styles bundles) | accepted |
| 0031 | Output styles (Default / Proactive / Explanatory / Learning + custom) | accepted |
| 0040 | Slash command registry (extensible SlashCommand trait) | accepted |

MCP

| # | Title | Status |
| --- | --- | --- |
| 0017 | MCP client architecture (stdio v1; tools surface as mcp__<server>__<tool>) | accepted |
| 0023 | MCP v2 — transports, OAuth, elicitation, resources | accepted |
| 0044 | rmcp 1.7 version pin (dedicated-PR bumps) | accepted |
| 0046 | Two-stage tool surface — lazy MCP schema loading + ToolSearch | accepted |

Model router & providers

| # | Title | Status |
| --- | --- | --- |
| 0022 | Model routing architecture (Layer 3 caliban-model-router; router-impl-Provider) | accepted |
| 0034 | Bedrock + Vertex providers | accepted |
| 0038 | Model router v2 (fallback / hedging / circuit breakers / capability filtering) | accepted |
| 0039 | Image / vision input | accepted |

Headless / CI & observability

| # | Title | Status |
| --- | --- | --- |
| 0025 | Headless / print mode + JSON output protocol | accepted |
| 0033 | OpenTelemetry export + cost accounting | accepted |

Sub-agents & background fleet

| # | Title | Status |
| --- | --- | --- |
| 0037 | Sub-agent worktree isolation + background fleet | accepted |
| 0042 | caliband sibling-binary placement (under caliban-supervisor) | accepted |

Architecture Decision Records

ADR 0000 · Record architecture decisions

  • Status: accepted
  • Date: 2026-06-14

Context

caliban has kept Architecture Decision Records since the Layer-0 bootstrap (ADRs 0001–0047). The original 2026-05-22 Layer-0 bootstrap design placed them at the repository root in adrs/, reasoning that ADRs are first-class Layer-0 deliverables and top-level placement makes them impossible to miss.

Since then the sibling repositories adopted the conventional adr-tools / MADR layout instead: prospero and gonzalo both keep their records under docs/adr/, seed the log with a meta "record architecture decisions" entry, and (prospero) ship a template.md. caliban was the outlier — root adrs/ (plural), no meta record, no template — which created cross-repo confusion and path mismatches for anyone moving between the three repos.

There was no ADR stating why caliban records decisions or where they live; that rationale lived only in a feature design doc, which is exactly the kind of external dependency ADRs are supposed to avoid.

Decision

We will keep Architecture Decision Records under docs/adr/, in MADR-lite format (a lightweight extension of Michael Nygard's original ADR style), matching sibling repos prospero and gonzalo. Specifically:

  • Location: docs/adr/ (singular adr), not the former root adrs/. This supersedes the root-placement decision in the Layer-0 bootstrap design; existing records were relocated with git mv to preserve history.
  • This meta record is numbered 0000 so the existing 0001–0047 numbering is preserved: no renumbering churn, and the log still opens with a record of the practice itself.
  • Format: each ADR is one append-only file NNNN-kebab-title.md with Context, Decision, and Consequences. A decision is changed by writing a new ADR that supersedes the old one, never by rewriting history.
  • Template: new ADRs start from template.md.
  • Status legend: accepted / superseded / proposed / rejected, indexed in README.md.

Consequences

  • Positive: one consistent ADR convention across caliban / gonzalo / prospero; the conventional, tooling-friendly docs/adr/ location; and the rationale for the practice now lives in an ADR rather than a feature design doc, so it is self-sustaining.
  • Negative: a one-time churn to relocate the directory and update every inbound reference (crate rustdoc, README, the mdBook guide, the parity matrix, and the historical design docs). ADRs no longer sit at the repo root, so they are slightly less discoverable from a bare ls — mitigated by a pointer from the top-level README.md.
  • Revisit if: the agreed cross-sibling ADR standard changes, or the docs/adr/ layout proves harder to maintain than the root placement it replaced.

ADR 0001 · Async runtime → tokio

  • Status: accepted
  • Date: 2026-05-22

Context

caliban's foundation is heavily I/O-bound: provider HTTPS calls, streaming responses from LLM endpoints, MCP transports, and eventually a multi-session orchestrator. Rust's async story is fragmented across runtimes (tokio, async-std, smol, embassy), and futures from one runtime cannot always be polled by another. Picking a runtime up front prevents subtle cross-runtime breakage as the workspace grows.

Decision

Standardize on tokio (multi-threaded scheduler, features = ["full"]) across every crate in the workspace. The workspace root pins the version in [workspace.dependencies]; member crates declare tokio.workspace = true and may select their own feature subset.

No nested runtimes. Each binary creates a single tokio::runtime::Runtime (or uses #[tokio::main]) for its entire lifetime.

Consequences

  • Positive: direct compatibility with reqwest, tower, hyper, axum, tonic, every major MCP transport, and most LLM SDKs. Predictable async behavior across the workspace. Easy onboarding — tokio is the de facto Rust async runtime.
  • Negative: locks the workspace out of smol/embassy ecosystems (acceptable — no embedded targets planned). Binary size larger than a minimal runtime would produce.
  • Revisit if: caliban needs to run in a no_std or embassy-only environment, or if a critical dependency requires a different runtime.

ADR 0002 · Error model → thiserror for libraries, anyhow for binary

  • Status: accepted
  • Date: 2026-05-22

Context

Rust libraries benefit from precise error enums: consumers want to match on variants and react differently to different failure modes. Binaries benefit from ergonomic context propagation: operators want a readable error chain showing where things went wrong, not pattern-matching on every variant.

A shared "uber error" crate that every other crate depends on creates a foundation-coupling crate and forces every error change to ripple through the workspace. We want errors to be local.

Decision

Every caliban-* library crate defines its own Error enum using thiserror, and exposes:

pub type Result<T> = std::result::Result<T, Error>;

Cross-crate errors convert at boundaries with #[from] or explicit From impls. No shared error crate.

The caliban binary will use anyhow::Result in main() and top-level command handlers once real command logic exists. Errors propagate with ?, and context is attached with .context("...") from anyhow::Context.

At Layer 0 the binary is an argv-only stub returning std::process::ExitCode directly (so it can distinguish exit codes 0 / 2 for success vs. misuse); anyhow is declared as a workspace-inherited dependency and will be imported as soon as the first error-propagating command lands.
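To illustrate the per-crate error pattern without external dependencies, the sketch below writes out by hand roughly what thiserror's derive would generate (Display, source, and the #[from] conversion). The two modules stand in for two library crates. Every name here (`provider`, `agent`, `call`, `run_turn`, and the variants) is hypothetical and not taken from caliban's source.

```rust
// Stand-in for one library crate with its own local error type.
mod provider {
    use std::fmt;

    #[derive(Debug)]
    pub enum Error {
        Unauthorized,
        Http(u16),
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::Unauthorized => write!(f, "unauthorized"),
                Error::Http(code) => write!(f, "HTTP status {code}"),
            }
        }
    }

    impl std::error::Error for Error {}

    pub type Result<T> = std::result::Result<T, Error>;

    pub fn call(status: u16) -> Result<&'static str> {
        match status {
            200 => Ok("ok"),
            401 => Err(Error::Unauthorized),
            other => Err(Error::Http(other)),
        }
    }
}

// Stand-in for a second library crate that depends on the first.
mod agent {
    use super::provider;
    use std::fmt;

    #[derive(Debug)]
    pub enum Error {
        Provider(provider::Error),
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::Provider(e) => write!(f, "provider call failed: {e}"),
            }
        }
    }

    impl std::error::Error for Error {
        fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
            match self {
                Error::Provider(e) => Some(e),
            }
        }
    }

    // Hand-written equivalent of thiserror's #[from]: converts at the boundary.
    impl From<provider::Error> for Error {
        fn from(e: provider::Error) -> Self {
            Error::Provider(e)
        }
    }

    pub type Result<T> = std::result::Result<T, Error>;

    pub fn run_turn(status: u16) -> Result<&'static str> {
        // `?` applies the From impl, so no shared error crate is needed.
        Ok(provider::call(status)?)
    }
}

fn main() {
    match agent::run_turn(401) {
        Ok(body) => println!("{body}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```

Adding a variant to `provider::Error` touches only that module. The `agent` module keeps compiling because it wraps the whole enum rather than enumerating its variants.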

Consequences

  • Positive: adding a new error variant is local to one crate. Library consumers can match precisely; binary code gets readable context. No god-error-crate.
  • Negative: slight boilerplate per library (the Error enum and Result alias). From impls must be added at boundaries.
  • Revisit if: a real shared error type emerges (e.g., a cross-crate "Cancelled" or "Timeout" that every layer must surface identically).

ADR 0003 · License → AGPL-3.0-only

  • Status: accepted
  • Date: 2026-05-22

Context

caliban is private now but designed to be open-sourced. The author explicitly rejects permissive defaults (MIT, Apache-2.0): the goal is to enforce community contribution from downstream users and hosted-service providers, not maximize commercial adoption.

The relevant tiers of copyleft are:

  • GPL-3.0 — strong copyleft on distribution; SaaS providers can modify and host without releasing source (the "SaaS loophole").
  • AGPL-3.0 — closes the SaaS loophole: hosting modified code as a network service triggers the obligation to release source.
  • SSPL — stronger than AGPL but not OSI-recognized as open source.
  • MPL-2.0 — file-level (weak) copyleft; consumers don't have to copyleft their downstream code.

Decision

Every crate's Cargo.toml declares license = "AGPL-3.0-only" via license.workspace = true. The full AGPL-3.0 text lives in LICENSE at the workspace root. The README states the license prominently and explains the implications for service operators and forks.

Consequences

  • Positive: forks and hosted services must release modifications. Aligns with Mastodon, Nextcloud, Gitea, and Sourcehut — all of which have used AGPL successfully to balance openness with sustainable community contribution. Author's stated philosophy is enforced.
  • Negative: caliban crates won't compose into permissive Rust projects on crates.io — depending on caliban-* makes the consumer AGPL. This is intentional: caliban is an end product, not a general-purpose library to be embedded.
  • Revisit if: the AGPL is preventing a legitimate non-commercial use case the author wants to support. A future ADR could carve out exceptions or dual-license specific crates.

ADR 0004 · Naming → caliban-* libraries, caliban binary

  • Status: accepted
  • Date: 2026-05-22

Context

Crate names on crates.io are globally unique. If we eventually publish, we need names that aren't already taken and that signal ownership. Within the workspace, naming conventions also affect ergonomics — module paths, import statements, and clippy's module_name_repetitions lint all interact with crate names.

Decision

  • Library crates use the caliban- prefix: caliban-core, caliban-provider, caliban-agent-core, etc. Directory name matches the package name.
  • Binary crate is named caliban. Its package name is caliban, so Cargo's default binary name matches; caliban/Cargo.toml makes this explicit with a [[bin]] name = "caliban" entry for clarity.
  • Internal module paths drop the prefix where it would be redundant: caliban_provider::ProviderClient, NOT caliban_provider::CalibanProviderClient.
  • Clippy's module_name_repetitions lint is allowed at the workspace level to support the internal-naming convention without fighting clippy on every type.

Consequences

  • Positive: all caliban crates can be reserved on crates.io ahead of public release. cargo install caliban works once published. Internal type names stay terse.
  • Negative: ~9 extra characters of typing per crate reference in Cargo.toml dependency lists. Slight redundancy in long import paths (caliban_core::caliban_core_specific::... — avoided in practice by short module names).
  • Revisit if: the workspace gains so many crates that the prefix becomes overhead, or if a sub-org / sub-product emerges that warrants its own prefix.

ADR 0005 · Workspace layout → crates/ for libraries, binaries at root

  • Status: accepted
  • Date: 2026-05-22

Context

The workspace is planned to grow to ~11 crates across 4 layers (foundation, integration, routing, UX surfaces). Layout patterns seen in Rust workspaces:

  • Flat (crate1/, crate2/ at root) — used by tokio, serde, axum. Simpler for small workspaces, clutters root past ~8 crates.
  • All-in-crates/ — used by ruff. Binary and libraries intermingled; clean root but binary entry points are buried.
  • Apps/libs split (crates/ for libs, apps/ for bins) — principled but less common; over-engineered for our size.
  • Binaries at root, libraries in crates/ — used by deno (with cli/), zed, helix. Entry points are top-level visible; libraries are clearly cataloged.

Decision

Adopt the last pattern: library crates under crates/caliban-<name>/, binary crates as first-class subdirectories of the workspace root (caliban/, future caliban-tui/, caliban-orchestrator/) rather than nested under a shared parent directory. Workspace members are listed explicitly in root Cargo.toml, no globs.

Consequences

  • Positive: root-level ls reveals entry points (binaries) and config files. crates/ reveals reusable libraries. Explicit member list catches typos and missing members at workspace-parse time.
  • Negative: new-crate workflow has two patterns rather than one (cargo new --lib crates/<name> for libraries, cargo new <name> at root for binaries). Documented in README.
  • Revisit if: the workspace stays small (<5 crates) and the crates/ directory feels like overhead, or grows past ~25 crates where a flat-but-grouped layout (e.g. crates/layer-1/, crates/layer-2/) becomes warranted.

ADR 0006 · Message schema → provider-neutral IR

  • Status: accepted
  • Date: 2026-05-22

Context

Layer 0 deferred the choice of message schema. Three approaches considered: (1) Anthropic-shape canonical; (2) provider-neutral IR; (3) lowest-common-denominator.

Decision

Define caliban's own Message/Content/StreamEvent types (the IR) in caliban-provider. Each adapter translates provider_native ↔ IR at its boundary. The IR is intentionally close to Anthropic's API shape because Anthropic's API is the most expressive of the supported providers; other adapters lose less information when mapping to the IR.

Consequences

  • Positive: Adding a new provider doesn't touch caliban-provider. Provider-specific API changes don't ripple. The model-router (Layer 3) operates uniformly on IR. All transport variants of a given schema family share IR conversion code.
  • Negative: One extra translation hop per request. IR design must capture the union of advanced features (thinking, prompt caching, multimodal) without becoming Anthropic-in-disguise.
  • Revisit if: A provider emerges with feature semantics that can't be cleanly expressed in the IR (e.g., a new content modality the union doesn't anticipate).

ADR 0007 · Schema/transport factoring via Transport trait

  • Status: accepted
  • Date: 2026-05-22

Context

A naïve "one crate per concrete provider endpoint" plan duplicates the Anthropic Claude schema work across caliban-provider-anthropic (direct API), an eventual Bedrock-Claude crate, and an eventual Vertex-Claude crate. Two orthogonal dimensions exist: model schema family vs. transport/endpoint.

Decision

Each schema-family crate (caliban-provider-anthropic, caliban-provider-openai, caliban-provider-google, caliban-provider-ollama) defines its own Transport trait. A schema-family-generic XxxProvider<T: Transport> owns the IR conversion. Transport variants (DirectTransport, BedrockTransport, VertexTransport, AzureTransport, AIStudioTransport) are concrete Transport impls within their schema family, gated behind cargo features when they pull heavy deps (aws-sdk-bedrockruntime, gcp_auth).
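The factoring can be illustrated with a minimal sketch. Everything here is simplified and hypothetical: the real Transport trait is presumably async and carries richer request and error types, and IrRequest, the send method, and the echoing transports are stand-ins invented for the example. The point is only that the IR conversion is written once per schema family and reused by each transport.

```rust
/// Provider-neutral request, heavily simplified for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct IrRequest {
    pub model: String,
    pub prompt: String,
}

/// Per-family transport contract (hypothetical shape): deliver an
/// already-serialized, family-native body to some endpoint.
pub trait Transport {
    fn send(&self, body: &str) -> Result<String, String>;
}

/// Schema-family-generic provider: owns IR conversion, delegates delivery.
pub struct AnthropicProvider<T: Transport> {
    transport: T,
}

impl<T: Transport> AnthropicProvider<T> {
    pub fn new(transport: T) -> Self {
        Self { transport }
    }

    /// IR -> family-native body. Written once, shared by every transport.
    /// (No JSON escaping here; a real adapter would use a serializer.)
    fn to_native(req: &IrRequest) -> String {
        format!("{{\"model\":\"{}\",\"prompt\":\"{}\"}}", req.model, req.prompt)
    }

    pub fn complete(&self, req: &IrRequest) -> Result<String, String> {
        self.transport.send(&Self::to_native(req))
    }
}

/// Stand-ins for concrete transports: they echo the body with a tag
/// instead of performing network I/O.
pub struct DirectTransport;
pub struct BedrockTransport;

impl Transport for DirectTransport {
    fn send(&self, body: &str) -> Result<String, String> {
        Ok(format!("direct:{body}"))
    }
}

impl Transport for BedrockTransport {
    fn send(&self, body: &str) -> Result<String, String> {
        Ok(format!("bedrock:{body}"))
    }
}
```

Under this shape, adding a transport means adding one more impl Transport block; AnthropicProvider itself does not change.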

Consequences

  • Positive: Claude-on-Bedrock and Claude-on-Vertex reuse the Anthropic IR-conversion code. Adding a new transport for an existing schema is a single-file change. The model-router can treat (schema_family, transport) as a tuple.
  • Negative: A Transport trait is per-family, not shared across families — caliban-provider-anthropic::Transport ≠ caliban-provider-openai::Transport. This is intentional (transport contracts are not interchangeable across schemas).
  • Revisit if: A transport pattern emerges that genuinely cross-cuts schema families (e.g., a future caliban-side mTLS proxy that wraps any provider).

ADR 0008 · Role::System messages are positional (leading-only)

  • Status: accepted
  • Date: 2026-05-22

Context

OpenAI's API treats system as a role: system messages can appear anywhere in the messages array. Anthropic's, Gemini's, and Bedrock-Claude's APIs treat the system prompt as a separate top-level field. Modeling both shapes uniformly in the IR was an open question.

Decision

The IR has three roles: User, Assistant, System. System messages must appear contiguously at the start of CompletionRequest.messages. Validation rejects out-of-order System messages and System messages containing non-Text content blocks. Adapters with a separate-field system model (Anthropic, Gemini) collect the leading System messages and serialize them into the dedicated field; adapters with a system-role model (OpenAI, Ollama) pass them through as-is.
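A minimal sketch of the validation rule and the separate-field split follows. The type and function names (Message, Content, ValidationError, validate, split_system) are illustrative assumptions, not caliban's actual definitions, and cache_control handling is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    System,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Content {
    Text(String),
    Image(Vec<u8>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: Role,
    pub content: Vec<Content>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    SystemNotLeading { index: usize },
    NonTextSystemContent { index: usize },
}

/// System messages must form a contiguous prefix and contain only text.
pub fn validate(messages: &[Message]) -> Result<(), ValidationError> {
    let mut in_leading_run = true;
    for (index, msg) in messages.iter().enumerate() {
        if msg.role == Role::System {
            if !in_leading_run {
                return Err(ValidationError::SystemNotLeading { index });
            }
            if msg.content.iter().any(|c| !matches!(c, Content::Text(_))) {
                return Err(ValidationError::NonTextSystemContent { index });
            }
        } else {
            in_leading_run = false;
        }
    }
    Ok(())
}

/// What a separate-field adapter might do: peel off the leading system
/// text and return the remaining conversation untouched.
pub fn split_system(messages: &[Message]) -> (Vec<String>, &[Message]) {
    let n = messages.iter().take_while(|m| m.role == Role::System).count();
    let system = messages[..n]
        .iter()
        .flat_map(|m| m.content.iter())
        .filter_map(|c| match c {
            Content::Text(t) => Some(t.clone()),
            Content::Image(_) => None,
        })
        .collect();
    (system, &messages[n..])
}
```

A system-role adapter would skip split_system and pass the validated list through unchanged.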

Consequences

  • Positive: Single canonical representation. Maps cleanly to all four families. Per-System-message cache_control (Anthropic feature) is preserved by serializing the system field as a block array when any block has a cache marker.
  • Negative: Disallows the rare pattern of mid-conversation system injection. Callers wanting that pattern must rewrite into a "User says: here's a new constraint…" style.
  • Revisit if: A provider semantically requires non-leading system messages, or a credible agent design needs mid-conversation system injection.

ADR 0009 · Agent-core design (stream-as-primitive, sequential tools, opt-in compaction)

  • Status: accepted
  • Date: 2026-05-23

Context

Layer 1 / C adds the agent loop. Three design dimensions had real trade-offs: where the streaming surface lives, whether tool calls in one response are dispatched concurrently or sequentially, and what the default compaction strategy is.

Decision

  • stream_until_done is the single source of truth. Non-streaming run_turn and run_until_done are thin consumers of the stream. This means the streaming code path is always exercised; bugs surface through unit + integration tests of either surface.
  • Tool calls are dispatched sequentially within a single turn. Anthropic and Gemini can emit multiple tool_use blocks in one response; we run them in the order received. Parallelism is a follow-on (Hooks-pluggable dispatch strategy).
  • Default compactor is NoopCompactor. Compaction strategies (DropOldest, Summarizing) are explicit opt-ins. The library doesn't silently mutate the user's message history; callers decide.
  • Retries only on the provider call. Tool failures don't retry — tools manage their own retry semantics. Retryable provider errors: RateLimit, Network, ServerError 502-599. NOT retryable: Auth, InvalidRequest, ContextTooLong, ContentFilter, Cancelled, Adapter, ModelUnavailable, ServerError 500.
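The retry classifier described in the last bullet could be sketched as below. The variant names follow the list above, but the enum shape, the is_retryable and call_with_retry names, and the max_attempts parameter are assumptions for illustration. The ADR does not say how status 501 is classified; this sketch treats it as non-retryable. Backoff between attempts is omitted.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProviderError {
    RateLimit,
    Network,
    ServerError(u16),
    Auth,
    InvalidRequest,
    ContextTooLong,
    ContentFilter,
    Cancelled,
    Adapter,
    ModelUnavailable,
}

/// Conservative classifier: only transient failures are retried.
pub fn is_retryable(err: &ProviderError) -> bool {
    match err {
        ProviderError::RateLimit | ProviderError::Network => true,
        // 502..=599 retry; 500 does not. 501 is unspecified in the ADR and
        // falls outside the range here (an assumption of this sketch).
        ProviderError::ServerError(status) => (502..=599).contains(status),
        ProviderError::Auth
        | ProviderError::InvalidRequest
        | ProviderError::ContextTooLong
        | ProviderError::ContentFilter
        | ProviderError::Cancelled
        | ProviderError::Adapter
        | ProviderError::ModelUnavailable => false,
    }
}

/// Retry wrapper for the provider call only; tool invocations never pass
/// through this function.
pub fn call_with_retry<T>(
    max_attempts: u32,
    mut call: impl FnMut() -> Result<T, ProviderError>,
) -> Result<T, ProviderError> {
    let mut attempt = 1;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(e) if is_retryable(&e) && attempt < max_attempts => attempt += 1,
            Err(e) => return Err(e),
        }
    }
}
```

Listing every non-retryable variant explicitly, rather than using a wildcard arm, means a newly added error variant forces a deliberate classification at compile time.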

Consequences

  • Positive: Single source of truth → simpler correctness story. Sequential tool dispatch → predictable behavior, easier debugging. Opt-in compaction → no surprise history mutation. Retry policy classifier is conservative and stable.
  • Negative: Sequential dispatch is slower than parallel for independent tools. Token-counting heuristic (chars/4) is approximate.
  • Revisit if: Real workloads show sequential dispatch as a bottleneck (add parallel strategy); a non-English language is consistently mis-estimated (integrate a tokenizer crate).
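For reference, the chars/4 heuristic amounts to something like the following sketch. The function name is hypothetical, and two details are assumptions: that "chars" means Unicode scalar values rather than bytes, and that the result rounds up.

```rust
/// Approximate token count: one token per four characters, rounded up.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}
```

The non-English concern is visible here: an eight-character Japanese string estimates to two tokens, while real tokenizers often produce considerably more for such text.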

ADR 0010 · WorkspaceRoot path resolution + opt-in restricted mode

  • Status: accepted
  • Date: 2026-05-23

Context

caliban's built-in tools (Read/Write/Edit/Bash/Glob/Grep) accept paths from model-generated tool calls. Two extremes for path handling are both wrong: (a) reject all absolute paths — breaks legitimate use cases like reading /etc/hostname for diagnostics; (b) accept any path unconditionally — lets a model accidentally read or overwrite arbitrary files.

Decision

Tools share a WorkspaceRoot type that resolves relative paths against a canonical root directory. Two modes:

  • Permissive (default): Relative paths resolve under the root. Absolute paths are accepted as-is.
  • Restricted (opt-in via .restricted()): Resolved paths must start with the canonical root after canonicalization. Path traversal via .. is normalized away before the prefix check, so escape attempts (../escape) are rejected with ToolError::InvalidInput.
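The two modes can be sketched as follows. This is not the actual WorkspaceRoot: it normalizes paths lexically instead of calling the filesystem canonicalizer described above, so it does not resolve symlinks, and it assumes the root passed to new is already absolute and normalized. The ToolError shape is likewise a simplification.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum ToolError {
    InvalidInput(String),
}

pub struct WorkspaceRoot {
    root: PathBuf,
    restricted: bool,
}

impl WorkspaceRoot {
    /// Permissive by default.
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into(), restricted: false }
    }

    /// Opt in to the prefix check.
    pub fn restricted(mut self) -> Self {
        self.restricted = true;
        self
    }

    pub fn resolve(&self, input: &str) -> Result<PathBuf, ToolError> {
        let candidate = Path::new(input);
        let joined = if candidate.is_absolute() {
            candidate.to_path_buf()
        } else {
            self.root.join(candidate)
        };
        // Normalize first, then check the prefix, so "../escape" cannot
        // slip past by still containing the root as a textual prefix.
        let normalized = normalize(&joined);
        if self.restricted && !normalized.starts_with(&self.root) {
            return Err(ToolError::InvalidInput(format!(
                "{input} resolves outside the workspace root"
            )));
        }
        Ok(normalized)
    }
}

/// Lexical normalization: drop "." and fold ".." into the parent.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Path::starts_with compares whole components, so a sibling such as /work/proj2 is not mistaken for a child of /work/proj.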

The CLI surface (Layer 4) chooses the mode; the default is permissive because caliban runs with the operator's permissions in their own environment. Restricted mode is intended for sandboxed-agent scenarios (future: agent-as-service, untrusted-task delegation).

Consequences

  • Positive: Single shared resolver across all six tools; no per-tool path-handling logic. Restricted mode provides a meaningful safety boundary when needed. .. traversal attacks are defeated by canonicalize-then-prefix-check.
  • Negative: Permissive default means the model can read/write anywhere the harness process can. Acceptable for the personal-use context; documented as such.
  • Revisit if: caliban gains a "delegated agent" mode where one caliban instance runs sub-tasks on behalf of another, requiring per-task sandboxing.

ADR 0011 · Sessions persisted to disk + interactive REPL

  • Status: accepted
  • Date: 2026-05-23

Context

caliban's MVP was single-shot — every invocation started a fresh conversation with no memory of previous runs. For real daily use, two things matter: (a) being able to resume a conversation across invocations, and (b) having an interactive prompt for iterative work without re-invoking the binary each turn.

Decision

Sessions: a PersistedSession (name, provider, model, messages, total_usage, timestamps) saved as pretty-printed JSON under $XDG_DATA_HOME/caliban/sessions/<name>.json (default ~/.local/share/caliban/sessions/). Names validated against [a-zA-Z0-9_-]+ with length 1..=64 to prevent path traversal and platform-incompatible names. Atomic writes via tempfile::NamedTempFile::persist so crashes mid-save can't corrupt the file.
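The name rule is small enough to show in full. The sketch below is an assumption about how the check could be written without a regex dependency; the function names are hypothetical, and the data-home directory is passed in rather than read from $XDG_DATA_HOME.

```rust
use std::path::{Path, PathBuf};

/// Matches [a-zA-Z0-9_-]+ with length 1..=64.
pub fn is_valid_session_name(name: &str) -> bool {
    (1..=64).contains(&name.len())
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Build <data_home>/caliban/sessions/<name>.json, refusing invalid names
/// so that separators and ".." never reach the filesystem.
pub fn session_path(data_home: &Path, name: &str) -> Option<PathBuf> {
    if !is_valid_session_name(name) {
        return None;
    }
    Some(
        data_home
            .join("caliban")
            .join("sessions")
            .join(format!("{name}.json")),
    )
}
```

Because the allowed alphabet excludes both "/" and ".", a traversal attempt such as ../etc fails validation before any path is built.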

REPL: caliban with no prompt + TTY stdin enters an interactive loop using rustyline for line editing + history persistence at ~/.local/share/caliban/repl_history.txt. Slash commands (/help, /exit, /quit, /clear, /sessions, /save, /usage) provide session-management without exiting. When entered with --session, the REPL auto-saves after every turn.

JSON over SQLite: chosen for transparency. Users can cat, edit, and diff session files; debugging is easy; no migrations. Tradeoff: O(n) list and slower large-history loads, but until sessions exceed thousands of turns this is irrelevant.

Consequences

  • Positive: zero-friction resume of any past conversation. Sessions are inspectable / editable / git-trackable if a user wants. REPL gives an interactive UX without committing to a TUI.
  • Negative: rustyline adds non-trivial dependencies. Concurrent writes to the same session (two caliban processes) → last-write-wins (documented; out of scope for a single-user MVP).
  • Revisit if: session files grow large enough that JSON parse time is noticeable, or users want simultaneous multi-process access. Migration to SQLite would be straightforward — the SessionStore API is the abstraction boundary.

ADR 0012 · TUI via ratatui (replacing the rustyline REPL)

  • Status: accepted
  • Date: 2026-05-23

Context

caliban's first interactive mode was a rustyline-based REPL: a line editor with history and slash commands. It worked, but felt like a shell rather than a proper agent UI. The user asked for a Claude Code-like experience: dedicated input area, persistent status bar showing context (cwd, model, session), scrolling conversation transcript above.

Decision

Replace the rustyline REPL with a ratatui + crossterm-based TUI. Three-region vertical layout:

  1. Output region — flex-grow; renders the conversation transcript via Paragraph with wrap. Auto-scrolls to the bottom; PageUp/Down for history.
  2. Status bar — fixed 1 line; shows cwd · provider model · session (turns) · running….
  3. Input area — fixed 2 lines (border + line); plain text input with cursor + line editing + arrow-key history.

The event loop multiplexes terminal events (crossterm EventStream) and agent stream events via tokio::select!. std::future::pending() keeps the agent arm dormant when no turn is running.

Raw mode + alternate screen entered via a TerminalGuard RAII type that restores terminal state on Drop (including panic-recovery).

Consequences

  • Positive: Looks and feels like a modern agent CLI. Status bar gives immediate context (which session, which model, which dir). Streaming output renders in real-time above the prompt without interfering with input. ratatui handles terminal resize automatically.
  • Negative: Significantly more code (~400 lines vs. rustyline's ~250). ratatui + crossterm add non-trivial deps. Markdown rendering, mouse support, and customizable themes are deferred. Non-TTY invocation without a prompt is now an error (use --prompt or pipe via -).
  • Revisit if: users want mouse interaction, syntax-highlighted code blocks in responses, or split-pane layouts (e.g., a side panel showing recent tool calls). Each would be a focused follow-on.

ADR 0013 · TUI overlays + layout v2 (input bracketed by horizontal rules)

  • Status: accepted
  • Date: 2026-05-23

Context

The first TUI iteration shipped a working three-region layout (output | status | input) and slash commands that wrote to the transcript. As the slash-command list grew (help, config, mcp, skills) the transcript became a cluttered place to render reference information.

Decision

  1. Layout v2 reorders the regions so the input area sits between the output region and the status bar, bracketed by single-row horizontal rules. This puts the active input visually closer to the bottom (where the user's hands rest) and matches the Claude Code layout the user requested.

  2. Overlays are modal popups rendered centered (80% × 80%) over the main view via ratatui's Clear + bordered Block + Paragraph widgets. ViewState::Overlay(Overlay) on App tracks which overlay is active; Esc or q resets to ViewState::Main. Main-view key handling is suppressed while an overlay is open (the overlay is read-only in v1).

  3. Four sub-menus: /help (slash command + key reference), /config (active configuration from app.args/app.session), /mcp (stub pointing at future caliban-mcp-client), /skills (stub pointing at future caliban-skills).

Consequences

  • Positive: Reference views don't pollute the transcript. The layout is closer to Claude Code's. The /config view is genuinely useful for verifying caliban's state at a glance. The /mcp and /skills stubs document the future direction in the UI itself.
  • Negative: Two more enum variants per addition; overlay content is static for now and must be hand-edited when slash commands evolve. Editing config from the UI is deferred.
  • Revisit if: A keyboard-driven command palette (Ctrl+P-style) is desired; if the slash-command list grows beyond ~12 entries and needs categorization; if /config gains edit capability (toggling bools, changing model mid-session) requiring stateful focus tracking.

ADR 0014 · Default system prompt + TUI stall fixes + debug logging

  • Status: accepted
  • Date: 2026-05-23

Context

Real-use testing revealed two issues with the daily-usable caliban:

  1. No default system prompt — models had no context that they were running in caliban, what tools were available, or which directory they were operating in. Behavior was generic-assistant rather than harness-aware.

  2. Occasional streaming stalls — the TUI's event-loop draws once per tokio::select! iteration. Sometimes the loop appeared to hang: the transcript wouldn't update until the user pressed a key, at which point it would advance by one line. Input wouldn't echo during the stall.

Decision

System prompt

A caliban-cli/src/system_prompt.rs module builds a default prompt auto-derived from current state (caliban identity, cwd, registered tool names + descriptions, basic operating conventions). The module also resolves which prompt source applies:

  • --system "<text>" — literal override
  • --system-file <PATH> — file content
  • --no-system — no system prompt
  • (none) — default

All four are mutually exclusive via clap. The first three produce Option<String>; the default returns Some(text).
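As a sketch of that resolution, assume clap has already reduced the flags to one of four cases. The enum, the function names, and the default prompt wording are placeholders invented for the example, not the text caliban actually sends.

```rust
use std::{fs, io, path::PathBuf};

/// One case per row in the list above.
pub enum SystemPromptSource {
    Literal(String), // --system "<text>"
    File(PathBuf),   // --system-file <PATH>
    Disabled,        // --no-system
    BuiltIn,         // no flag given
}

/// Returns None only when the operator explicitly disabled the prompt.
pub fn resolve_system_prompt(
    source: SystemPromptSource,
    cwd: &str,
    tools: &[&str],
) -> io::Result<Option<String>> {
    match source {
        SystemPromptSource::Literal(text) => Ok(Some(text)),
        SystemPromptSource::File(path) => fs::read_to_string(path).map(Some),
        SystemPromptSource::Disabled => Ok(None),
        SystemPromptSource::BuiltIn => Ok(Some(default_prompt(cwd, tools))),
    }
}

/// Placeholder default prompt derived from current state.
fn default_prompt(cwd: &str, tools: &[&str]) -> String {
    format!(
        "You are running inside caliban.\nWorking directory: {cwd}\nAvailable tools: {}",
        tools.join(", ")
    )
}
```

Modeling the choice as a single enum makes the mutual exclusivity structural: there is no state in which two sources are set at once.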

Persistence rule: the system prompt is inserted as messages[0] (Role::System) when a session is FIRST created. Loading an existing session does NOT replace the prompt — the persisted system prompt is the contract for that session. Switching models mid-session can produce a mismatch (e.g., Claude-flavored prompt sent to a GPT model); this is documented and considered acceptable. Users can edit the session JSON or start a new session to refresh.

For ephemeral runs (no --session), the system prompt is prepended to the message list at turn-construction time.

TUI streaming stall fix

Three belt-and-suspenders changes in the TUI event loop:

  1. Tick interval at 50ms (20 Hz) added to the tokio::select!. Even with no terminal or agent events, the loop iterates and redraws. This masks any missed-wakeup symptoms from either stream source.

  2. Explicit stdout().flush() after each terminal.draw(). Ratatui's backend should flush internally; this catches any platform-specific line-buffering edge cases.

  3. tokio::task::yield_now() between iterations. Ensures runtime fairness so neither the EventStream task nor the HTTP-streaming task can starve the loop.

If the underlying cause is something deeper (e.g., a missing waker in async_stream::try_stream!), these fixes mask the symptom rather than addressing the root cause. The debug log (below) will help identify whether stalls recur.

Debug logging

--debug flag or CALIBAN_DEBUG=1 env var enables a tracing-subscriber file appender writing to <cache_dir>/caliban/debug.log. Logs each terminal event, agent stream event, draw, and error. No overhead when disabled (the subscriber is not installed).

Consequences

  • Positive: Models now know their context. Stalls (if not eliminated) are masked by tick-based redraws, and diagnostic data is available for future investigation. System prompt is configurable per-invocation and inspectable via /system overlay.
  • Negative: 20 Hz tick = continuous redraws even when nothing changes. Ratatui's diffing keeps wire cost at zero, but CPU spends ~50ms-of-work-per-second on the diff. Acceptable for interactive UX. System prompt grows with tool count; will need summarization at MCP/skills scale (future).
  • Revisit if: Stalls recur with the tick in place — that indicates a deeper bug in the event-stream or agent-stream that we need to dig into using the debug logs. Or if profiling shows the 50ms tick is expensive (drop to 100ms or 200ms).

ADR 0015 · Context preservation + path conventions (~/dev fix)

  • Status: accepted
  • Date: 2026-05-23

Context

Real-use testing surfaced four issues bundled into one fix:

  1. The TUI's ephemeral REPL (no --session) silently dropped every turn's final_messages, so each new prompt only saw the system prompt + the latest user message. Models had no memory of prior turns in the same REPL session.
  2. WorkspaceRoot::resolve didn't expand ~. When models invoked Bash with cwd: "~/dev" or Read({"path":"~/notes.md"}) the path resolution failed with "No such file or directory." The model misinterpreted the error as "directory doesn't exist."
  3. The TUI's tool-call input summary truncated the partial-JSON stream at 80 chars, sometimes hiding closing braces and making patterns look different than they were.
  4. The default system prompt didn't tell the model that ~ is supported in tool paths.

Decision

  1. Add messages: Vec<Message> to the TUI's App. Initialize from session if any, else empty. Update from RunEnd's final_messages each turn. /clear wipes both the in-memory history and the session's persisted messages.
  2. WorkspaceRoot::resolve expands a leading ~ or ~/ to dirs::home_dir(). Affects all path arguments to all tools. The Bash command string is unchanged — the shell handles ~ expansion there.
  3. At ToolCallEnd, parse the accumulated input as JSON and render key="value", key=value pairs. Fall back to raw truncation on parse failure.
  4. Add a path-conventions bullet to the default system prompt.
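Item 2 can be sketched as a small pure function. The name expand_tilde is hypothetical, and the home directory is passed in as a parameter (the ADR uses dirs::home_dir()) so the example needs no external crate. Leaving ~user forms untouched is an assumption of this sketch.

```rust
use std::path::{Path, PathBuf};

/// Expand a leading "~" or "~/" to the given home directory.
/// Anything else, including "~user/..." and a "~" in the middle of a
/// path, is returned unchanged.
pub fn expand_tilde(input: &str, home: &Path) -> PathBuf {
    if input == "~" {
        return home.to_path_buf();
    }
    match input.strip_prefix("~/") {
        Some(rest) => home.join(rest),
        None => PathBuf::from(input),
    }
}
```

Running this before the workspace-root join keeps the expansion in one place for every tool's path arguments.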

Consequences

  • Positive: Ephemeral REPL now feels like a real conversation rather than a series of disconnected one-shots. ~/foo paths work transparently. Tool-call summaries are readable. The system prompt's conventions are accurate.
  • Negative: App::messages and session.messages are now two copies in --session mode (kept in sync at RunEnd). /clear is destructive to session-stored messages — documented.
  • Revisit if: The double-keeping causes correctness bugs (e.g., divergence after a mid-flight panic). The cleanest long-term refactor would be to make App hold an Arc<RwLock<Session>> and treat session as the single source of truth, with the ephemeral case using a synthetic in-memory session.

ADR 0016 · Parallel tool dispatch (supersedes ADR 0009 §"sequential tools")

  • Status: accepted
  • Date: 2026-05-23
  • Supersedes: ADR 0009 (in part — sequential tool dispatch only)

Context

ADR 0009 chose sequential tool dispatch within a single assistant turn as a v1 simplification: "Parallelism is a follow-on (Hooks-pluggable dispatch strategy)." Real workloads bore out the cost. Models routinely emit 2–6 tool_use blocks per turn (parallel Greps + Reads while exploring a codebase, repeated WebFetches to compare sources), and the serial loop paid the sum of their wall-clock latencies rather than the max. The follow-on landed on jf/feat/parallel-tools in commits b624110, 4751746, and b5fba58.

This ADR records the resulting architectural commitment.

Decision

  • Parallel tool dispatch is default-on. AgentBuilder initializes parallel_tools: true. Operator opt-out via --no-parallel-tools / CALIBAN_NO_PARALLEL_TOOLS=1 falls through the same code path with permits = 1, preserving serial semantics without a separate branch.
  • Bounded concurrency via an Arc<tokio::sync::Semaphore>. The default cap is available_parallelism().get().saturating_sub(1).max(1) — leave one core for the agent loop, streaming, and the TUI render thread. Tools are mostly I/O-bound, so this is a soft ceiling against runaway fan-out rather than a hard CPU bound. Operator override: --parallel-tool-limit N / CALIBAN_PARALLEL_TOOL_LIMIT=N.
  • before_tool hooks run serially. The hook is the synchronization point for permissions, auditing, and Deny short-circuiting. The serial gate produces a Vec<DispatchPlan> of Allowed / Denied entries; only Allowed entries fan out to a FuturesUnordered. Denied results are yielded first, in assistant-message order, so the TUI sees deny notices before any in-flight tool resolves.
  • Tool::invoke() runs concurrently for Allowed plans. Results arrive in completion order on the event stream (best TUI liveness) and are then reordered back into assistant-message order when appended to the persisted tool_result_blocks so history and replay remain deterministic.
  • Cancellation propagates through the shared tokio_util::sync::CancellationToken. A cancel at any point aborts all in-flight tools; partial results are dropped.
  • Per-tool is_parallel_safe() flag is deferred. All current built-ins are independent: Bash spawns fresh subprocesses; Read / Grep / Glob are pure-read; Edit / Write touch files but the model rarely emits overlapping writes on the same path. YAGNI — add the flag if write contention is observed in practice (e.g. two Edit calls on the same file in one turn).
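Two pieces of the decision above are easy to show without an async runtime: how the permit count is derived, and how completion-order results are put back into assistant-message order. The function and type names are hypothetical. Note that std's available_parallelism returns a Result; falling back to 1 on error, and clamping an explicit limit of 0 up to 1, are assumptions of this sketch.

```rust
use std::thread::available_parallelism;

/// Default cap: all cores but one, never less than one.
pub fn default_tool_limit() -> usize {
    available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
        .saturating_sub(1)
        .max(1)
}

/// Opt-out reuses the same code path with a single permit.
pub fn effective_permits(parallel_tools: bool, limit: Option<usize>) -> usize {
    if !parallel_tools {
        return 1;
    }
    limit.unwrap_or_else(default_tool_limit).max(1)
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolResult {
    /// Position of the tool_use block in the assistant message.
    pub call_index: usize,
    pub output: String,
}

/// Results stream out in completion order; before they are persisted they
/// are sorted back into the order the model emitted the calls.
pub fn into_message_order(mut completed: Vec<ToolResult>) -> Vec<ToolResult> {
    completed.sort_by_key(|r| r.call_index);
    completed
}
```

In the real dispatcher the permit count would size the tokio Semaphore, and the sort would run once per turn after the last in-flight tool resolves.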

Rationale

The semaphore-bounded FuturesUnordered pattern keeps the agent loop single-threaded while extracting most of the available parallelism from the model's batching. The serial before_tool gate keeps the existing hook contract intact — permission systems don't have to reason about race conditions across concurrent tool calls. Streaming ToolCallEnd events in completion order means the TUI shows whichever tool finishes first immediately, instead of waiting for the slowest one in batch order.

Consequences

  • Positive. Multi-tool turns clear in roughly max(t_i) rather than sum(t_i). parallel_tools=false still works as an opt-out for users who want strict deterministic ordering in the event stream (e.g. for snapshot testing).
  • Negative. Tracing output interleaves across tools within a turn; log readers need to follow tool_use_id to reconstruct per-tool sequences. The new caliban::tools tracing event surfaces dispatched/denied counts and total wall time per turn so the perf-baseline numbers stay legible.
  • ADR 0009's "sequential tools" guidance is superseded. The rest of ADR 0009 — stream-as-primitive, opt-in compaction, conservative retry classifier — remains in force.
  • Sub-agent primitive (forward link to 0021-sub-agent-primitive.md when written) inherits this dispatch model: each sub-agent runs its own bounded parallel loop, and the parent agent's semaphore is independent of the child's.
  • Revisit if: write contention surfaces in real use (add is_parallel_safe() and a per-tool exclusion policy), or if profiling shows the semaphore itself is a contention point at high concurrency (unlikely; tokio's Semaphore is fair and cheap).

References

  • Design spec: docs/superpowers/specs/2026-05-23-parallel-tools-design.md
  • Commits: b624110 (design), 6b71a6c (plan), 4751746 (builder fields), b5fba58 (FuturesUnordered + Semaphore refactor)
  • Implementation: crates/caliban-agent-core/src/agent.rs (parallel_tools / parallel_tool_limit fields), crates/caliban-agent-core/src/stream/parallel.rs (three-phase dispatch)

Revised 2026-05-26

The original Decision deferred a per-tool is_parallel_safe() flag, noting that no built-in had write contention. That observation was true in 2024 (Bash / Read / Grep / Glob). It is no longer true: ADRs 0028 + 0035 introduced Edit / Write / MultiEdit / NotebookEdit / WriteMemoryTopic, all of which can collide on the same target within one turn.

Revised mechanism: parallel_conflict_key(&self, input) -> Option<String> on the Tool trait. Returns None for fully parallel-safe tools (the default; matches the original 2024 posture). Returns a conflict-identity string for tools whose effect is keyed to a target — typically the canonicalized path for filesystem writes; for WriteMemoryTopic, a memory:{type}:{name} string. The dispatcher builds a per-key tokio::sync::Mutex map and each tool's dispatch future awaits its key's mutex (FIFO) before acquiring the parallel_tool_limit semaphore. Same-key calls serialize in submission order; different-key calls and None-key calls parallelize.
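The scheduling consequence of the conflict key can be sketched without any async machinery. In the sketch below, tool inputs are plain path strings rather than JSON, stripping a leading ./ stands in for real path canonicalization, and plan_lanes is a hypothetical helper that groups call indices into lanes: calls within a lane run in submission order, and separate lanes may run concurrently. The actual dispatcher uses per-key mutexes rather than precomputed lanes.

```rust
use std::collections::HashMap;

pub trait Tool {
    fn name(&self) -> &str;

    /// None means fully parallel-safe, which is the default.
    fn parallel_conflict_key(&self, _input: &str) -> Option<String> {
        None
    }
}

pub struct Read;
pub struct Edit;

impl Tool for Read {
    fn name(&self) -> &str {
        "Read"
    }
}

impl Tool for Edit {
    fn name(&self) -> &str {
        "Edit"
    }

    fn parallel_conflict_key(&self, input: &str) -> Option<String> {
        // Stand-in for canonicalization: "./src/a.rs" and "src/a.rs"
        // must map to the same key.
        Some(input.trim_start_matches("./").to_string())
    }
}

/// Group call indices into lanes. Same-key calls share a lane (serialized
/// in submission order); keyless calls each get a lane of their own.
pub fn plan_lanes(calls: &[(&dyn Tool, &str)]) -> Vec<Vec<usize>> {
    let mut lanes: Vec<Vec<usize>> = Vec::new();
    let mut lane_of_key: HashMap<String, usize> = HashMap::new();
    for (index, (tool, input)) in calls.iter().enumerate() {
        match tool.parallel_conflict_key(input) {
            Some(key) => {
                let lane = *lane_of_key.entry(key).or_insert_with(|| {
                    lanes.push(Vec::new());
                    lanes.len() - 1
                });
                lanes[lane].push(index);
            }
            None => lanes.push(vec![index]),
        }
    }
    lanes
}
```

Given Edit(src/a.rs), Read(src/a.rs), Edit(./src/a.rs), Edit(src/b.rs), the two edits to a.rs land in one lane while the read and the edit to b.rs remain independent.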

What this preserves. Read / Grep / Glob / Bash continue to behave exactly as before (default None). Two Edits on different files still parallelize. The parallel-tools differentiator from Claude Code is intact.

What this fixes. Two Edits on the same file (whether via the same path string, a ./-prefixed variant, or a symlink that canonicalizes to the same inode) now serialize in submission order rather than interleaving non-deterministically.

Per-tool overrides shipped: Edit, Write, MultiEdit, NotebookEdit all key on the canonicalized path (crates/caliban-tools-builtin/src/parallel.rs::canonical_key). WriteMemoryTopic keys on memory:{type}:{name}.

Tests: crates/caliban-agent-core/tests/parallel_conflict_key.rs covers distinct-key parallelism, same-key serialization, keyed + plain mixing, and shared-key + independent triples.

ADR 0017 · MCP client architecture

  • Status: accepted
  • Date: 2026-05-23

Context

caliban's /mcp overlay is a stub (see ADR 0013). Adding real MCP (Model Context Protocol) client support is priority #1 on the post-WebFetch roadmap: it unlocks the long tail of integrations (Linear, Notion, Slack, in-house servers) without needing a built-in tool per service. The full implementation spec lives at docs/superpowers/specs/2026-05-23-mcp-client-design.md; this ADR records the architectural commitments only.

Decision

Transport: stdio in v1; SSE + StreamableHTTP deferred

v1 ships stdio transport only. Each configured server is launched as a child process; JSON-RPC frames travel over its stdin/stdout. SSE and StreamableHTTP transports are non-trivial separate deps (reqwest-eventsource, hyper streaming) and gate on real-world demand. They land in v2.

SDK: rmcp (official Rust SDK)

We adopt the rmcp crate (the official Rust MCP SDK published by the Model Context Protocol org) over the community mcp-client crate. Rationale: official maintenance, broader trait coverage (Client + Server + transports), and a working transport::child_process module we'd otherwise reimplement. Pinned to rmcp = "0.x" (latest released line at adoption time); workspace-pinned to keep upgrades atomic across our crates.

Auth: env-var only in v1; OAuth deferred

v1 supports passing secrets to MCP servers via the env table in the server-config TOML (with optional ${VAR} expansion from the operator's environment). Per-server OAuth — the protocol's full authentication story for hosted MCP servers — is deferred to v2 and will land alongside SSE/HTTP transports, where it's actually relevant. Stdio servers overwhelmingly authenticate via env vars today.

Tools surface as Box<dyn Tool> in the existing registry

MCP-discovered tools wrap in an McpTool struct that implements the caliban_agent_core::Tool trait and registers in the same ToolRegistry as built-ins. Naming convention: mcp__<server>__<tool> (double underscores) — mirrors Claude Code so operators recognize the surface. <server> is the config-file table name; <tool> is the server-advertised tool name; both are ASCII-snake-case-normalized at registration so names match what the provider's tool-use API accepts.
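A sketch of the naming convention follows. The ADR does not spell out the normalization rules, so the ones used here are assumptions: lowercase ASCII letters and digits are kept, every other run of characters becomes a single underscore, and leading or trailing underscores are trimmed. Collapsing runs keeps the double-underscore separator unambiguous. Both function names are hypothetical.

```rust
/// Normalize one name segment to lowercase ASCII snake case.
pub fn normalize_segment(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        if ch.is_ascii_alphanumeric() {
            out.push(ch.to_ascii_lowercase());
        } else if !out.ends_with('_') {
            // Collapse any run of other characters into one underscore.
            out.push('_');
        }
    }
    out.trim_matches('_').to_string()
}

/// Build the registry name: mcp__<server>__<tool>.
pub fn mcp_tool_name(server: &str, tool: &str) -> String {
    format!(
        "mcp__{}__{}",
        normalize_segment(server),
        normalize_segment(tool)
    )
}
```

This sketch does not split camelCase names; a tool advertised as createIssue would become createissue under these rules.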

Tool::input_schema() returns the schema the server advertised, with no rewriting. Tool::invoke() proxies via rmcp and translates the response into caliban ContentBlocks. Hooks (before_tool / after_tool) fire for MCP tools exactly as they do for built-ins — no special case — which means existing permission UX, audit logging, and deny-rules cover MCP automatically.

Server config file

Two TOML files, merged at startup with project overriding user:

  • ~/.config/caliban/mcp.toml (per-user; XDG-aware on Linux, cache_dir on macOS)
  • .caliban/mcp.toml (per-project, relative to cwd; optional)

Schema is fully specified in the design doc. Project-level config can disable a user-level server by setting disabled = true for the same name. Config-file location and merge semantics will be revisited when the broader .caliban/ config story lands (separate spec); the MCP spec is the prior art that pattern will follow.
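The merge semantics can be sketched with plain maps. The ServerConfig fields here are hypothetical (the real schema lives in the design doc), and whole-entry replacement is an assumption: the actual merge may combine fields within an entry rather than replacing it.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal server entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServerConfig {
    pub command: String,
    pub disabled: bool,
}

/// Merge user-level and project-level tables; the project entry wins
/// when both define a server with the same name.
pub fn merge_servers(
    user: BTreeMap<String, ServerConfig>,
    project: BTreeMap<String, ServerConfig>,
) -> BTreeMap<String, ServerConfig> {
    let mut merged = user;
    merged.extend(project);
    merged
}

/// Names of servers that should be launched at startup.
pub fn active_servers(merged: &BTreeMap<String, ServerConfig>) -> Vec<&str> {
    merged
        .iter()
        .filter(|(_, cfg)| !cfg.disabled)
        .map(|(name, _)| name.as_str())
        .collect()
}
```

A project that sets disabled = true for a user-level server name thereby removes it from the active set without deleting the user's entry.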

Discovery: best-effort at startup

At caliban startup, for each non-disabled server entry: spawn the child process, send initialize, list tools, and register an McpTool per advertised tool. A failure (spawn fails, handshake times out, server reports an error) logs a warning and continues — it does not abort startup. The TUI's /mcp overlay surfaces per-server status (connected / failed / disabled) so the operator can see what's missing without watching stderr.
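The best-effort loop might look like the sketch below. The connect closure stands in for the spawn, initialize, and list-tools sequence, and all type and function names are hypothetical. What matters is that a failure is recorded and logged, never propagated.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ServerStatus {
    Connected { tools: Vec<String> },
    Failed { reason: String },
    Disabled,
}

pub struct ServerEntry {
    pub name: String,
    pub disabled: bool,
}

/// Try every enabled server; one failure never aborts the others.
/// `connect` returns the advertised tool names or an error description.
pub fn discover(
    entries: &[ServerEntry],
    mut connect: impl FnMut(&str) -> Result<Vec<String>, String>,
) -> Vec<(String, ServerStatus)> {
    entries
        .iter()
        .map(|entry| {
            let status = if entry.disabled {
                ServerStatus::Disabled
            } else {
                match connect(&entry.name) {
                    Ok(tools) => ServerStatus::Connected { tools },
                    Err(reason) => {
                        // Warn and continue: startup proceeds without it.
                        eprintln!("warning: MCP server {} unavailable: {reason}", entry.name);
                        ServerStatus::Failed { reason }
                    }
                }
            };
            (entry.name.clone(), status)
        })
        .collect()
}
```

The returned list is the kind of per-server status an overlay could render directly.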

Tools are not re-discovered after startup in v1; if a server adds a tool mid-session, the user restarts caliban. Server-push notifications (notifications/tools/list_changed) are deferred.

Lifecycle: session-scoped; cleanup on exit

Servers run for the duration of the caliban session. On shutdown (clean exit, Ctrl-C, panic) the McpClientManager's Drop sends notifications/cancelled to each server and drops its tokio::process::Child (configured with kill_on_drop(true)), so no servers leak even on unclean exit.

Consequences

  • Positive: Unblocks integration with the dozens of stdio MCP servers already published. Tool surface is uniform — same trait, same registry, same hooks — so the agent and TUI need no MCP-specific code paths after registration. Stdio-first keeps the initial dep surface small.
  • Negative: SSE/HTTP servers aren't reachable in v1; operators who want a hosted MCP server have to wait for v2 or wrap it in a local stdio proxy. The mcp__<server>__<tool> name shape is long and noisy in transcripts; acceptable for parity with Claude Code. Each server is one extra child process — RAM and FD overhead is per-server, not amortized.
  • Revisit if: Real demand emerges for hosted (SSE/HTTP) servers — promote the v2 work earlier. If rmcp's release cadence lags protocol changes, evaluate mcp-client. If tool-name collisions become common (two servers exposing a tool with the same short name), the mcp__<server>__ prefix already handles it, but UX may want a friendly alias mechanism.

ADR 0018 · Memory tier model (CLAUDE.md ingestion + auto-memory)

  • Status: accepted
  • Date: 2026-05-23

Context

caliban has no persistent memory across sessions. The default system prompt is rebuilt from cwd + tool list each invocation (ADR 0014), so operator preferences, project conventions, and learned facts about the user have to be re-supplied by hand every time. Claude Code solves this with a CLAUDE.md mechanism plus an auto-memory tier the agent can write to via its existing file tools. The user's own ~/.claude/CLAUDE.md already exercises this pattern; that mental model is the target.

Decision

caliban adopts a three-tier memory model, all of which live on disk as plain Markdown and are read at session start. A fourth MCP-mediated tier slots in later (forward link only; not in this ADR).

Tier 1 — Global

  • Path: ~/.config/caliban/CLAUDE.md (XDG $XDG_CONFIG_HOME honored).
  • Owner: the operator. caliban never writes here.
  • Contents: cross-project preferences (tool choice, style, persona).
  • Read once at startup, optional (missing file is fine).

Tier 2 — Project

  • Path: <workspace_root>/CLAUDE.md where workspace_root is WorkspaceRoot::root() (ADR 0010).
  • Owner: the project / repo. caliban never writes here (operators commit it like any other file).
  • Contents: repo-specific conventions, build commands, taboos.
  • Read once at startup, optional.

Tier 3 — Auto-memory

  • Directory: ~/.local/share/caliban/projects/<sanitized-cwd>/memory/ (XDG $XDG_DATA_HOME honored). Sanitization replaces / with - and drops the leading dash, so /Users/jf/dev/caliban becomes Users-jf-dev-caliban.
  • Files: one MEMORY.md (index, ≤ 200 lines) plus arbitrary <slug>.md topic pages.
  • Owner: the agent. Writes go through the existing Write/Edit tools — no special memory tool, no separate trust path.
  • Only MEMORY.md is loaded eagerly. Topic pages are lazily fetched by the agent via Read when the index points it at one.
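The Tier 3 directory-name rule is small enough to sketch directly. sanitize_cwd is a hypothetical name, and characters other than / are left untouched because the ADR does not mention them.

```rust
/// Replace '/' with '-' and drop the leading dash, as described for Tier 3.
/// Other characters pass through unchanged (the ADR does not cover them).
fn sanitize_cwd(cwd: &str) -> String {
    let dashed = cwd.replace('/', "-");
    dashed.strip_prefix('-').unwrap_or(dashed.as_str()).to_string()
}
```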

Composition

All three tiers are concatenated into the system prompt above the auto-generated default (cwd + tool list + conventions, per ADR 0014). Order: global → project → auto-memory index. Each tier is wrapped in explicit delimiters so the model can tell them apart:

<global-claude-md path="…/CLAUDE.md">…</global-claude-md>
<project-claude-md path="…/CLAUDE.md">…</project-claude-md>
<auto-memory-index path="…/MEMORY.md">…</auto-memory-index>

<default system prompt body from system_prompt::build_default …>

Missing tiers are simply omitted (no empty tag block).
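A sketch of the splice, assuming each tier arrives as an optional (path, contents) pair. compose_memory_prefix is a hypothetical name, and the exact whitespace around each block is an assumption.

```rust
/// Wrap each present tier in its delimiter tag; skip missing tiers entirely.
/// Order is fixed: global, project, auto-memory index.
fn compose_memory_prefix(
    global: Option<(&str, &str)>,
    project: Option<(&str, &str)>,
    auto_index: Option<(&str, &str)>,
) -> String {
    let tiers = [
        ("global-claude-md", global),
        ("project-claude-md", project),
        ("auto-memory-index", auto_index),
    ];
    let mut out = String::new();
    for (tag, tier) in tiers {
        // A missing tier contributes nothing, not an empty tag block.
        if let Some((path, body)) = tier {
            out.push_str(&format!("<{} path=\"{}\">{}</{}>\n", tag, path, body, tag));
        }
    }
    out
}
```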

Token budget

The combined memory prefix is capped at 8,000 tokens (estimated as chars / 4 — provider-agnostic and cheap). If the combined size exceeds the cap, auto-memory is truncated first (with a [truncated: N bytes] notice appended to its block), then project, then global. Truncation that reaches the global tier is treated as operator error (loud tracing::warn! plus the truncation marker in the prompt).
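The truncation order can be sketched as follows. This is a simplification under stated assumptions: it counts characters rather than bytes, returns the amount removed per tier instead of appending the [truncated: N bytes] notice, and both function names are hypothetical.

```rust
/// Estimate tokens as chars / 4, the provider-agnostic heuristic from the ADR.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

/// Trim tiers until the combined size fits `cap_tokens`.
/// `tiers` is ordered global, project, auto-memory; the loop walks it in
/// reverse so auto-memory is cut first. Returns the chars removed per tier.
fn enforce_budget(tiers: &mut [String; 3], cap_tokens: usize) -> [usize; 3] {
    let cap_chars = cap_tokens * 4;
    let mut removed = [0usize; 3];
    for i in (0..3).rev() {
        let total: usize = tiers.iter().map(|t| t.chars().count()).sum();
        if total <= cap_chars {
            break;
        }
        let excess = total - cap_chars;
        let len = tiers[i].chars().count();
        let keep = len.saturating_sub(excess);
        removed[i] = len - keep;
        tiers[i] = tiers[i].chars().take(keep).collect();
    }
    removed
}
```

A non-zero first element in the returned array corresponds to the operator-error case, where truncation reached the global tier.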

Retrieval

None in v1. Memory IS the system prompt prefix. Semantic search over memory (RAG) is a v2 concern and would slot in as a new tool (MemorySearch), not as a change to how memory is loaded.

  • MCP memory tier. Once MCP support ships, an MCP server like the user's SilverBullet integration plugs in as Tier 4: not eagerly loaded, accessed on demand via MCP tool calls. The precondition-check pattern from the user's own CLAUDE.md ("skip if MCP is absent") applies.
  • /memory slash command. Shows active tiers + paths + sizes; offers $EDITOR open for the global and project files. Detailed in the spec.

Consequences

  • Positive. Matches the user's existing mental model exactly, so there is no learning curve. Agent maintains its own knowledge using the same Read/Write/Edit it already has — no special memory tool to audit, sandbox, or rate-limit. MCP tier slots in cleanly without reshaping the loader.
  • Negative. 8K tokens is a real cost on every turn (Anthropic prompt caching recoups most of it). Agent can clutter auto-memory if write conventions aren't well-specified (the spec pins them down). No drift detection between project CLAUDE.md and what the agent "remembers" — by design; the project file wins by splice order, but contradictory auto-memory entries will sit alongside it in the prompt.
  • Revisit if: the 8K cap starts triggering routinely (raise it, or add summarization); auto-memory becomes a write-only graveyard (add a v2 MemorySearch tool and stop loading the full index); per-project agent memory grows past what's reasonable to grep (move to SQLite, but keep markdown export).

Crate

New crate caliban-memory owns tier discovery, sanitization, file IO, splicing, and budget enforcement. caliban-agent-core does not take a dep on it — the binary (caliban/src/main.rs) calls the memory crate at startup and passes the assembled string to system_prompt::resolve as a prefix.

Revised 2026-05-26

Bumped the combined-prefix default from 8,000 to 32,000 tokens. The 8,000-token default was conservative against 2024 context windows and was increasingly punishing in 2026 (1M-token Sonnet, 200K standard on most providers). Truncation-first behavior was at risk of dropping the auto-memory index — exactly the tier that grows.

Added per-scope token caps via three optional [memory] settings keys (all integer, default unset):

  • cap_tokens_auto — caps the auto-memory tier independently.
  • cap_tokens_claude_md — caps the combined CLAUDE.md tier (global + project). When binding, truncates project first, then global.
  • cap_tokens_combined — overrides the combined ceiling (max_tokens).

When the sum of both per-scope caps would exceed cap_tokens_combined, each is scaled down proportionally rather than silently dropping a tier. Settings.json values override the corresponding env vars (CALIBAN_MEMORY_BUDGET_TOKENS, CALIBAN_MEMORY_CAP_TOKENS_AUTO, CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD) when both are present.
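The proportional scale-down can be sketched with integer arithmetic. Flooring the result is an assumption (the ADR only says "proportionally"), and scale_caps is a hypothetical name.

```rust
/// Scale the two per-scope caps so their sum fits `combined`.
/// Returns (auto, claude_md). Rounding down is an assumed policy.
fn scale_caps(auto: u64, claude_md: u64, combined: u64) -> (u64, u64) {
    let sum = auto + claude_md;
    if sum <= combined {
        // Nothing to do when the per-scope caps already fit.
        return (auto, claude_md);
    }
    (auto * combined / sum, claude_md * combined / sum)
}
```

With caps of 30,000 and 10,000 against a 32,000 combined ceiling, both shrink by the same factor of 0.8, giving 24,000 and 8,000.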

Truncation order within a tier is unchanged from the original Decision.

ADR 0019 · Skills loading

  • Status: accepted
  • Date: 2026-05-23

Context

The /skills overlay in the TUI is currently a stub (see ADR 0013). Skills are priority #4 on the post-WebFetch roadmap: they let the operator drop in reusable instruction-and-procedure packages — Claude Code's "superpowers" model — without recompiling caliban or shipping prompts in-crate. The full implementation spec lives at docs/superpowers/specs/2026-05-23-skills-design.md; this ADR records the architectural commitments only.

Decision

Skills are file-based, frontmatter-keyed

A skill is a directory <skill-name>/SKILL.md. The file is YAML frontmatter followed by a markdown body:

---
name: brainstorming
description: "You MUST use this before any creative work ..."
metadata:
  trigger: pre-implementation
---

# Brainstorming Ideas Into Designs
...

name and description are required; metadata.* is a free-form map the loader passes through unchanged. The body is the model-facing instruction set — no execution, no scripts auto-run, no sandbox. This format matches the superpowers plugin so existing skills can be copied in unchanged.

Skills surface as a single built-in Skill tool

A built-in Skill tool with invoke({"name": "<x>"}) loads <x>/SKILL.md and returns its body as a ContentBlock::Text. Loaded skills are NOT registered individually — that would explode the tool-use schema. The Skill tool's description carries a bulleted <name>: <description> list of every loaded skill, so the model knows the menu and can call Skill with the right name.

Skills are NOT auto-loaded into the system prompt

Loading every body upfront burns thousands of tokens per turn at any nontrivial skill count. Only description lines hit the prompt (via the tool description above); bodies load on-demand.

Discovery locations (priority order)

  1. <workspace_root>/.caliban/skills/ — project-pinned skills
  2. ~/.config/caliban/skills/ — per-user skills
  3. ~/.local/share/caliban/plugins/*/skills/ — global plugin dir, mirrors how Claude Code resolves plugin skills

A skill in an earlier location shadows a later one with the same name. Paths are XDG-aware on Linux and use cache_dir/data_dir analogues on macOS, matching the MCP config conventions in ADR 0017.
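Shadowing by discovery order can be sketched as a first-writer-wins map. The input shape (a label per location plus the skill names found there) and resolve_skills itself are hypothetical; the real loader walks directories.

```rust
use std::collections::BTreeMap;

/// Resolve skills across locations given in priority order.
/// The first location to provide a name wins; later ones are shadowed.
fn resolve_skills<'a>(locations: &[(&'a str, Vec<&'a str>)]) -> BTreeMap<&'a str, &'a str> {
    let mut resolved = BTreeMap::new();
    for (root, names) in locations {
        for name in names {
            // or_insert keeps the earlier (higher-priority) location.
            resolved.entry(*name).or_insert(*root);
        }
    }
    resolved
}
```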

No skill execution sandbox

Skills are text injected into the model's context. They are not executable code. The scripts/ and references/ subdirectories that appear in some Claude Code skills are loadable only by the model through existing Read / Bash tools — caliban itself does not execute anything skill-side. This keeps the trust model identical to "the operator wrote this file."

New crate: caliban-skills

Skills logic lives in a new workspace crate crates/caliban-skills/ exporting SkillLoader, Skill, and SkillTool. It depends on caliban-agent-core (for the Tool trait), serde + serde_yaml for the frontmatter, ignore (already in the workspace) for directory walking, and thiserror. The caliban binary constructs one SkillTool at startup, registers it with ToolRegistry, and wires the loaded skills into the /skills overlay.

Consequences

  • Positive: Existing superpowers-format skills port with zero changes. Token cost stays bounded — only descriptions hit the prompt; bodies are pay-per-use. Skills are uniform with every other tool (same registry, same hooks, same audit log).
  • Negative: The Skill tool's description grows with skill count; ~50 skills crowd the schema budget (truncation policy is a spec-level concern). Frontmatter parse failures are per-file warnings — silent skill loss if the operator doesn't watch logs. No versioning: an update is a directory-replace.
  • Revisit if: Description-list growth crowds the schema — consider a two-tier surface (frequent inline, rare via a ListSkills tool). If operators want bundled defaults, add an opt-in --with-default-skills flag (currently a non-goal).

ADR 0020 · Permission rules layered on top of Hooks

  • Status: accepted
  • Date: 2026-05-23

Context

caliban currently has no permission model. The Hooks::before_tool extension point can already short-circuit a tool call with a HookDecision::Deny(msg), but nothing in the tree consults rules, prompts the operator, or enforces a default policy. As we add more "dangerous" tools (BashTool already executes arbitrary shell; WriteTool, EditTool, WebFetch, future MCP tools), we need a rule-based gate that matches the operator-facing UX of Claude Code without inheriting its classifier complexity.

Decision

Implementation site

Permissions are a layer on top of the existing Hooks trait — not a parallel system. We add a PermissionsHook that implements Hooks::before_tool and consults a rule database. Composition with other hooks (observability, debug logging) is handled by a small CompositeHooks adapter; permissions just plug in as one entry.

Rule schema

Each rule has three fields:

  • tool — pattern string (glob-style; see Pattern matching).
  • action — Allow | Deny | Ask.
  • comment — optional free-text shown in the TUI prompt.

Rule sources (priority high → low)

  1. CLI flags --allow <PAT>, --deny <PAT>, --ask <PAT> (one-shot, repeatable).
  2. Project file <workspace>/.caliban/permissions.toml.
  3. User file ~/.config/caliban/permissions.toml.
  4. Built-in defaults (read-only tools Allow; everything else Ask).

Higher-priority rules shadow lower-priority ones. Within a single source, first match wins, so users place narrow rules above the catch-all.

Pattern matching

Glob-style on tool_name plus an optional :<first-arg-prefix> suffix.

  • Bash — bare tool name; matches any input.
  • Bash:git * — bash whose command field starts with git .
  • Bash:* — equivalent to Bash (explicit wildcard).
  • * — matches every tool.

The "first arg" is tool-defined: for Bash it's the command field; for WebFetch it's url; for Read/Edit/Write it's path. Tools that don't declare a first-arg field are matched on tool name only. Prefix-after-colon uses simple glob (*, ?) on the stringified first arg, not full regex — keeps the rule format inspectable.
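A minimal sketch of the pattern grammar, assuming the same simple glob applies to both the tool name and the first-arg suffix. glob_match and rule_matches are hypothetical names, not the PermissionsHook API.

```rust
/// Minimal glob: `*` matches any run of characters, `?` matches exactly one.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // (pattern index just after the last '*', text index it was tried at)
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((bp, bt)) = backtrack {
            // Let the last '*' absorb one more character and retry.
            pi = bp;
            ti = bt + 1;
            backtrack = Some((bp, bt + 1));
        } else {
            return false;
        }
    }
    p[pi..].iter().all(|&c| c == '*')
}

/// Match a rule pattern such as `Bash`, `Bash:git *`, or `*` against a call.
/// `first_arg` is the tool-defined field (command, url, path), if any.
fn rule_matches(pattern: &str, tool: &str, first_arg: Option<&str>) -> bool {
    match pattern.split_once(':') {
        None => glob_match(pattern, tool),
        Some((tool_pat, arg_pat)) => {
            glob_match(tool_pat, tool)
                && match first_arg {
                    Some(arg) => glob_match(arg_pat, arg),
                    // No first-arg field declared: match on tool name only.
                    None => true,
                }
        }
    }
}
```

Splitting on the first colon leaves later colons, such as the one in a URL, inside the argument glob.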

Ask action

Ask requires an interactive UI. The TUI provides a modal prompt (allow once, allow permanently, deny once, deny permanently). In non-interactive sessions (no TTY, no --auto-allow), Ask degrades to Deny with a clear log message. --auto-allow is the documented "escape hatch" for non-interactive runs and is loud about being dangerous.

Consequences

  • Positive: mirrors the Claude Code rule format operators already know, without copying the classifier-heavy approach (bashClassifier / yoloClassifier). Reuses the existing Hooks contract — zero new core traits. Project + user files allow shared team policies committed to source control.
  • Negative: glob matching on first-arg-prefix can be surprising (e.g. the rule Bash:rm * does not match a command that begins with sudo rm). Acceptable; the TUI prompt shows the rule that matched so users can see why a call was allowed/denied. Shadowed-rule warnings are deferred.
  • Revisit if: prefix matching proves insufficient for real-world bash commands and operators are routinely surprised by Allow/Deny outcomes. Next step would be a classifier (LLM-graded command-intent), but we want concrete evidence before going there.

ADR 0021 · Sub-agent primitive via AgentTool

  • Status: accepted
  • Date: 2026-05-23

Context

caliban's turn loop is a single agent calling tools. Several real-use patterns benefit from a sub-agent primitive: parallel search over a large codebase without polluting the parent's context, subtasks with a restricted tool palette, or delegating multi-step investigations whose intermediate steps shouldn't bloat the parent transcript.

Claude Code has two related primitives — synchronous Agent (a tool) and Task (async background runs you poll). We need the synchronous one. Async Task is a separate, larger piece of work.

Decision

Surface: a tool, not a new core type

Sub-agents are spawned by the model invoking a built-in tool AgentTool. Input: {prompt, tool_allowlist?, model?}. Output: one ContentBlock::Text containing the sub-agent's final assistant text (truncated to ~5000 chars).

In-process, not child-process

The sub-agent runs an entire turn loop on its own Agent instance in the same tokio runtime. Single binary, single runtime — cancellation and tracing stay unified. Sub-agent shares the parent's Provider instance, inheriting HTTP/2 multiplexing, the connection pool, and Anthropic-side prompt cache locality. No IPC, no serialization. The cost is no OS-level isolation, which is acceptable: the existing trust model (operator already runs BashTool-capable code) doesn't gain much from a child process.

Construction via factory

AgentTool::new(factory: Arc<dyn Fn(&AgentToolInput) -> Agent + Send + Sync>). The factory is wired from main and closes over the parent's provider, tool registry, and hooks. Each invocation builds a fresh Agent with the parent's provider; model from input (or parent's); a ToolRegistry filtered by tool_allowlist; and max_turns = 20 (operator-tunable in code, not from model input).

Tool allowlist semantics

  • tool_allowlist: ["Read", "Grep"] → sub-agent gets exactly those. Unknown names are silently dropped.
  • tool_allowlist: null or omitted → sub-agent inherits every parent tool EXCEPT AgentTool itself.

No recursion in v1: AgentTool is filtered out of every sub-agent's registry. Nested sub-agents are a v2 problem (depth limits, fan-out, cost ceilings).
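The allowlist semantics can be sketched over plain tool names. filter_tools is a hypothetical helper; the real code filters a ToolRegistry.

```rust
/// Build the sub-agent's tool list from the parent's.
/// AgentTool is always removed; unknown allowlist names simply never match.
fn filter_tools(parent_tools: &[&str], allowlist: Option<&[&str]>) -> Vec<String> {
    parent_tools
        .iter()
        .filter(|name| **name != "AgentTool") // no recursion in v1
        .filter(|name| match allowlist {
            Some(allowed) => allowed.contains(*name),
            None => true, // null or omitted: inherit every other parent tool
        })
        .map(|name| name.to_string())
        .collect()
}
```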

Budgets

max_turns = 20 (hard). Sub-agent inherits the parent's max_tokens. No per-call cost ceiling because we don't have a router yet; add max_cost_usd later.

Transcript representation

Parent transcript gets the ToolUseBlock (name = "AgentTool", input JSON) and a ToolResultBlock containing the sub-agent's final assistant text (truncated to ~5000 chars). Intermediate sub-turns are not persisted in the parent session — they live only in the sub-agent's transient buffer. Debug logs capture the full trace.

Not a Task primitive

Claude Code's Task is async-with-lifecycle (spawn, poll, cancel, retrieve). AgentTool::call is synchronous: the parent's turn loop blocks on the sub-agent's loop completing. Async Task is v2.

Consequences

  • Positive: unlocks the "parallel exploration without context bloat" pattern; reuses every existing primitive (Agent, Hooks, ToolRegistry, CancellationToken). Permissions apply to the sub-agent's tools just like the parent's, because the sub-agent's Agent is built with the same hooks chain.
  • Negative: synchronous-only — if a sub-agent loop takes minutes, the parent appears stuck. Mitigation: sub-agent stream events bubble to the TUI via the parent's stream so the operator still sees progress. Token accounting at the parent level shows sub-agent usage as a single line (the ToolUseBlock); cost attribution to specific sub-turns lives only in the debug log.
  • Revisit if: users routinely want to dispatch many sub-agents in parallel — at that point we promote AgentTool from synchronous to the v2 Task primitive and add lifecycle management.

ADR 0022 · Model routing architecture

  • Status: accepted
  • Date: 2026-05-23

Context

The agent makes provider calls for several distinct purposes — the main conversational loop, summarization for compaction, embeddings for memory, fast classification for routing decisions, sub-agent loops, etc. Today those all run through the single Arc<dyn Provider> handed to the Agent. Operators who want to use Sonnet for the main loop, Haiku for summarization, and a local Ollama model for fast classification have no clean way to express that.

Claude Code solves this with hardcoded getMainLoopModel / getSmallFastModel helpers. That's fine for a single-vendor harness; it's wrong for caliban, which is provider-agnostic by design. Operators should be able to compose any model from any provider for any purpose without recompiling.

A model router also turns out to be the natural home for several already-deferred concerns: per-route fallback chains, hedged requests, circuit breakers, cost/usage aggregation, and unification of the divergent prompt-cache surfaces across Anthropic, OpenAI, and Gemini.

This is signature differentiation for caliban; it deserves its own layer.

Decision

  • Add a new Layer-3 crate caliban-model-router. It sits between caliban-agent-core and the four caliban-provider-* adapter crates. No agent-core code changes shape; the agent continues to take a single Arc<dyn Provider>.
  • The router IS a Provider. It implements the same trait the adapters implement, so the agent sees one provider — the router — and the router internally dispatches each complete / stream call to the right downstream Provider + model based on the request's purpose, the operator's policy, and the capabilities the request needs.
  • Routes are matched by RequestMetadata.purpose. A new field on the existing RequestMetadata struct: purpose: Option<RequestPurpose> with variants MainLoop | Summarization | Embedding | FastClassifier | SubAgent | Custom(String). Callers that don't set a purpose route through a default configured by the operator (likely MainLoop).
  • Routing policy is operator-defined. A TOML config file plus a builder API. No auto-learning, no automatic cost optimization, no hidden behavior. The operator owns the cost / latency / capability trade-offs explicitly. This is a deliberate differentiator from Claude Code's hardcoded paths.
  • Capability filtering is mandatory. Each route declares its provider + model; the router consults Provider::capabilities(model) before dispatch and skips a route whose capabilities don't satisfy the request (e.g. request needs ToolUseCapability::ParallelCalls but the route's model only supports Basic).
  • Per-route fallback is opt-in and ordered. When the same purpose appears in multiple [[route]] entries, the entries form a fallback chain in declaration order. The router tries them in sequence on a retryable failure of the previous entry (rate-limit, model unavailable, transient network error). Implementation is deferred to v2 — this ADR commits to the design.
  • Cost / usage aggregation is a router responsibility. The router sees every call and every Usage. It maintains a per-(provider, model) accumulator and exposes a RouterStats snapshot for the TUI's existing /usage overlay (ADR 0013) to render.
  • Hedging and circuit-breakers are router responsibilities. Both are sketched in the design spec but deferred to v2.
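The declaration-order fallback chain can be sketched as below. The ADR defers the implementation to v2, so this is only an illustration: Route, CallError, and dispatch are hypothetical, purposes are plain strings rather than RequestPurpose, and capability filtering is left out.

```rust
#[derive(Debug, PartialEq)]
enum CallError {
    Retryable(String), // rate limit, model unavailable, transient network error
    Fatal(String),
}

struct Route {
    purpose: &'static str,
    model: &'static str,
}

/// Try every route declared for `purpose`, in declaration order.
/// Move to the next entry only after a retryable failure.
fn dispatch<F>(routes: &[Route], purpose: &str, mut call: F) -> Result<String, CallError>
where
    F: FnMut(&Route) -> Result<String, CallError>,
{
    let mut last = CallError::Fatal(format!("no route for purpose {}", purpose));
    for route in routes.iter().filter(|r| r.purpose == purpose) {
        match call(route) {
            Ok(reply) => return Ok(reply),
            Err(CallError::Retryable(msg)) => last = CallError::Retryable(msg),
            Err(fatal) => return Err(fatal),
        }
    }
    Err(last)
}
```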

Consequences

  • Agent constructor unchanged. AgentBuilder::provider(...) takes the router as its Arc<dyn Provider> exactly like any adapter. No code in caliban-agent-core knows the router exists.
  • Adapters stay simple. Per-adapter retry policy (existing RetryPolicy for transient errors) remains in the adapter. The router handles route-level fallback. The two layers compose: adapter retries within a route; router moves to the next route only if the adapter exhausts its retries with a fatal-for-this-route error.
  • Prompt-cache unification lands here. Anthropic's cache_control markers, OpenAI's cache_read_input_tokens, and Gemini's context-caching all surface as the same Usage.cache_read_input_tokens / cache_creation_input_tokens values once they reach the router; the router is the natural place to normalize the bookkeeping.
  • before_turn hook needs a way to see the resolved route. The agent's TurnCtx currently exposes config.model, which is the caller's request, not the route's actual choice. A new optional field (or a router-supplied hook surface) is required so the TUI status line can display "Sonnet via Anthropic, fallback gpt-4o" instead of just the requested logical name. Detailed in the spec.
  • Sessions become route-history-aware. If a session was started on route A and resumes on route B (because the config changed, or the primary route is unavailable), prompt-cache markers from the prior provider are inert. The router documents this and falls back to no-cache for the transition turn.
  • Forward links: hedged requests, circuit breakers, and adaptive retry budgets were listed as non-goals in 2026-05-23-perf-baseline-design.md. This ADR pulls them under the router's umbrella for v2.
  • Revisit if: the operator-defined policy turns out to be a meaningful UX burden in practice (consider a "balanced" default policy), or if hedged requests prove valuable enough to promote from v2 to v1.

References

  • Design spec: docs/superpowers/specs/2026-05-23-model-router-design.md
  • Provider trait: crates/caliban-provider/src/lib.rs
  • Capabilities: crates/caliban-provider/src/capabilities.rs
  • Per-adapter retry: ADR 0009 (RetryPolicy)
  • Usage overlay: ADR 0013 (TUI overlays)
  • Perf-baseline non-goals: docs/superpowers/specs/2026-05-23-perf-baseline-design.md

ADR 0023 · MCP v2 — transports, OAuth, elicitation, resources

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-mcp-v2-design.md
  • Supersedes scope of: ADR 0017 deferred items

Context

ADR 0017 shipped caliban's MCP client as a config-only scaffold: McpClientManager::start is a no-op, McpTool::invoke is unwritten, and the only working pieces are TOML parsing and server-name validation. Closing the gap to Claude Code requires (a) actually wiring rmcp so stdio servers spawn and discover tools, and (b) adding HTTP/SSE transports + OAuth + elicitation + resources.

Decision

Phased delivery — three sub-PRs

v2 ships in three independently-mergeable phases:

  • Phase A — stdio wiring. Implement Conn::start for stdio and McpTool::invoke. In-tree test server. Closes the deferred "rmcp wiring" follow-up from ADR 0017.
  • Phase B — HTTP + SSE transports. Adds Transport::Http and Transport::Sse over the corresponding rmcp transport modules. oauth = "off" only at this phase — for self-hosted endpoints behind a fixed bearer or no auth.
  • Phase C — OAuth + elicitation + resources. McpOAuthFlow (PKCE + loopback callback + keyring token storage), ElicitationBridge (TUI modal + non-interactive auto-decline), McpResource (@server:resource autocomplete and inline read).

Each phase ticks rows in docs/parity-gap-matrix.md from 🔴 → ✅ in the PR that lands it.

Transport selection is a config field, not separate crates

ServerConfig.transport: "stdio" | "http" | "sse" (default "stdio") selects which rmcp transport constructor to call. The manager is otherwise transport-agnostic — Conn exposes the same rmcp::client::RunningService<…> regardless of transport. This keeps the agent-side code path uniform: Hooks, dispatch, cancellation, and serialization see no MCP-transport details.

OAuth uses PKCE + a loopback callback on a random port

Hosted MCP servers behind OAuth use the authorization-code flow with PKCE (S256). caliban spawns a short-lived axum server on 127.0.0.1:0, prints the auth URL, captures the callback, and exchanges the code for tokens. Tokens persist in the OS keyring (keyring crate); fallback to $XDG_DATA_HOME/caliban/mcp-tokens.json mode 0600 on systems without keychain support. --mcp-oauth-port and CALIBAN_MCP_OAUTH_PORT override the random port for firewalled machines.

We pick PKCE + loopback over device-code or out-of-band paste because it's what Claude Code uses and what RFC 8252 recommends for native clients. A v2.1 follow-up may add a paste-back fallback if real demand emerges from operators on hardened networks.

Elicitation is a side-channel, not a tool

ElicitationBridge is a separate caliban-side type with its own mpsc queue; it does not extend the Tool trait. The TUI subscribes; non-interactive callers (--print, CI) get a default auto-Decline handler. Elicitation requests are gated by the existing permission rule grammar via a new pattern: Elicit(<server>).

Resources are pulled lazily

Resources are not eagerly listed at startup. The first time the user types @<server>:, caliban calls resources/list for that server and caches the result; resources/list_changed notifications invalidate the cache. Resource templates like github://repos/{owner}/{repo}/issues/{id} are expanded positionally from arguments typed after the resource name.
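Positional expansion of a resource template can be sketched as a left-to-right fill. expand_template is a hypothetical name, and returning None on an argument-count mismatch is an assumed policy that the ADR does not specify.

```rust
/// Fill `{placeholder}` slots left to right with `args`.
/// Returns None when the slot and argument counts differ or a brace is unclosed.
fn expand_template(template: &str, args: &[&str]) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    let mut next_arg = args.iter();
    while let Some(open) = rest.find('{') {
        let close = open + rest[open..].find('}')?;
        out.push_str(&rest[..open]);
        // Placeholder names are ignored; only position matters.
        out.push_str(next_arg.next()?);
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    if next_arg.next().is_some() {
        return None; // more arguments than slots
    }
    Some(out)
}
```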

Per-server permission scoping lifted into our rule grammar

Claude Code's allowedMcpServers / deniedMcpServers settings become inline [server.X.permissions] blocks in mcp.toml. They merge with global permissions in a documented order: global deny → server deny → server ask → server allow → global ask → global allow → default(Ask). The /mcp overlay shows the effective rule for a focused tool.
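The merge order can be sketched as a fixed lookup sequence. For brevity this uses exact tool-name matching instead of the glob grammar, and RuleSet and effective_action are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    Allow,
    Deny,
    Ask,
}

struct RuleSet {
    deny: Vec<&'static str>,
    ask: Vec<&'static str>,
    allow: Vec<&'static str>,
}

/// Walk the documented order: global deny, server deny, server ask,
/// server allow, global ask, global allow, then the Ask default.
fn effective_action(global: &RuleSet, server: &RuleSet, tool: &str) -> Action {
    let order = [
        (&global.deny, Action::Deny),
        (&server.deny, Action::Deny),
        (&server.ask, Action::Ask),
        (&server.allow, Action::Allow),
        (&global.ask, Action::Ask),
        (&global.allow, Action::Allow),
    ];
    for (rules, action) in order {
        if rules.iter().any(|name| *name == tool) {
            return action;
        }
    }
    Action::Ask
}
```

One consequence of this order: a server-level allow cannot override a global deny, but a server-level ask does take precedence over a global allow.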

Env-var contract — CALIBAN_* primary, MCP_* fallback

caliban reads CALIBAN_MCP_TIMEOUT, CALIBAN_MCP_TOOL_TIMEOUT, CALIBAN_MAX_MCP_OUTPUT_TOKENS. If those are unset and the Claude-Code-style MCP_TIMEOUT / MCP_TOOL_TIMEOUT are set, we honor them for compat. We do not read MAX_MCP_OUTPUT_TOKENS without the CALIBAN_ prefix because servers may set it themselves.
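The lookup contract can be sketched with an injected lookup function, which stands in for std::env::var(..).ok() so the example stays testable. resolve_env and its allow_unprefixed flag are hypothetical.

```rust
/// Resolve a setting: the CALIBAN_-prefixed name first, then (only when
/// permitted) the Claude-Code-style unprefixed name.
fn resolve_env<F>(lookup: F, name: &str, allow_unprefixed: bool) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let prefixed = format!("CALIBAN_{}", name);
    lookup(prefixed.as_str()).or_else(|| {
        if allow_unprefixed {
            lookup(name)
        } else {
            // e.g. MAX_MCP_OUTPUT_TOKENS is never read without the prefix.
            None
        }
    })
}
```

In a real binary the closure would be something like |key| std::env::var(key).ok(), with allow_unprefixed set to true for the two timeout variables and false for MAX_MCP_OUTPUT_TOKENS.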

Consequences

  • Positive: Closes nine 🔴 rows in the parity matrix in one multi-PR initiative. Transport plurality makes hosted-MCP ecosystems reachable; OAuth unblocks every commercial server that uses it. Elicitation is a meaningful UX upgrade (servers can ask before destructive ops without baking confirmation into every tool). Resources turn MCP from "tools only" into "tools + data references" — closes the @server:resource parity gap.
  • Negative: Dependency footprint grows by ~5 crates (rmcp HTTP/SSE features, oauth2, axum, keyring). Loopback OAuth assumes the user can open a browser; hardened workstations may need oauth = "manual". Token storage adds a per-OS contract surface to test. Elicitation introduces a new modal flow the TUI must handle alongside the Ask modal.
  • Revisit if: Hosted MCP ecosystem standardizes on a different auth flow; if rmcp evolves a higher-level OAuth helper, our bespoke flow can shrink. If resource discovery latency becomes a problem (large resources/list responses), promote to eager fetch with a background refresh task.

ADR 0024 · Hook event taxonomy + external handler types

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-hooks-expansion-design.md

Context

caliban's Hooks trait today exposes four events (before_turn/after_turn/before_tool/after_tool) and is only addressable from in-process Rust code: there's no way to drop a shell script into ~/.config/caliban/ and have it run on SessionStart, no HTTP callback for audit servers, no MCP-tool-as-policy-gate, no LLM-classifier for UserPromptSubmit. Claude Code's documented hook surface covers ~25 event names and five handler types; closing that gap is Tier-1 foundation work because plugins, observability, and automation all build on it. The full spec is in docs/superpowers/specs/2026-05-24-hooks-expansion-design.md; this ADR records the architectural commitments only.

Decision

Event names mirror Claude Code's PascalCase taxonomy

Add 15+ event methods to the Hooks trait, all with default no-op implementations so existing Hooks impls keep compiling unchanged. First-class events: SessionStart, SessionEnd, UserPromptSubmit, PreCompact, PostCompact, ConfigChange, CwdChanged, FileChanged, SubagentStart, SubagentStop, TaskCreated, TaskCompleted, PermissionRequest, PermissionDenied, Notification, Stop, StopFailure, PostToolUseFailure. Reserved but not-yet-fired in v1: Setup, UserPromptExpansion, PostToolBatch, InstructionsLoaded, WorktreeCreate, WorktreeRemove, Elicitation, ElicitationResult, TeammateIdle.

Five external handler types — command/http/mcp/prompt/agent

A new HookRouter consumes hooks.toml (or the hooks table inside the unified settings.json once ADR 0026 lands) and dispatches events to externally-configured handlers. The router itself implements Hooks, so it composes into AgentBuilder like any other in-process hook stack — behind PermissionsHook in the chain.

  • command: spawn a child; stdin is event JSON; stdout JSON (or exit code) determines the decision.
  • http: POST event JSON; response JSON is the decision.
  • mcp: invoke a configured MCP server's tool with the event JSON.
  • prompt: call the model router (default FastClassifier purpose) with the prompt + event JSON; schema enables structured-output.
  • agent: delegate to a subagent (async-only).

Decision protocol — stdout JSON or exit codes

Shell-command handlers signal their decision via stdout JSON (hookSpecificOutput.permissionDecision set to allow, deny, or ask; permissionDecisionReason; optional updatedInput) or via exit codes (0 = Allow, 2 = Deny with stderr as reason, anything else = Allow + warning). HTTP and MCP handlers use the same response shape.

We extend HookDecision with UpdatedInput(Value) so hooks can rewrite a tool's input before dispatch. The rewritten input is validated against the tool's input_schema(); validation failure is a hard deny.
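The exit-code half of the protocol can be sketched as a single match. ExitDecision is a stand-in enum for illustration, not caliban's HookDecision, and the warning text is made up.

```rust
/// Stand-in for the decision a command handler's exit status maps to.
#[derive(Debug, PartialEq)]
enum ExitDecision {
    Allow,
    AllowWithWarning(String),
    Deny(String),
}

/// 0 = Allow, 2 = Deny with stderr as the reason, anything else = Allow + warning.
fn decision_from_exit(code: i32, stderr: &str) -> ExitDecision {
    match code {
        0 => ExitDecision::Allow,
        2 => ExitDecision::Deny(stderr.trim().to_string()),
        other => ExitDecision::AllowWithWarning(format!("hook exited with status {}", other)),
    }
}
```

Note that under this mapping a crashing hook script (for example exit status 1) allows the call with a warning; only status 2 denies.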

Stdin payload uses snake_case + camelCase mix, deliberately

The envelope's hook-protocol fields (hookEventName, hookSpecificOutput) match Claude Code so existing CC hook scripts work with a one-line wrapper. Caliban-specific fields (session_id, tool.useId, turn_index) keep snake_case for parity with our internal JSON. The diff is documented in the README.

URL allowlist for HTTP hooks; env-var allowlist for ${VAR} expansion

HTTP handlers fail closed: the operator must list each allowed URL glob in allowed_http_hook_urls (default empty). ${VAR} expansion in headers and URLs is gated by http_hook_allowed_env_vars. Together these prevent a project-scope hooks.toml from exfiltrating user-scope secrets via an attacker-controlled callback URL.
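A rough sketch of the two gates, assuming shell-style glob matching (Python's fnmatch here; caliban's actual glob dialect may differ) and assuming that a variable outside the allowlist expands to an empty string. Both function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def url_allowed(url, allowed_globs=()):
    """Fail closed: with the default empty allowlist, nothing matches."""
    return any(fnmatchcase(url, glob) for glob in allowed_globs)

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_vars(template, env, allowed_vars=()):
    """Expand ${VAR} only for allowlisted names; others become "" (assumption)."""
    allowed = set(allowed_vars)

    def replace(match):
        name = match.group(1)
        return env.get(name, "") if name in allowed else ""

    return _VAR.sub(replace, template)
```

With this shape, a project-scope hook that references ${AWS_SECRET} in its URL sends nothing useful unless the operator has explicitly allowlisted that variable.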

Async handlers detach onto a bounded task pool; their decisions are ignored

async = true handlers are fire-and-forget: useful for audit, metrics, and code-review subagents that observe but don't gate. A Semaphore-bounded pool (default 16) caps the parallel async-handler count. Agent-type handlers are async-only by definition (synchronous subagent calls from a hook would risk turn-budget blowup and recursion).
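The bounded fire-and-forget pattern can be illustrated with asyncio (the real pool is Rust and uses a Semaphore; this is only an analogy). The names dispatch_async and demo are hypothetical, and a real router would share one semaphore across events rather than create one per dispatch.

```python
import asyncio

def dispatch_async(handlers, event, limit=16):
    """Start every handler as a detached task; at most `limit` run at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(handler):
        async with sem:
            try:
                await handler(event)  # the return value (any "decision") is dropped
            except Exception:
                pass                  # a failing observer must not affect the turn

    return [asyncio.create_task(guarded(h)) for h in handlers]

async def demo(count=50, limit=16):
    """Run `count` observers and report the peak concurrency that was reached."""
    running = peak = 0

    async def audit(event):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1
        return "deny"  # ignored by design

    tasks = dispatch_async([audit] * count, {"hookEventName": "PostToolUse"}, limit)
    results = await asyncio.gather(*tasks)
    return peak, results
```

Calling asyncio.run(demo()) shows that the handlers' "deny" return values never surface and that concurrency stays within the limit.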

Parallel tool dispatch ordering caveat is preserved

Under parallel tool dispatch (ADR 0016), PostToolUse fires in completion order, not assistant-message order. We document this on the trait and surface tool_use_id in ToolCtx so hook authors can correlate. The router serializes hook handlers per-tool-call but lets distinct tool_use_ids run concurrently.

Kill switch and managed-only mode are first-class

disable_all_hooks = true blocks all external handlers but leaves in-process Hooks impls running (PermissionsHook, audit, anything the binary wires up). allow_managed_hooks_only = true further restricts execution to handlers loaded from the managed settings scope (ADR 0026). Both flags are visible in the /hooks overlay.

Consequences

  • Positive: Closes nine 🔴 rows under "B. Hooks & extensibility" in docs/parity-gap-matrix.md in one PR (only "Plugin packages" and "Hook inheritance for subagents" remain — both gated on other initiatives). Establishes the substrate plugins and observability build on. Shell-command hooks let operators glue caliban into existing audit / CI / policy stacks without touching Rust.
  • Negative: Hook handlers run with caliban's privileges; shell hooks are arbitrary code execution by design. Until an OS sandbox lands, a hostile project-scope hooks.toml is a real risk — mitigated by the URL/env allowlists and managed-only mode, but fundamentally a "trust your repos" model. The Hooks trait grows from 4 to ~18 methods; default no-ops keep call-sites compatible but the trait's IDE-completion surface bloats.
  • Revisit if: Plugin system (ADR 0030) lands and needs richer package-level hook registration. If hook latency becomes a bottleneck under heavy parallel dispatch, promote sync-handler invocation off the dispatcher's hot path. If UpdatedInput proves too error-prone, narrow it to specific tools or remove it. If Claude Code stabilizes additional event names (Elicitation / Setup / etc.) we promote them from reserved-but-stubbed to actually-fired.

ADR 0025 · Headless -p mode + JSON output protocol

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-headless-mode-design.md

Context

caliban today only runs as an interactive ratatui TUI. Every potential CI/scripting/devcontainer/GitHub-Actions consumer is blocked on a non-interactive entry point. Claude Code's -p mode with --output-format text|json|stream-json is the documented contract those consumers use; mirroring it engine-to-engine is Tier-1 foundation work. Full spec at docs/superpowers/specs/2026-05-24-headless-mode-design.md; this ADR records the architectural commitments only.

Decision

Headless is a sibling driver, not a fork of the TUI

caliban -p enters a HeadlessDriver that consumes the same AgentBuilder + Stream<Event> surface from caliban-agent-core. The TUI driver is unchanged. Both drivers compose the same hook chain, permission rules, tool registry, and model router — the only difference is the encoder that turns Events into bytes.

Headless mode is entered automatically when stdin is not a TTY or stdout is piped, unless --no-auto-print is passed. An explicit --print always wins.
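The entry rule reduces to a small predicate. The sketch below is illustrative only; the function name is hypothetical, and "stdout is piped" is approximated as "stdout is not a TTY".

```python
def should_run_headless(stdin_is_tty, stdout_is_tty, print_flag=False, no_auto_print=False):
    """Decide between the headless driver and the TUI driver."""
    if print_flag:
        return True   # explicit --print always wins
    if no_auto_print:
        return False  # --no-auto-print suppresses auto-detection
    # Assumption: a piped stdout is detected as "not a TTY".
    return (not stdin_is_tty) or (not stdout_is_tty)
```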

Three output formats, with stream-json as the contract surface

  • text: the assistant's final message body to stdout. The minimum shape. Default.
  • json: a single JSON object identical to the final type: result frame of stream-json. Suitable for jq-driven scripts that only care about the answer + cost.
  • stream-json: NDJSON. First frame is system/init (model, tools, MCP servers, plugins, settings sources); per-turn frames are tool_use, tool_result, content_block_delta (when --include-partial-messages), system/api_retry, user (when --replay-user-messages), hook_event (when --include-hook-events); last frame is type: result.

Stream-json closely tracks Claude Code's documented shape so downstream consumers can drop in. Divergences (provider-specific token fields, etc.) are documented in the README; we do not commit to byte-identical compatibility because caliban is provider-agnostic.

Tool calls appear in two frames; the message frame is authoritative

Each successful tool call surfaces in the stream-json output as two frames, by design:

{"type":"tool_use","id":"toolu_01ABC","name":"Glob","input":{"pattern":"**/*.toml"}}
{"type":"tool_result","tool_use_id":"toolu_01ABC","is_error":false,"content":[...]}
{"type":"message","role":"assistant","content":[
  {"type":"text","text":"Searching for TOML files…"},
  {"type":"tool_use","id":"toolu_01ABC","name":"Glob","input":{"pattern":"**/*.toml"}}
]}
  1. A top-level short tool_use frame emitted at the moment the model finishes streaming the tool's input JSON (paired with a tool_result frame once the tool completes). This is a progress indicator — useful for live UIs that want to show "Glob is running" before the assistant's final message is assembled.
  2. The same tool_use block embedded inside the subsequent message frame (full assistant message, content-block array) emitted at TurnEnd. This is the authoritative record — the serialized assistant turn as the agent would replay it from a session log.

Operators reconstructing the transcript from the stream should read the message frame and treat the short tool_use/tool_result frames as progress signal. Tools that count tool_use blocks must not double-count (one short frame + one inside message = one tool call, not two).

This mirrors Claude Code, where the assistant message event is the authoritative full content and per-block progress frames are advisory. The duplication is intentional: the encoder does not dedupe it, so consumers must account for it when counting.
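A consumer that wants an accurate tool-call count might separate the two kinds of frame like this. It is a sketch against the frame shapes shown above; the function name is hypothetical.

```python
import json

def collect_tool_calls(ndjson_lines):
    """Return (authoritative_ids, progress_ids) from a stream-json transcript."""
    authoritative, progress = [], []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        frame = json.loads(line)
        if frame.get("type") == "tool_use":
            # Short top-level frame: progress signal only.
            progress.append(frame["id"])
        elif frame.get("type") == "message" and frame.get("role") == "assistant":
            # Authoritative record: tool_use blocks inside the message frame.
            for block in frame.get("content", []):
                if block.get("type") == "tool_use":
                    authoritative.append(block["id"])
    return authoritative, progress
```

Counting only the first list yields one tool call for the example stream, even though the same id also appears in a short progress frame.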

Structured input is also NDJSON

--input-format stream-json makes stdin a chat transcript: each line is either a user message or a control/interrupt frame. The driver feeds the agent one message per turn. EOF gracefully drains.

This makes caliban scriptable from any language that can emit JSON lines, without juggling pseudo-TTYs.

Input frame schema (canonical)

The simple, caliban-canonical shape:

{"type":"user","content":"hello"}
{"type":"user","content":[{"type":"text","text":"hello"}]}
{"type":"control","subtype":"interrupt"}

user.content accepts either a JSON string or an array of content blocks (each {"type":"text","text":"…"}). Both flatten to the same text on the way into the agent.

Unknown type values, malformed JSON, or extra unrecognized fields on user/control frames are hard parse errors (exit 64, EX_USAGE). The driver flushes any in-flight assistant frames first, emits one final result frame with subtype: "error", and only then returns. This is to avoid the failure mode where an operator sends a Claude-Code-shaped envelope ({"type":"user","message":{"role":"user", "content":[...]}}) and the driver silently runs the agent with a blank prompt because serde accepted the unknown message field.
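The strictness described above can be sketched as a parser that rejects anything it does not recognize. This is an illustration rather than the driver's code: FrameError and parse_input_frame are hypothetical names, joining text blocks with no separator is an assumption, and only the interrupt control subtype shown above is accepted.

```python
import json

class FrameError(ValueError):
    """A hard parse error; the real driver would exit 64 (EX_USAGE)."""

def _reject_extra(frame, allowed):
    extra = set(frame) - allowed
    if extra:
        raise FrameError(f"unrecognized fields: {sorted(extra)}")

def parse_input_frame(line):
    try:
        frame = json.loads(line)
    except json.JSONDecodeError as exc:
        raise FrameError(f"malformed JSON: {exc}") from exc
    if not isinstance(frame, dict):
        raise FrameError("frame must be a JSON object")
    kind = frame.get("type")
    if kind == "user":
        _reject_extra(frame, {"type", "content"})
        content = frame.get("content")
        if isinstance(content, str):
            return ("user", content)
        if isinstance(content, list):
            parts = []
            for block in content:
                ok = (isinstance(block, dict) and set(block) == {"type", "text"}
                      and block["type"] == "text" and isinstance(block["text"], str))
                if not ok:
                    raise FrameError("content blocks must be {type: text, text: ...}")
                parts.append(block["text"])
            return ("user", "".join(parts))  # assumption: blocks join without a separator
        raise FrameError("user.content must be a string or an array of blocks")
    if kind == "control":
        _reject_extra(frame, {"type", "subtype"})
        if frame.get("subtype") != "interrupt":
            raise FrameError("unknown control subtype")
        return ("control", "interrupt")
    raise FrameError(f"unknown frame type: {kind!r}")
```

A Claude-Code-shaped envelope carrying a message field fails in _reject_extra instead of being run as a blank prompt.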

--input-format stream-json requires stdin

--input-format stream-json is incompatible with an explicit prompt. The binary rejects the combination at clap-parse time with EX_USAGE (exit 64) so operators can't accidentally bypass the frame parser via a positional prompt or --prompt …. The allowed entry points are:

  • No prompt args at all (stdin is read as the NDJSON stream); or
  • -p - / --print - / --prompt - (the - sentinel explicitly delegates to stdin and is treated as a no-op alongside --input-format stream-json).

--bare is opt-in, not the CI default

--bare disables hooks, skills, plugins, MCP, auto-memory, and CLAUDE.md auto-discovery. It's the documented "deterministic CI" mode. Unlike Claude Code's stated direction of making it the default, caliban's headless default keeps inheriting user/project settings — operators must opt out explicitly. Rationale: caliban's first deployments are mostly local-shell automation where inherited settings are useful; CI runners are well-trained to add flags.

Exit codes follow sysexits.h plus two budget signals

| Code | Meaning |
| --- | --- |
| 0 | success |
| 1 | generic runtime error |
| 2 | tool/assistant error |
| 64 | EX_USAGE (bad flags) / malformed stream-json input |
| 66 | EX_NOINPUT (--resume <missing>, empty stream-json stdin) |
| 75 | EX_TEMPFAIL: --max-turns exceeded (F12 follow-up: was 130, which collided with 128 + SIGINT) |
| 78 | EX_CONFIG (stdin > 10 MB; settings parse failure) |
| 124 | cancelled (SIGTERM / Ctrl-C from the agent loop) |
| 130 | reserved for real SIGINT reaching the harness (128 + 2); the signal handler in caliban/src/main.rs exits with this on a second Ctrl-C |
| 137 | --max-budget-usd exceeded |

CI tooling can distinguish "budget exhausted" from "real failure" without parsing stdout. Update 2026-05-27 (F12): --max-turns exhaustion previously exited 130, which is 128 + SIGINT in the UNIX convention — CI scripts reading $? reasonably concluded the operator had Ctrl-C'd. It now exits 75 (EX_TEMPFAIL), distinct from any signal-derived code. Consumers wanting the structured signal should read the matching result frame's subtype: "max_turns".
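A CI wrapper could branch on the exit code alone. The grouping below is one possible reading of the table (the category labels are hypothetical, not part of the protocol).

```python
BUDGET_CODES = {75, 137}        # --max-turns / --max-budget-usd exhausted
CANCEL_CODES = {124, 130}       # cancelled run / real SIGINT
INPUT_CODES = {64, 66, 78}      # usage, missing input, configuration

def classify_exit(code):
    """Bucket a caliban -p exit code for CI reporting (labels are illustrative)."""
    if code == 0:
        return "success"
    if code in BUDGET_CODES:
        return "budget"
    if code in CANCEL_CODES:
        return "cancelled"
    if code in INPUT_CODES:
        return "bad-invocation"
    return "failure"
```

A pipeline might retry with a larger budget on "budget", fail fast on "bad-invocation", and page someone only on "failure".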

Result-frame shape — structured fields for non-success runs

The final result frame's body depends on subtype:

  • subtype: "success" — the assistant's reply lives in the result string field. Token/cost/turn totals are always present. Structured-output payloads are surfaced under structured_output when --json-schema succeeded. This is the load-bearing contract for downstream jq scripts and is not changed by the F7 follow-up below.
  • All non-success subtypes (error, max_turns, budget_exceeded, cancelled) — the result field is omitted; consumers must read the structured fields instead:
    • last_assistant_text — the most recent non-empty assistant text body the agent produced. null (field absent) when the run terminated before any assistant text landed. Distinct from the prior protocol, which set result to the concatenation of every streamed assistant fragment across the truncated run — a value that ranged from a stale plan preamble to literally "" and couldn't be distinguished from a clean answer.
    • tool_calls_seen — running count of ToolCallEnd events observed across the entire run. Lets consumers tell an empty-but-active run (tool loop) from an empty-and-idle one.
    • error — populated for subtype: "error" only; carries the StopCondition::ProviderError / HookDenied / CompactionFailed / Refusal / ContentFilter / schema-validation message verbatim.

Pairs with the exit-code table above: the result frame's subtype and the process exit code agree on what the terminal condition was, so consumers can pick either signal.
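Reading the final frame then becomes a two-branch lookup. The sketch below uses only the field names listed above; the function name and the returned dict are hypothetical.

```python
def summarize_result(frame):
    """Normalize a final stream-json result frame into a small summary dict."""
    if frame.get("type") != "result":
        raise ValueError("not a result frame")
    subtype = frame.get("subtype")
    if subtype == "success":
        return {
            "ok": True,
            "text": frame["result"],
            "structured": frame.get("structured_output"),
        }
    # Non-success: `result` is omitted, so read the structured fields instead.
    return {
        "ok": False,
        "subtype": subtype,
        "text": frame.get("last_assistant_text"),
        "tool_calls_seen": frame.get("tool_calls_seen", 0),
        "error": frame.get("error"),
    }
```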

Cost accumulator lives in caliban-agent-core::headless

A CostAccumulator (per-(provider, model)) wraps each provider call and accumulates USD against a static pricing table at caliban-agent-core/src/headless/pricing.json. Pricing misses log a WARN and treat cost as zero rather than failing: staleness is real, and we'd rather emit "best-effort, cost may be undercounted" than refuse to run. Pricing table refreshes are by-hand PRs checked against the provider websites; the as_of date surfaces in the system/init frame.
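The miss-is-zero behaviour can be sketched as follows. The pricing-table layout (per-million-token input and output rates) and all names below are assumptions for illustration; the actual pricing.json schema is not shown in this guide.

```python
import logging
from collections import defaultdict

log = logging.getLogger("cost")

class CostAccumulator:
    """Accumulate USD per (provider, model); unknown models cost zero."""

    def __init__(self, pricing):
        # Hypothetical layout: {(provider, model): {"input_per_mtok": x, "output_per_mtok": y}}
        self.pricing = pricing
        self.usd = defaultdict(float)

    def record(self, provider, model, input_tokens, output_tokens):
        key = (provider, model)
        price = self.pricing.get(key)
        if price is None:
            # Pricing miss: warn and treat as zero rather than failing the run.
            log.warning("no pricing for %s/%s; cost may be undercounted", provider, model)
            self.usd[key] += 0.0
            return 0.0
        cost = (input_tokens * price["input_per_mtok"]
                + output_tokens * price["output_per_mtok"]) / 1_000_000
        self.usd[key] += cost
        return cost

    def total(self):
        return sum(self.usd.values())
```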

Structured output via --json-schema uses provider-native first, falls back to validate-and-retry

For Anthropic / OpenAI native structured-output: the model router issues the final reply with json_schema semantics, returns the parsed object as structured_output. For providers without native support (Ollama, some Google endpoints): prompt + validate + up-to-2 retries with a "this didn't validate; retry, here's the error" follow-up. After the retry budget, the result frame's subtype is error.
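The fallback path for providers without native support is a bounded loop. The sketch below stands in for it with two injected callables (both hypothetical): ask_model returns the raw reply text, and validate raises ValueError when the parsed value does not match the schema.

```python
import json

def structured_reply(ask_model, validate, prompt, max_retries=2):
    """Prompt, validate, and retry up to `max_retries` times with the error attached."""
    message = prompt
    last_error = None
    for attempt in range(1 + max_retries):
        raw = ask_model(message)
        try:
            value = json.loads(raw)   # JSONDecodeError is a ValueError subclass
            validate(value)
            return {"subtype": "success", "structured_output": value, "attempts": attempt + 1}
        except ValueError as exc:
            last_error = str(exc)
            message = (f"{prompt}\n\nThe previous reply did not validate; "
                       f"retry. Error: {last_error}")
    # Retry budget exhausted: the result frame's subtype is error.
    return {"subtype": "error", "error": last_error, "attempts": 1 + max_retries}
```

The worst case is three provider calls: the initial attempt plus two retries.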

Hook events are observable in headless mode

--include-hook-events attaches an in-process HookSink at the outermost position in the hook chain. Each fired event becomes a hook_event frame, including the router's decision and the permissions layer's verdict separately. Async handlers emit two frames (dispatch + completion) so observability isn't lost behind fire-and-forget. This is the only headless flag that produces zero-cost visibility into the new hook taxonomy (ADR 0024).

Consequences

  • Positive: Closes nearly all rows under "J. Headless / CI" in docs/parity-gap-matrix.md in one PR. Unblocks GitHub Actions and devcontainer integrations (each a separate sub-project, but neither is reachable without this). Makes caliban scriptable from any language. Cost accumulator gives operators (and the eventual /usage slash) a single source of truth for $ spent. Stream-json is the contract surface for everything downstream — once it's stable, we can iterate the TUI without breaking automation consumers.
  • Negative: Pricing table is a maintenance hazard; staleness leads to silent undercounts. Stream-json diverges from Claude Code in per-provider token shapes — exact byte-for-byte parity isn't achievable while remaining provider-agnostic. Bare mode adds another axis of "what was actually configured during this run" that operators must reason about (mitigated by system/init surfacing the source chain). Structured-output fallback retry loop is bounded but adds two extra provider calls in the worst case.
  • Revisit if: Downstream consumers demand byte-for-byte Claude-Code stream-json parity; we'd add a compat translator rather than rework the encoder. If pricing maintenance becomes untenable, serve the table from a hosted JSON file refreshed on a schedule. If --bare semantics need to expand (skipping --system-prompt-file, etc.), promote it to a typed BareModeFlags struct rather than a single bool.

ADR 0026 · Layered settings.json + /config editor

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-settings-hierarchy-design.md

Context

caliban today has three ad-hoc TOML files (permissions.toml, mcp.toml, the upcoming hooks.toml per ADR 0024) each loaded by its own crate with no shared scope hierarchy, no schema, no merge rules, no live reload, no interactive editor, and no dynamic auth surface. Claude Code consolidates all of this into one layered settings.json with documented managed > user > project > local merge semantics, a JSON Schema at https://json.schemastore.org/claude-code-settings.json, a tabbed /config editor, and apiKeyHelper for dynamic API-key refresh. Closing that gap is Tier-1 foundation work because plugins (eventual ADR 0030), observability, headless mode, and downstream tooling all want a single configuration story. Full spec at docs/superpowers/specs/2026-05-24-settings-hierarchy-design.md; this ADR records the architectural commitments only.

Decision

JSON is the primary format; TOML is honored at the same path

settings.json is the canonical filename at each scope. The same path with a .toml extension is parsed identically (settings.toml, settings.local.toml). Rationale for JSON-primary: parity with Claude Code's documented schema URL, JSON-Schema editor support out-of-the-box, and serde supports both with no extra work. If both exist in the same scope, JSON wins with a WARN logged.

Four scopes with a documented merge order

In priority order: CLI > Local > Project > User > Managed (default). The four on-disk scopes are Local, Project, User, and Managed; CLI is a virtual scope injected by --settings. Managed sits at the bottom by default so operators can augment org defaults, but moves to the top when the managed scope sets parentSettingsBehavior: "block", mirroring Claude Code's escape hatch. --settings <FILE|JSON> injects the virtual scope above local; --setting-sources <CSV> restricts which scopes are read (e.g. user,project for a known-good CI base).

Merge rules: scalars highest-wins, arrays mostly concatenate

Per-key rules are documented in the spec. The headline:

  • Permission arrays (allow/ask/deny), hook arrays (hooks.<Event>), MCP allow/deny lists, available_models, additional_directories, claude_md_excludes all concatenate in priority order with dedup where meaningful.
  • mcp.servers.<name> and env deep-merge.
  • Every other scalar is highest-wins.

The /config Effective tab annotates each value with the scope it came from.
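The three merge behaviours, plus the provenance the Effective tab shows, can be sketched over flattened keys. This is a simplification: real settings nest these keys (for example under permissions), only a subset of the concatenating keys is listed, and the function name and origin map are hypothetical.

```python
CONCAT_KEYS = {"allow", "ask", "deny", "available_models",
               "additional_directories", "claude_md_excludes"}
DEEP_MERGE_KEYS = {"env"}

def merge_scopes(scopes):
    """Merge [(scope_name, settings)] given highest priority first."""
    effective, origin = {}, {}
    for name, settings in scopes:
        for key, value in settings.items():
            if key in CONCAT_KEYS:
                merged = effective.setdefault(key, [])
                for item in value:
                    if item not in merged:      # dedup, keeping priority order
                        merged.append(item)
                origin.setdefault(key, []).append(name)
            elif key in DEEP_MERGE_KEYS:
                target = effective.setdefault(key, {})
                for sub_key, sub_value in value.items():
                    if sub_key not in target:   # higher scope already won this entry
                        target[sub_key] = sub_value
                        origin[f"{key}.{sub_key}"] = name
            elif key not in effective:          # scalar: highest priority wins
                effective[key] = value
                origin[key] = name
    return effective, origin
```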

Strongly-typed Settings struct with deny_unknown_fields

Settings is a serde-derived struct in caliban-core::settings. Top-level keys are typed; unknown top-level keys fail loudly. A #[serde(flatten)] extra: BTreeMap<String, Value> escape hatch captures forward-compat keys without forcing a release for every new Claude Code field. JSON Schema is generated from schemars derives at build time and published at https://caliban.dev/schemas/settings.json.

Per-feature TOML files remain a compat fallback for one deprecation window

permissions.toml / mcp.toml / hooks.toml continue to load only when the unified settings.json does not define the matching top-level key. caliban config migrate round-trips them into a single settings.json. After one minor release the compat path logs DEPRECATED; after two it errors.

Live reload via notify + arc-swap + ConfigChange hook

A SettingsWatcher watches each scope's path, debounces 250 ms, re-loads + re-merges, and atomically swaps the Arc<Settings> via arc-swap. A ConfigChange hook event (ADR 0024) fires with the diff so external observers and in-process subscribers can react. Live-reloadable keys are documented (permissions.*, hooks.*, api_key_helper.*, UI keys, env, etc.). Restart-required keys (model, mcp.servers.*, auto_memory_*) log WARN on change and take effect on next launch; /config shows a "restart required" badge.

apiKeyHelper is shell-out with caching + per-provider routing

A configurable script that emits the provider API key on stdout. Two shapes:

  • Single helper with provider: "*" as fallback for all providers.
  • Array of helpers keyed by provider.

The key is cached for refreshIntervalMs (default 5 min) or until a provider returns 401, whichever comes first. Refresh runs inline, and a helper that takes longer than slowHelperWarningMs (default 10 s) logs a warning; env var CALIBAN_API_KEY_HELPER_TTL_MS mirrors Claude Code's contract. The helper is execv'd without a shell to avoid argv injection.

Auth precedence chain (per provider): per-provider helper → wildcard helper → env var → keyring → anonymous (local providers).
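The caching rule (time-based expiry or a 401, whichever comes first) can be sketched with an injectable clock. The class and method names are hypothetical; run_helper stands in for executing the configured helper without a shell and reading the key from its stdout.

```python
import time

class HelperKeyCache:
    """Cache a helper-provided API key until it expires or a 401 invalidates it."""

    def __init__(self, run_helper, refresh_interval_ms=5 * 60 * 1000, clock=time.monotonic):
        self.run_helper = run_helper
        self.refresh_interval_ms = refresh_interval_ms
        self.clock = clock
        self._key = None
        self._fetched_at = 0.0

    def get(self):
        now = self.clock()
        expired = (now - self._fetched_at) * 1000 >= self.refresh_interval_ms
        if self._key is None or expired:
            self._key = self.run_helper()  # inline refresh: the caller waits
            self._fetched_at = now
        return self._key

    def on_unauthorized(self):
        """A provider returned 401: drop the key so the next get() refreshes."""
        self._key = None
```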

/config is a tabbed TUI overlay; edits write to project scope by default

Tabs per top-level key group (Model, Permissions, Hooks, MCP, Memory, UI, Auth, Effective). Each row carries a [scope] chip showing which scope contributed the effective value. s cycles the write-scope; w flushes pending edits via atomic temp-file + rename. The Effective tab is read-only and mirrors caliban config print.

The file-watcher picks up /config's own writes automatically, so the running process refreshes via the same code path external edits hit — no extra plumbing.
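The atomic temp-file + rename write is a standard pattern; a minimal version looks like the following. The function name and the temp-file prefix are hypothetical. The temp file is created in the destination directory so the final rename stays on one filesystem.

```python
import json
import os
import tempfile

def write_settings_atomically(path, settings):
    """Write JSON so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".settings-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(settings, handle, indent=2, sort_keys=True)
            handle.write("\n")
            handle.flush()
            os.fsync(handle.fileno())  # make the bytes durable before the rename
        os.replace(tmp_path, path)     # atomic swap of the directory entry
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the swap is a single rename, a file watcher observing the path should never read a half-written settings file.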

Consequences

  • Positive: Closes all five rows under "D. Configuration / settings" in docs/parity-gap-matrix.md plus the /config row in section M. Establishes the single configuration story plugins, hooks, MCP, model router, and headless mode all consume. apiKeyHelper unlocks short-lived-credential workflows (AWS STS, GCP IAM, internal vault systems) caliban can't currently participate in. Live reload makes the hook/permission iteration cycle fast.
  • Negative: Settings struct gains ~30 top-level keys — a real surface area to keep typed and tested. Merge rules are intricate (8-row table); operator confusion is real, mitigated by the Effective tab. Live reload introduces "settings changed mid-turn" semantics that subtle bugs can hide in (e.g. a permission allowed at turn start gets revoked mid-turn — we honor the rule at dispatch time, but documenting and testing that boundary takes care). One-release compat window for legacy TOMLs adds short-term parser surface. apiKeyHelper is shell-out; a managed-scope malicious script would be a privesc vector (mitigated by managed paths being root-owned by convention).
  • Revisit if: Settings struct grows beyond ~50 top-level keys (refactor into named sub-modules per group). If live-reload semantics prove too surprising for operators, move to a reload-on-/config-w model. If managed delivery channels (Windows registry, macOS plist) become a real ask, add a ScopeLoader backend per channel. If apiKeyHelper's 5-minute cache proves wrong for short-TTL credentials, expose refreshIntervalMs per provider.

ADR 0027 · TUI ergonomics pack

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-tui-ergonomics-design.md

Context

caliban's TUI ships the basics (slash menu, @-attach, mouse-wheel scroll, plan-mode chip, spinner) but six 🔴/🟡 rows under E. TUI ergonomics in docs/parity-gap-matrix.md block day-to-day parity with Claude Code: no shell escape, no external editor handoff, no permission Ask modal (deferred from PR #8), no transcript viewer, no reverse history search, and the @file suggestion path is hard-coded with no operator override.

Each is small in isolation; together they push on the same input-bar state machine and overlay rendering infrastructure. Shipping in one batch lets us refactor InputMode once instead of three times.

The Ask modal in particular has knock-on effects: it's the only piece of UI that blocks the agent loop on user input, so it sets the contract for both auto-mode (ADR 0029) and MCP elicitation (ADR 0023). Landing it here gives both specs a stable target.

Decision

One input bar, many modes

Promote InputMode from {Idle, SlashMenu, AtMenu} to a richer enum that adds ShellEscape, ReverseHistory, ExternalEditor, AskModal, and TranscriptViewer. The first two keep the prompt visible; the last three are modal and short-circuit the main key dispatch. All input-area key handling moves under a single handle_input_key function.

!cmd is a synthesized Bash invocation

A leading ! at column 0 routes the rest of the line into the existing Bash tool via the existing permission hook. That gives us the rule grammar (Bash:git *, Bash:rm *, …) for free and keeps the audit trail consistent. The synthesized call is not added to the conversation history — it's a user action. Plan mode still gates.

External editor is a tempfile roundtrip

Ctrl+G writes the input buffer to a tempfile, leaves the alternate screen, execs $VISUAL/$EDITOR/vi with the path as argv, reads the result back on exit, re-enters the alt-screen. The editor value is whitespace-split verbatim (no shell parsing); EDITOR='code --wait' works.
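The editor lookup and the whitespace split can be sketched in a few lines. The function name is hypothetical; the fallback to vi for an all-whitespace value is an assumption.

```python
def editor_argv(env, path):
    """Build the argv for the external editor: $VISUAL, then $EDITOR, then vi."""
    value = env.get("VISUAL") or env.get("EDITOR") or "vi"
    # Plain whitespace split, no shell parsing: quotes are not interpreted.
    parts = value.split() or ["vi"]
    return parts + [path]
```

One consequence of skipping shell parsing is that an editor path containing spaces cannot be expressed by quoting it inside the variable.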

The Ask modal lives in a new caliban-tui-ask crate

Adding a thin caliban-tui-ask crate keeps caliban-agent-core UI-agnostic. It implements the existing AskHandler trait with an mpsc/oneshot bridge to a ratatui modal supporting four actions — Allow once, Allow + persist project, Allow + persist user, Deny — with in-process re-load of the appended rule.

Transcript viewer renders Message directly

Ctrl+O walks App.messages and renders every ContentBlock variant (text, thinking, tool_use, tool_result, image, redacted) — the model-eye view, distinct from the streaming-friendly TranscriptLine view. [ dumps viewport to scrollback via leave/re-enter alt-screen; v opens the full transcript in $VISUAL.

Reverse history search is scope-cycled

Ctrl+R opens at session scope; Ctrl+S cycles through project and all-projects scopes. Wider scopes lazily memoize from SessionStore in spawn_blocking with a 2s budget.

File suggestion source becomes a trait

FileSuggestionSource with two impls: IgnoreWalkerSource (default, gitignore-aware) and CommandSource (spawns an operator-configured program). Walker stays on the existing ignore crate — no new deps.

Consequences

  • Positive. Six 🔴/🟡 rows move to ✅ in one initiative. The Ask modal unblocks ADR 0029 (auto-mode) and reuses the same overlay primitives that ADR 0023 needs for MCP elicitation. Operators get the keyboard surface expected of any modern agent CLI.
  • Negative. InputMode becomes a fatter enum; handle_event needs careful refactoring to keep existing tests green. One new crate (caliban-tui-ask). Persisting Ask-modal decisions adds a write path into permissions.toml we previously only read from — parse-error and race-with-manual-edit cases need defensive handling.
  • Revisit if: vim mode lands and the InputMode enum needs reshape into (BarMode, EditorMode). The transcript viewer is a natural anchor for /recap and /btw later.
  • Out of scope, enabled by this work: background bash (Ctrl+B), vim mode, image input, voice dictation.

References

  • Spec: docs/superpowers/specs/2026-05-24-tui-ergonomics-design.md
  • Permissions trait: crates/caliban-agent-core/src/permissions.rs
  • Overlay primitives: caliban/src/tui.rs::centered_rect
  • Attach scaffold: caliban/src/tui/attach.rs
  • Companion ADRs: 0028 (Checkpointing — consumes Esc-Esc), 0029 (Auto-mode — consumes the Ask modal), 0023 (MCP v2 — reuses overlay primitives).

Revised 2026-05-26

The original Decision committed the Ask modal to a new caliban-tui-ask crate. In practice the modal shipped at caliban/src/tui/ask.rs (~202 LOC) inside the binary.

Why this is the correct outcome. The modal is binary-coupled (it consumes the binary's App state, dispatches via the binary's Action enum, and renders into the binary's overlay system). Extracting it would require either threading App/Action/overlay traits through a public surface or duplicating them — both costs without payoff. The "extract when sharable" trigger from the original Decision never fired.

Revisit if another consumer needs the modal (e.g., a hypothetical standalone caliban-tui library separated from the binary), or LOC grows past ~500.

ADR 0028 · Checkpointing + /rewind

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-checkpointing-design.md

Context

Claude Code's checkpoint + /rewind feature lets operators try an aggressive multi-tool prompt knowing they can undo the file changes without losing the conversation that followed. caliban has neither piece: no per-prompt snapshot, no rewind menu, no Esc-Esc shortcut. The C. Memory & checkpointing section of docs/parity-gap-matrix.md flags this as 🔴 in three rows; M. Slash command coverage flags /rewind as 🔴.

The natural place to wire snapshots is the Hooks trait — it already fires before_tool/after_tool where we need to read pre-images. The natural place to wire restore is the session store — truncating message history is the same shape as a session edit.

Key trade-offs:

  • Scope: file-tool edits only, mirroring Claude Code's Bash exclusion (capturing arbitrary subprocess side-effects is intractable).
  • Storage layout: mirror Claude Code's ~/.claude/projects/<project_dir_hash>/checkpoints/<session>/ so operators with both tools recognize the shape; override via CALIBAN_CHECKPOINT_ROOT.
  • Format: manifest + content-addressed pre-images over whole-tree cp -a (cheaper, inspectable with ls+cat; cross-prompt dedup is a future hard-link sweep, not v1).

Decision

Two new lifecycle hook events

Hooks gains before_run and after_run with default no-op impls — existing consumers compile unchanged. These are the minimum events checkpointing needs; broader hook-surface parity (Tier 1) will expand the trait further but stays compatible with this addition.

A new Layer-2 crate caliban-checkpoints

Recorder, store, hook impl, and restore logic live in a new crate depending on caliban-agent-core, caliban-provider, caliban-sessions. Keeps agent-core's compile time and dep surface unchanged.

Manifest-based, content-addressed pre-image store

For each touched file, record the pre-image once (keyed by sha256) with metadata in prompt-N/manifest.json and blobs under prompt-N/objects/<sha256>. Newly-created files record with exists_pre: false — restore deletes them. Blob storage (not git, not a database) because operators already trust the filesystem, it's trivially inspectable, and cross-prompt dedup can be added later as a background hard-link sweep without a schema change.

Only Write/Edit/NotebookEdit/(future)MultiEdit trigger recording. Bash, WebFetch, MCP, and external writes are documented out of scope; the rewind menu surfaces this in its footer.
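A toy version of the recorder and the "restore code" path shows how the pieces fit. Only exists_pre, manifest.json, entries, and objects/<sha256> come from the text; the path and sha256 entry fields and all function names are assumptions.

```python
import hashlib
import json
import os

def record_pre_image(prompt_dir, path, manifest):
    """Record `path` as it is right now; later calls for the same path are no-ops."""
    if any(entry["path"] == path for entry in manifest["entries"]):
        return
    entry = {"path": path, "exists_pre": os.path.exists(path)}
    if entry["exists_pre"]:
        with open(path, "rb") as f:
            data = f.read()
        entry["sha256"] = hashlib.sha256(data).hexdigest()
        objects = os.path.join(prompt_dir, "objects")
        os.makedirs(objects, exist_ok=True)
        blob = os.path.join(objects, entry["sha256"])
        if not os.path.exists(blob):  # identical content is stored once
            with open(blob, "wb") as f:
                f.write(data)
    manifest["entries"].append(entry)

def save_manifest(prompt_dir, manifest):
    os.makedirs(prompt_dir, exist_ok=True)
    with open(os.path.join(prompt_dir, "manifest.json"), "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)

def restore_code(prompt_dir, manifest):
    """Put every recorded file back; files created during the prompt are deleted."""
    for entry in manifest["entries"]:
        if not entry["exists_pre"]:
            if os.path.exists(entry["path"]):
                os.remove(entry["path"])
            continue
        with open(os.path.join(prompt_dir, "objects", entry["sha256"]), "rb") as src:
            data = src.read()
        with open(entry["path"], "wb") as dst:
            dst.write(data)
```

Everything the sketch writes is an ordinary file, so a checkpoint can be inspected with ls and cat as described above.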

Plan-mode prompts emit empty manifests

Plan mode rejects mutating tools, so manifests come out empty. We still emit prompt-N/manifest.json with kind: "plan" and entries: [] so the prompt is selectable for conversation rewind, keeping cursor positioning sensible across plan/non-plan prompts.

Five restore variants

/rewind menu offers: restore code, restore conversation, restore both (Enter default), summarize from here, summarize up to here. The summarize variants drive the existing SummarizingCompactor on a slice of session.messages — no new summarizer.

Esc-Esc trigger, precedence owned by ADR 0027

When InputMode::Idle and buffer.is_empty(), two Esc presses within 400ms open the rewind menu. Single Esc continues to close modes / cancel turns. The interaction precedence is owned by ADR 0027.

Pruning is tied to session pruning

A checkpoint directory is removed only when cleanupPeriodDays (default 30) has elapsed since its last update and the corresponding session is being pruned by SessionStore::prune. The two operations are coupled so we never orphan checkpoints while the session is still resumable.

CALIBAN_CHECKPOINT_MAX_BYTES (default 5 GiB per project) caps total blob size; on overflow, oldest prompt blobs drop first.

Consequences

  • Positive. Three 🔴 rows move to ✅ in one initiative. The two new hook events are reusable: any future hook-surface work (Tier 1) inherits the contract. The content-addressed blob layout is small enough to ship in one PR and expressive enough to grow into cross-prompt dedup later. Claude Code parity on the storage path makes a future "migrate Claude Code checkpoints into caliban" tool a one-evening project.
  • Negative. Per-tool disk I/O on the hot path (pre-image read for every Write/Edit). The 16 MiB cap keeps it bounded but at the cost of unrestorable large files. One more workspace crate. Bash mutations remain unobservable — documented but still a footgun.
  • Revisit if: operators demand Bash tracking (could overlay a filesystem-watcher-based recorder, significant complexity); or if storage I/O becomes a bottleneck (could move pre-image reads into a tokio::spawn shadowing the agent loop).
  • Out of scope, enabled here: /fork (branch from checkpoint), cross-machine sync, per-tool-call (not per-prompt) granularity.

References

  • Spec: docs/superpowers/specs/2026-05-24-checkpointing-design.md
  • Hook trait: crates/caliban-agent-core/src/hooks.rs
  • Summarizer: crates/caliban-agent-core/src/compact.rs::SummarizingCompactor
  • Session store: crates/caliban-sessions/src/store.rs
  • Companion ADRs: 0027 (TUI ergonomics — owns Esc-Esc precedence and overlay primitives), 0021 (Sub-agents — will carry /fork later).
  • Parity reference: docs/claude-code-capability-inventory.md §11.

ADR 0029 · Permission modes + auto-mode classifier

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-permission-modes-design.md

Context

caliban's permission model is a static rule grammar (permissions.toml) layered on a single plan flag. Claude Code ships six permission modes cycled with Shift+Tab (default/acceptEdits/plan/auto/dontAsk/bypassPermissions), each composing differently with the rule grammar. The marquee piece is auto mode, where a fast classifier model labels each tool call as allow/soft_deny/hard_deny based on workspace/file/network sensitivity rules.

The "A. Permissions & safety" section of docs/parity-gap-matrix.md flags this as the headline 🟡/🔴 gap once the OS sandbox is set aside as a separate Tier-4 investment. ADR 0020 (static rule grammar) and ADR 0022 (model router with RequestPurpose::FastClassifier) have already shipped, so the infrastructure is in place; this ADR connects the pieces.

The classifier model lives in the router, not the permission system — the classifier is just another routed call by purpose. The permission system holds only the orchestration (when to call it, how to cache, how to compose with static rules).

Decision

Permission modes layer over the rule grammar, not under it

The existing PermissionsHook continues to produce Allow/Deny/Ask from static rules. A new ModeFilter wraps that hook and overrides the verdict according to the active mode. Composition order:

ModeFilter(BypassPermissions latched) ─ short-circuit Allow
              │ otherwise
              ▼
        PermissionsHook  → Allow / Deny / Ask
              │
              ▼
        ModeFilter post-pass  may override Ask only

Static Allow/Deny always win — operators trust their TOML. Only Ask is mode-overridable, except bypassPermissions which short-circuits everything (including static Deny) and requires an explicit confirmation flag.
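The composition order above can be sketched as a single function. This is a minimal illustration, not the actual ModeFilter API: the `compose` helper and its `override_ask` closure are hypothetical names, and the closure stands in for whatever the active mode does with an Ask.

```rust
/// Verdict produced by the static rule layer (names mirror the ADR text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
    Ask,
}

/// Hypothetical composition helper; the real ModeFilter may be shaped differently.
/// `bypass_latched` is true only when bypassPermissions is active and the
/// session was started with the explicit confirmation flag.
pub fn compose(
    bypass_latched: bool,
    static_verdict: Verdict,
    override_ask: impl FnOnce() -> Verdict,
) -> Verdict {
    if bypass_latched {
        // Pre-pass: bypass short-circuits everything, including static Deny.
        return Verdict::Allow;
    }
    match static_verdict {
        // Post-pass: only Ask may be rewritten by the active mode.
        Verdict::Ask => override_ask(),
        // Static Allow and Deny always win.
        decided => decided,
    }
}
```

With `bypass_latched` false, a static Deny stays Deny no matter what the closure would return, which is the property that makes the rule file auditable on its own.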

bypassPermissions requires --allow-dangerously-skip-permissions

The only mode that can override static Deny. To enter it, the operator must pass the flag at startup (sets a session-wide latch). Cycling via Shift+Tab into bypass without the latch fires a warning toast and reverts to default. Starting with defaultMode = "bypassPermissions" without the flag aborts startup.
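A rough sketch of the latch behavior follows. The function names are hypothetical, and the cycle order is an assumption borrowed from the mode list in the Context section; only the two guard conditions come from the text above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Default,
    AcceptEdits,
    Plan,
    Auto,
    DontAsk,
    BypassPermissions,
}

/// Startup check: a configured default of bypassPermissions without the
/// latch aborts instead of silently downgrading.
pub fn check_startup(default_mode: Mode, bypass_latched: bool) -> Result<Mode, String> {
    if default_mode == Mode::BypassPermissions && !bypass_latched {
        return Err("bypassPermissions requires --allow-dangerously-skip-permissions".to_string());
    }
    Ok(default_mode)
}

/// Shift+Tab cycle (order assumed). Returns the new mode plus a flag that
/// tells the caller to show the warning toast.
pub fn cycle(current: Mode, bypass_latched: bool) -> (Mode, bool) {
    let next = match current {
        Mode::Default => Mode::AcceptEdits,
        Mode::AcceptEdits => Mode::Plan,
        Mode::Plan => Mode::Auto,
        Mode::Auto => Mode::DontAsk,
        Mode::DontAsk => Mode::BypassPermissions,
        Mode::BypassPermissions => Mode::Default,
    };
    if next == Mode::BypassPermissions && !bypass_latched {
        // Without the latch, bypass is unreachable: warn and revert to default.
        (Mode::Default, true)
    } else {
        (next, false)
    }
}
```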

Auto-mode is a classifier consult, cached by input shape

auto only runs the classifier when the rule verdict is Ask. Allow/Deny pass through. A 256-entry LRU keyed on (tool_name, sha256(canonicalized_input)) caches verdicts for the session. The classifier dispatches via RequestPurpose::FastClassifier on the existing router — operators wire Haiku, GPT-4o-mini, a local Ollama model, whatever.
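To illustrate the cache shape, here is a dependency-free sketch of an LRU keyed on tool name plus an input digest. `VerdictCache` and `digest` are hypothetical names. The ADR specifies sha256 over the canonicalized input; `DefaultHasher` is used below only as a stand-in so the example needs no external crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Label {
    Allow,
    SoftDeny,
    HardDeny,
}

type Key = (String, u64);

/// Stand-in digest (assumption): the ADR calls for sha256 here.
fn digest(canonical_input: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    canonical_input.hash(&mut hasher);
    hasher.finish()
}

pub struct VerdictCache {
    capacity: usize,
    map: HashMap<Key, Label>,
    order: VecDeque<Key>, // front = least recently used
}

impl VerdictCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used position.
    fn touch(&mut self, key: &Key) {
        self.order.retain(|k| k != key);
        self.order.push_back(key.clone());
    }

    pub fn get(&mut self, tool: &str, input: &str) -> Option<Label> {
        let key = (tool.to_string(), digest(input));
        let hit = self.map.get(&key).copied();
        if hit.is_some() {
            self.touch(&key);
        }
        hit
    }

    pub fn put(&mut self, tool: &str, input: &str, label: Label) {
        let key = (tool.to_string(), digest(input));
        self.map.insert(key.clone(), label);
        self.touch(&key);
        if self.order.len() > self.capacity {
            // Evict the least recently used entry.
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
    }
}
```

A session would construct this with `VerdictCache::new(256)` and consult it only when the static verdict is Ask.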

Static rule pre-pass in auto-mode.toml

Before the model call, auto-mode.toml's hard_deny/soft_deny/allow arrays are walked in that order; first match short-circuits with source: StaticRule. The model is the expensive fallback, not the first stop. $defaults.<list> expands to a curated, version-pinned default (sudo, recursive deletion, piped curl, secret-bearing paths, plain-http).
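The walk order can be sketched as follows. `AutoModeRules` and `pre_pass` are hypothetical names, and plain substring matching is an assumption made for brevity; the real pattern grammar is not described here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Label {
    HardDeny,
    SoftDeny,
    Allow,
}

/// The three arrays from auto-mode.toml, already expanded.
pub struct AutoModeRules {
    pub hard_deny: Vec<String>,
    pub soft_deny: Vec<String>,
    pub allow: Vec<String>,
}

/// Walks hard_deny, then soft_deny, then allow; the first match wins.
/// `None` means no static rule matched and the classifier model is consulted.
pub fn pre_pass(rules: &AutoModeRules, command: &str) -> Option<Label> {
    let lists = [
        (&rules.hard_deny, Label::HardDeny),
        (&rules.soft_deny, Label::SoftDeny),
        (&rules.allow, Label::Allow),
    ];
    for (patterns, label) in lists {
        // Substring containment is a simplification (assumption).
        if patterns.iter().any(|p| command.contains(p.as_str())) {
            return Some(label);
        }
    }
    None
}
```

Because hard_deny is checked first, a command that matches both a deny and an allow pattern is denied without a model call.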

soft_deny falls through to the Ask modal

When the classifier returns soft_deny, the verdict becomes a synthesized Ask request flowing into the same TuiAskHandler (ADR 0027) the static Ask rules use. The classifier's reason string is rendered in the modal. This relies on ADR 0027 being merged first.

A new Layer-3 crate caliban-auto-mode

Classifier, config loader, and curated defaults live in a new crate between caliban-agent-core and the router. The core's permissions module gains only PermissionMode, SharedPermissionMode, and ModeFilter — provider-call-free types.

Sub-agents inherit parent mode by SharedPermissionMode clone (ADR 0021); per-subagent override is v2 follow-up. disableAutoMode = true (or CALIBAN_DISABLE_AUTO_MODE=1) is a hard kill switch — classify always returns SoftDeny { source: DisabledFallback }.

Consequences

  • Positive. Closes two of three remaining 🔴/🟡 rows under Permissions & safety (OS sandbox is deliberately separate). Auto-mode is signature differentiation — caliban's operator-defined classifier model (any provider) is meaningfully more flexible than Claude Code's bundled Haiku. Composition with static rules is auditable and testable in isolation.
  • Negative. One more crate. Hot path gets a network call per Ask (mitigated by cache + static pre-pass). bypassPermissions adds a footgun surface needing UX work (red chip, confirmation toast). The mode enum overlaps with the existing SharedPlanMode flag — we keep both for back-compat at the cost of a small synchronization burden.
  • Revisit if: classifier p95 latency becomes a UX problem (could pre-compute verdicts for likely next-tool shapes); or if curated default lists need more maintenance than the Rust release cadence supports (could pull from a versioned upstream JSON).
  • Out of scope, enabled here: per-subagent permission modes (ADR 0021 v2), /permissions interactive editor, classifier audit log, mode-aware hook events (PermissionRequest/PermissionDenied) once the broader hook surface lands.

References

  • Spec: docs/superpowers/specs/2026-05-24-permission-modes-design.md
  • Static rule layer: crates/caliban-agent-core/src/permissions.rs
  • AskHandler trait: same file (AskHandler, NonInteractiveAskHandler)
  • FastClassifier purpose: ADR 0022
  • Companion ADRs: 0027 (TUI ergonomics — ships Ask modal, must merge first), 0028 (Checkpointing — parallel hook-surface work), 0021 (Sub-agents — v2 refines per-subagent override).
  • Parity reference: docs/claude-code-capability-inventory.md §6, §3.

Revised 2026-05-26

The original Decision committed caliban-auto-mode to be a new Layer-3 crate. In practice the implementation lives inside caliban-agent-core across auto_mode.rs, mode_filter.rs, and permission_mode.rs (~1,750 LOC combined).

Why this is the correct outcome. Auto-mode dispatch is tightly coupled to the permission pipeline (PermissionsHook, SharedPermissionMode, the soft-deny → Ask handshake) which already lives in agent-core. Extracting auto-mode would either pull most of the permission pipeline out with it or introduce a circular dep. The static rule pre-pass, the classifier dispatch, and the LRU cache all live next to the data they need.

Revisit if auto-mode grows a second consumer (e.g., a non-agent classifier client), or if the dispatch path becomes a measurable compile-time burden on caliban-agent-core.

Headless -p defaults — what actually runs

When caliban -p is invoked without --permission-mode, --no-permissions, or any explicit allow/deny/ask flag, the resolved mode is PermissionMode::Default (per resolve_startup_mode in permission_mode.rs). Static rule evaluation still runs: the built-in default-rules tail (default_rules() in permissions.rs) Allows read-only tools (Read, Grep, Glob, TodoWrite, EnterPlanMode/ExitPlanMode), Asks for mutating ones (Write, Edit, Bash, WebFetch), and falls through to a catch-all Ask for everything else.

In headless mode, there is no TTY to prompt, so Ask verdicts are routed to NonInteractiveAskHandler (in agent-core's permissions.rs). Its behavior:

  • auto_allow: false (the default) — Ask becomes a hard deny. The tool call fails with a permission error.
  • auto_allow: true (set via --auto-allow / CALIBAN_AUTO_ALLOW) — Ask becomes Allow. Equivalent to running in dontAsk mode for the duration of the run.

The net effect: a tool-using prompt that touches only read-only tools (Read, Glob, Grep) runs to completion silently because each tool hits an explicit Allow. A prompt that needs Write/Edit/Bash without --auto-allow or an explicit --allow/--permission-mode flag will fail on the first such tool call. The lmstudio 2026-05-27 probe (Finding 15) observed the read-only case and reported it as "auto-dispatch without prompting" — that's accurate, but only because Read is on the default Allow list.
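The headless outcome described above reduces to a small decision table. The sketch below approximates it; `default_rule` and `resolve_headless` are hypothetical names rather than the functions in permissions.rs, and tool matching by bare name is a simplification.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
    Ask,
}

/// Approximation of the built-in default-rules tail.
pub fn default_rule(tool: &str) -> Verdict {
    match tool {
        "Read" | "Grep" | "Glob" | "TodoWrite" | "EnterPlanMode" | "ExitPlanMode" => Verdict::Allow,
        // Write, Edit, Bash, WebFetch, and anything unlisted resolve to Ask.
        _ => Verdict::Ask,
    }
}

/// With no TTY to prompt, an Ask is settled by `auto_allow` alone.
pub fn resolve_headless(tool: &str, auto_allow: bool) -> Verdict {
    match default_rule(tool) {
        Verdict::Ask if auto_allow => Verdict::Allow,
        Verdict::Ask => Verdict::Deny,
        decided => decided,
    }
}
```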

--no-permissions is the only way to skip the static rule layer entirely; the resolved mode surfaces in the system/init frame's permission_mode field as the literal string "disabled" to make this state observable (lmstudio Finding 15). All other modes surface under their camelCase name (default, acceptEdits, plan, auto, dontAsk, bypassPermissions).

ADR 0030 · Plugin packaging

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-plugin-system-design.md
  • Depends on: ADR for hooks-expansion (forthcoming alongside specs/2026-05-24-hooks-expansion-design.md)

Context

Skills (ADR 0019), MCP servers (ADRs 0017 / 0023), sub-agents (ADR 0021), and the forthcoming hooks-expansion + output-styles work each ship as their own discovery surface. Operators who want to share a package of related customizations — Claude Code's "plugin" model — currently have to drop files into half a dozen directories by hand.

Claude Code unifies all five surfaces under a single plugin directory with one plugin.json manifest; settings expose enabledPlugins, marketplace allowlists, and strictPluginOnlyCustomization; the /plugins slash command and claude plugin CLI manage install / enable / disable / remove. This ADR records caliban's commitment to the same shape.

Decision

A plugin is a directory with a plugin.json manifest

A plugin is <plugin-name>/plugin.json plus optional subdirectories skills/, hooks/, agents/, output-styles/, mcp/, commands/. The manifest declares name (matches directory), version, description, author, license, optional caliban.min_version and caliban.platforms, and a components object pointing at the bundled files. JSON (not TOML) so the surface stays uniform with hooks and MCP configs (also JSON in their canonical forms). Unknown manifest keys are preserved through serde to leave room for forward-compat fields.

Three discovery roots, project > user > managed

  • Project: <workspace>/.caliban/plugins/<name>/
  • User: $XDG_DATA_HOME/caliban/plugins/<name>/
  • Managed: /etc/caliban/plugins/<name>/ (Linux), platform analogues elsewhere. Managed plugins ignore plugins.enabled (policy-enforced).

A plugin with the same name in an earlier root replaces the later one — no manifest merging.
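A minimal sketch of the replacement rule, assuming discovery yields (root, name) pairs. `Root` and `resolve` are illustrative names; the variant order encodes the project > user > managed precedence.

```rust
use std::collections::BTreeMap;

/// Variants are declared in precedence order, so `Project < User < Managed`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Root {
    Project,
    User,
    Managed,
}

/// For each plugin name, keeps only the copy from the highest-precedence root.
/// Nothing is merged: the winning directory replaces the others whole.
pub fn resolve<'a>(discovered: &[(Root, &'a str)]) -> BTreeMap<&'a str, Root> {
    let mut winners: BTreeMap<&'a str, Root> = BTreeMap::new();
    for &(root, name) in discovered {
        let slot = winners.entry(name).or_insert(root);
        if root < *slot {
            *slot = root;
        }
    }
    winners
}
```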

Items are namespaced: <plugin>:<item>

Skills, agents, and output styles loaded from a plugin carry the <plugin>:<item> prefix. They cannot collide with bare-named items at the user level (project-level bare items still shadow them). Hooks merge additively across plugins. MCP servers are exposed under <plugin>:<server> to avoid colliding with user-configured servers.

Collision priority is project > plugin > user. Operators who want plugin-only customization can set strict_plugin_only_customization = true, which ignores bare-file customizations under ~/.caliban/skills/* etc. entirely.

${CALIBAN_PLUGIN_ROOT} expansion at the plugin boundary

Plugin-bundled MCP configs and hook commands need to reach binaries inside the plugin without hardcoding install paths. caliban-plugins expands ${CALIBAN_PLUGIN_ROOT} to the plugin's absolute root directory before passing config downstream. ${CLAUDE_PLUGIN_ROOT} is an honored alias so existing Claude Code plugins port verbatim. Any other ${VAR} is passed through to the downstream consumer's own expansion (MCP client, hooks loader).
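The expansion rule is small enough to show in full. This is a sketch with a hypothetical function name, not the caliban-plugins implementation; it replaces the two recognized variables and leaves every other ${VAR} untouched for the downstream consumer.

```rust
/// Expands the plugin-root variable and its Claude Code alias.
/// Any other `${VAR}` is passed through unchanged.
pub fn expand_plugin_root(input: &str, plugin_root: &str) -> String {
    input
        .replace("${CALIBAN_PLUGIN_ROOT}", plugin_root)
        .replace("${CLAUDE_PLUGIN_ROOT}", plugin_root)
}
```

For example, a hook command written as `${CLAUDE_PLUGIN_ROOT}/hook.sh` in a ported plugin resolves against the same absolute directory as one written with the caliban name.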

Marketplaces are public JSON indices fetched on demand

A marketplace is one HTTP(S) URL serving a JSON index of plugins + versions + tarball URLs + sha256 hashes. caliban plugin install <name>@<marketplace> fetches the index, verifies the marketplace is in plugins.marketplaces.strict_known and not in blocked, downloads and extracts the tarball, and writes a trust record.

Signature verification is out of scope for v1. Trust is by source URL + manifest hash, surfaced in the install prompt. v2 may add cosign / minisign.

Trust gating on first install

Sideloads aren't gated (the operator already had filesystem access). Marketplace installs prompt with plugins.trust_message, the manifest contents, the manifest sha256, and the install URL. Acknowledged installs are recorded in $XDG_DATA_HOME/caliban/trust/plugins.json; re-installs of identical manifest hashes skip the prompt; version bumps re-prompt.
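The prompt decision can be sketched as a lookup on the manifest hash. `TrustStore` and `Source` are hypothetical names, and the in-memory set stands in for the on-disk trust record. The sketch assumes a version bump changes the manifest and therefore its hash, which is what makes it re-prompt.

```rust
use std::collections::HashSet;

pub enum Source {
    Sideload,
    Marketplace,
}

/// In-memory stand-in for the trust record file.
pub struct TrustStore {
    acknowledged: HashSet<String>,
}

impl TrustStore {
    pub fn new() -> Self {
        Self { acknowledged: HashSet::new() }
    }

    /// Sideloads never prompt; marketplace installs prompt unless this exact
    /// manifest hash was acknowledged before.
    pub fn needs_prompt(&self, source: &Source, manifest_sha256: &str) -> bool {
        match source {
            Source::Sideload => false,
            Source::Marketplace => !self.acknowledged.contains(manifest_sha256),
        }
    }

    /// Records that the operator accepted the prompt for this manifest hash.
    pub fn acknowledge(&mut self, manifest_sha256: &str) {
        self.acknowledged.insert(manifest_sha256.to_string());
    }
}
```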

New crate: caliban-plugins

A thin orchestrator: it parses manifests, resolves namespaces, expands ${CALIBAN_PLUGIN_ROOT}, and hands paths + configs to the existing loaders (skills, hooks, MCP, agents, output-styles). It does not duplicate any per-surface logic. The caliban binary constructs one PluginManager at startup and wires its outputs into the existing loaders.

Consequences

  • Positive: Closes Matrix row B "Plugin packages" and the /plugins slash row in one initiative. Existing Claude Code plugins port with at most a directory rename (${CLAUDE_PLUGIN_ROOT} alias). Each downstream loader stays single-purpose — plugins are a composition concern, not a per-loader concern. Trust gating gives operators a real "I have read this" moment without locking sideloads behind ceremony.
  • Negative: Adds a new crate and a new settings surface (plugins.*, plugins.marketplaces.*). Marketplace install adds three new dependencies (tar, flate2, sha2) and an HTTP fetch path separate from MCP's. Trust records create a small migration burden if we ever move the on-disk format (mitigated by versioning the file). The unified hooks taxonomy must land first; this ADR's hooks-merging behavior is a no-op until then.
  • Revisit if: Operators demand signed plugins (move to v2 cosign / minisign verification). The bare-vs-namespaced collision rules surprise users in practice (consider an explicit per-plugin "alias to bare name" affordance). Hot-reload of plugin contents becomes a real need (today it requires restart).

ADR 0031 · Output styles

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-output-styles-design.md
  • Depends on: ADR 0018 (memory tier model — splice pattern reused), ADR 0019 (skills — frontmatter parser pattern reused), ADR 0030 (plugin packaging — plugin-supplied styles).

Context

Claude Code exposes four built-in output styles — Default, Proactive, Explanatory, Learning — plus a custom-style file format with frontmatter (name, description, keep-coding-instructions, force-for-plugin). Styles modify the system prompt only; they're orthogonal to permission mode, tools, and hooks. Operators activate via /config → Output style or the outputStyle setting. Caliban currently has none of this surface (matrix row L is 🔴 across the board).

Decision

Output styles are markdown files with frontmatter, like skills

A custom style is a single .md file with a YAML frontmatter block declaring name, description, keep_coding_instructions (bool, default true), and force_for_plugin (bool, default false). The body is the prompt block. The parser reuses serde_yaml (already in the workspace for skills) and mirrors caliban-skills's frontmatter shape.

We use snake_case (keep_coding_instructions, force_for_plugin) internally; the loader accepts kebab-case (keep-coding-instructions, force-for-plugin) as aliases for Claude-Code-format compatibility.

A new crate caliban-output-styles holds it

Modeled on caliban-skills: loader + struct + tool-adjacent pieces. It owns OutputStyle, OutputStylePrefix, default_roots, load_styles, select_active, and the Learning post-processor. Built-in style bodies live as include_str!'d markdown files under crates/caliban-output-styles/src/builtins/.

Discovery roots and shadowing

Same shape as skills: project > user > plugin > built-in. Project styles at <workspace>/.caliban/output-styles/<name>.md shadow user styles at $XDG_CONFIG_HOME/caliban/output-styles/<name>.md, which shadow plugin-supplied styles (which are namespaced <plugin>:<name>), which shadow the four built-ins.

The splice pattern is reused from MemoryPrefix

OutputStylePrefix::splice_into(base) wraps the active style's body in <output-style name="...">…</output-style> and prepends to base. It composes with MemoryPrefix::splice_into: memory tiers go first, then the output-style block, then the base body. The Default style is the no-op — it emits no block at all, so switching to Default produces the exact same prompt as having no style configured. This minimizes prompt-cache invalidation for operators who never customize.
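A rough sketch of the splice order, using free functions as stand-ins for the two splice_into methods. The function names, the blank-line separators, and the `<memory>` wrapper in the usage are assumptions; the output-style tag and the no-op Default come from the text above.

```rust
/// Stand-in for the output-style splice. `None` models the Default style,
/// which emits no block and returns the base prompt byte-for-byte.
pub fn splice_style(style: Option<(&str, &str)>, base: &str) -> String {
    match style {
        None => base.to_string(),
        Some((name, body)) => format!(
            "<output-style name=\"{}\">\n{}\n</output-style>\n\n{}",
            name, body, base
        ),
    }
}

/// Stand-in for the memory splice: memory tiers are prepended last, so they
/// end up first in the final prompt.
pub fn splice_memory(memory: &str, rest: &str) -> String {
    if memory.is_empty() {
        rest.to_string()
    } else {
        format!("{}\n\n{}", memory, rest)
    }
}
```

Composing the two as `splice_memory(mem, &splice_style(style, base))` yields memory, then the style block, then the base body.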

Style activation requires /clear or restart

System prompts are cached by every major provider. Live-swapping the style mid-session would invalidate caches without warning and produce inconsistent assistant behavior. The /config → Output style overlay surfaces an "applies after /clear or restart" hint; the in-memory selection updates, but the system prompt that the provider sees does not change until the next session.

The Learning style is the only style that touches assistant text

Learning instructs the model to emit TODO(human): <prompt> markers on non-trivial decisions; a post-processor (the new AssistantPostProcessor trait in caliban-agent-core) tags those markers in the assistant's output so the TUI can highlight them. Default, Proactive, and Explanatory install an identity post-processor. Tools, hooks, and message contents are unaffected.
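As an illustration of the marker scan, the sketch below extracts the prompt text from each marker. It is not the AssistantPostProcessor trait, whose signature is not shown here; `find_human_todos` is a hypothetical helper that assumes one marker per line.

```rust
/// Returns the prompt text of every `TODO(human): <prompt>` marker, in order.
/// Assumes at most one marker per line; empty prompts are skipped.
pub fn find_human_todos(assistant_text: &str) -> Vec<&str> {
    const MARKER: &str = "TODO(human):";
    assistant_text
        .lines()
        .filter_map(|line| line.find(MARKER).map(|at| line[at + MARKER.len()..].trim()))
        .filter(|prompt| !prompt.is_empty())
        .collect()
}
```

A tagging post-processor could use the same scan to record byte ranges instead of text so the TUI can highlight them.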

force_for_plugin: true lets a plugin pin its style

A plugin-supplied style with force_for_plugin: true overrides the operator's output_style setting while the plugin is enabled. The /config picker shows a "locked by plugin: X" badge. Disabling the plugin releases the lock and the operator's selection returns. Bare (non-plugin) styles with force_for_plugin: true are ignored — only plugin-sourced styles honor the flag.
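The lock rule can be sketched as a selection over the styles of currently enabled plugins plus bare styles. `Style` and `select_active_sketch` are hypothetical; in particular, picking the first forcing plugin when several are enabled is an assumption, since the text does not say how ties are broken.

```rust
pub struct Style {
    pub name: String,
    /// `Some(plugin)` for plugin-supplied styles, `None` for bare files.
    pub plugin: Option<String>,
    pub force_for_plugin: bool,
}

/// Returns the style name that should be active. `enabled` is assumed to
/// hold only styles from enabled plugins and bare styles.
pub fn select_active_sketch<'a>(enabled: &'a [Style], operator_choice: &'a str) -> &'a str {
    enabled
        .iter()
        // The flag is honored only on plugin-sourced styles.
        .find(|s| s.force_for_plugin && s.plugin.is_some())
        .map(|s| s.name.as_str())
        .unwrap_or(operator_choice)
}
```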

Consequences

  • Positive: Closes matrix row L (both rows) with a single small-footprint crate that reuses two existing patterns (memory splice + skills frontmatter parse). Plugin-supplied styles fit naturally into the namespacing already proposed in ADR 0030. The keep_coding_instructions: false knob unlocks documentation-/writing-only modes without a separate "agent mode" feature.
  • Negative: Adds a new crate. Frontmatter parsing duplicated between caliban-skills and caliban-output-styles (deferred: factor out a frontmatter helper in caliban-core once a third consumer appears). Prompt-cache invalidation is the operator's responsibility on style switch — surfaced via a hint, but still a papercut. The Learning post-processor adds a small per-turn cost even when the marker scan finds nothing.
  • Revisit if: Operators want streaming-time style mutation (today the post-processor runs after streaming completes). Style composition becomes a real ask (today only one style is active). A community style library justifies bundling a marketplace pointer in defaults.

ADR 0032 · OS-level sandbox

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-os-sandbox-design.md
  • Depends on: existing caliban-tools-builtin::BashTool (crates/caliban-tools-builtin/src/shell/bash.rs).

Context

Permission rules (ADR 0020) gate which commands the agent asks about before running. They do nothing once a command is approved. An agent whose rules allow Bash(*) can rewrite the home directory or exfiltrate via curl with no friction. Claude Code mitigates this with an OS-level sandbox (Seatbelt on macOS, bubblewrap on Linux) that restricts the child process itself. With the sandbox enabled, operators can drop the per-command Ask entirely (autoAllowBashIfSandboxed) because the sandbox is the protection.

Matrix row A "OS-level sandbox" is 🔴 and flagged as a big lift / security-critical. This ADR records the decision to ship it as a shim layer over the existing Bash plumbing.

Decision

Two backends, one config surface

  • macOS: sandbox-exec with a generated .sb (TinyScheme dialect) profile written to $XDG_RUNTIME_DIR/caliban/sandbox/<sessid>.sb. Profile is computed once per session from settings.
  • Linux (and WSL): bwrap with --bind/--ro-bind/--tmpfs flags plus optional --unshare-net and --unshare-user. Argv is computed once per session.
  • Windows native: not supported in v1. Refuses to enable; documents Job Objects + AppContainer as the v2 path.

A single [sandbox] settings block drives both backends. Operators configure intent (allow-write paths, allowed domains, etc.); the backend translates intent into its native policy language.

A new crate caliban-sandbox provides a shim, not a rewrite

caliban-sandbox exposes SandboxedShim::wrap_command(cmd, command_str) which either returns cmd unchanged (sandbox disabled, or command on the unsandboxed allow-list) or wraps it in a new tokio::process::Command whose program is sandbox-exec / bwrap and whose tail is the original command. BashTool::invoke calls wrap_command after building its base Command; everything else — stdout/stderr capture, PID-group cleanup, cancellation, timeouts — stays identical.

This keeps the change tightly scoped: the sandbox is a layer, not a fork of Bash.
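To make the shim idea concrete, here is a sketch of the Linux wrapping step that works on plain argv vectors rather than tokio::process::Command. `SandboxSettings` and `wrap_argv` are hypothetical, exact-match lookup in the unsandboxed allow-list is an assumption, and a real bwrap policy would need more than this (for example /dev and /proc setup).

```rust
pub struct SandboxSettings {
    pub enabled: bool,
    pub allow_write: Vec<String>,
    pub deny_paths: Vec<String>,
    pub unshare_net: bool,
    pub allow_unsandboxed_commands: Vec<String>,
}

/// Returns the argv to spawn: either the original shell argv unchanged, or
/// the same argv prefixed with a bwrap invocation.
pub fn wrap_argv(s: &SandboxSettings, shell_argv: &[&str], command_str: &str) -> Vec<String> {
    let original: Vec<String> = shell_argv.iter().map(|a| a.to_string()).collect();
    if !s.enabled || s.allow_unsandboxed_commands.iter().any(|c| c == command_str) {
        return original;
    }
    // Read-only view of the whole filesystem as the baseline.
    let mut argv: Vec<String> = vec![
        "bwrap".to_string(),
        "--ro-bind".to_string(),
        "/".to_string(),
        "/".to_string(),
    ];
    for path in &s.allow_write {
        argv.extend(["--bind".to_string(), path.clone(), path.clone()]);
    }
    for path in &s.deny_paths {
        // An empty tmpfs shadows the denied directory.
        argv.extend(["--tmpfs".to_string(), path.clone()]);
    }
    if s.unshare_net {
        argv.push("--unshare-net".to_string());
    }
    argv.push("--".to_string());
    argv.extend(original);
    argv
}
```

The tail after `--` is the untouched original command, which is why output capture, cancellation, and timeouts need no changes.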

auto_allow_bash_if_sandboxed short-circuits the Ask modal

Setting sandbox.enabled: true and sandbox.auto_allow_bash_if_sandboxed: true makes the permission classifier short-circuit Bash(*) to allow before the Ask modal would fire. Rule grammar isn't modified — the short-circuit sits alongside plan-mode-bypass in the permission pipeline. allow_unsandboxed_commands entries (commands that genuinely need unrestricted access) are not auto-allowed; they keep going through the normal rules because they're running unsandboxed.

The auto-allow knob defaults to false; both settings must be set deliberately.

Network egress is sandbox + proxy, not sandbox alone

Neither Seatbelt nor bwrap enforces per-hostname egress reliably on its own. The supported patterns are:

  • allowed_domains = []: deny all egress (--unshare-net / Seatbelt no network-outbound).
  • http_proxy_port = N: deny all egress except 127.0.0.1:N; the operator runs a domain-aware HTTP proxy at that port.
  • Both unset, allowed_domains non-empty on Linux: a warning is logged; the sandbox is less restrictive than the operator probably intended. A v1.1 follow-up ships an in-tree minimal proxy that consumes allowed_domains natively.

macOS Seatbelt supports literal (remote tcp "host:port") allow rules and is correspondingly stricter.

Filesystem ACLs are explicit allow + deny + masks

Bubblewrap masks denied paths with --tmpfs (an empty in-memory directory shadows the real one). Seatbelt uses (deny file-write* (subpath …)). Globs aren't supported in the ACL — operators add explicit roots. ${WORKSPACE}, ${HOME}, and the XDG vars are expanded at session start.

Detection runs at startup; fail_if_unavailable is the gate

SandboxedShim::new detects the backend, version-checks bwrap (>= 0.5), and verifies path. When fail_if_unavailable: true and the backend is missing or too old, caliban refuses to start instead of running unsandboxed.
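A sketch of the version gate, assuming the backend reports itself in the form `bubblewrap X.Y.Z`. The function names are hypothetical. Comparing parsed integers rather than strings matters here: 0.11 must rank above 0.5.

```rust
/// Parses output such as "bubblewrap 0.8.0" into (major, minor).
pub fn parse_bwrap_version(output: &str) -> Option<(u32, u32)> {
    let version = output.split_whitespace().last()?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the reported version is at least 0.5.
pub fn meets_minimum(output: &str) -> bool {
    match parse_bwrap_version(output) {
        Some(version) => version >= (0, 5),
        None => false,
    }
}

/// `Ok(true)`: sandbox usable. `Ok(false)`: continue unsandboxed.
/// `Err`: refuse to start because `fail_if_unavailable` is set.
/// `version_output` is `None` when the binary was not found.
pub fn startup_gate(version_output: Option<&str>, fail_if_unavailable: bool) -> Result<bool, String> {
    let usable = version_output.map_or(false, meets_minimum);
    if usable {
        Ok(true)
    } else if fail_if_unavailable {
        Err("sandbox backend missing or older than 0.5".to_string())
    } else {
        Ok(false)
    }
}
```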

enable_weaker_nested_sandbox: true is the escape hatch for dev containers (already inside a user namespace; --unshare-user would fail). It drops the offending flags on Linux and is a no-op on macOS.

Consequences

  • Positive: Closes matrix row A "OS-level sandbox" with a minimally-invasive shim. Reuses the existing PID-group cleanup logic (the wrapper inherits the child's group). Unlocks the auto_allow_bash_if_sandboxed UX — Bash becomes a one-keystroke tool when the sandbox is properly configured. Two backends and one config surface means operators move between macOS dev and Linux CI without rewriting policy.
  • Negative: Seatbelt is deprecated by Apple (no replacement ship-date). Bubblewrap requires an external binary (bwrap >= 0.5) that isn't installed by default on every distro. Per-hostname network rules need a proxy to enforce reliably; we don't ship one in v1. Windows isn't supported (deferred). The policy languages are fiddly and undocumented (Seatbelt) or terse (bwrap argv), so debugging operator misconfiguration takes care.
  • Revisit if: Apple removes sandbox-exec (move to Endpoint Security Framework backend). A standard hostname-aware sandbox layer emerges (e.g. systemd-resolved per-process filtering). Demand appears for a Windows backend (Job Objects + AppContainer is the v2 path). Container-based sandboxing becomes the prevailing pattern on Linux (revisit with a Podman / Firejail backend option).

ADR 0033 · OpenTelemetry export + cost tracking

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-otel-and-cost-design.md

Context

caliban already has tracing instrumentation under caliban::tools, caliban::cache, caliban::memory, caliban::mcp, caliban::skills, and caliban::timing. What it lacks: (a) a way to ship those signals to an OTLP backend, (b) any concept of dollar cost on completions, (c) operator-visible context-window utilization. Claude Code ships all three and operators depend on them for billing, capacity planning, and right-sizing model choices. We need parity.

The Claude Code env-var contract (CLAUDE_CODE_ENABLE_TELEMETRY, OTEL_*) is well-known and supported by every OTLP backend Anthropic customers run; rather than invent our own knobs we adopt it verbatim with CALIBAN_ substitutions only where required.

Decision

One new crate, caliban-telemetry, owns OTLP + cost + context

It pulls opentelemetry, opentelemetry-otlp, tracing-opentelemetry, serde_yaml, and rust_decimal. caliban-core (agent loop) and caliban (binary / TUI) depend on it. Other crates do not — they emit via the existing tracing macros and tracing-opentelemetry bridges those into OTLP automatically.

Master switch is CALIBAN_ENABLE_TELEMETRY=1

Defaults to 0. When 0, Telemetry::init_from_env returns a no-op shim in ~10 µs and no exporter is constructed. DISABLE_TELEMETRY=1 and DO_NOT_TRACK=1 both force-disable even when CALIBAN_ENABLE_TELEMETRY=1 (privacy belt-and-braces).
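The precedence between the three variables can be sketched as below. `telemetry_enabled` is a hypothetical helper, not Telemetry::init_from_env, and treating only the literal value `1` as set is an assumption. The lookup is passed in as a closure so the logic can be exercised without touching the process environment.

```rust
/// Decides whether an exporter should be constructed.
/// `get` looks up an environment variable by name.
pub fn telemetry_enabled(get: impl Fn(&str) -> Option<String>) -> bool {
    let is_one = |name: &str| get(name).map_or(false, |value| value == "1");
    // Either opt-out variable wins over the master switch.
    if is_one("DISABLE_TELEMETRY") || is_one("DO_NOT_TRACK") {
        return false;
    }
    is_one("CALIBAN_ENABLE_TELEMETRY")
}
```

In a real binary the call would look like `telemetry_enabled(|name| std::env::var(name).ok())`.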

OTEL_* env vars adopted verbatim from Claude Code

Endpoint, protocol, headers, exporters, intervals, cardinality knobs, content-control toggles, and mTLS paths — all standard OTel SDK env names. We do not invent caliban-specific names for things OTel already standardizes. The only caliban-prefixed extras are CALIBAN_ENABLE_TELEMETRY (master switch) and CALIBAN_RATES_YAML (rate-card override path).

Cost is observed, not enforced

CostAccumulator records token usage from every provider response, multiplies by RateCard-resolved per-1M-token prices, and exposes totals to /usage plus the caliban.cost.usage metric. Hard caps (--max-budget-usd) live in headless mode, not here. This ADR is purely about visibility; budget enforcement is a downstream concern that consumes the same CostAccumulator.

Rate cards are vendored YAML, updated in lockstep with releases

crates/caliban-telemetry/rates.yaml ships with known rates for Anthropic, OpenAI, Google, Bedrock, Vertex, and Ollama (the last being a $0.00 row for completeness). Unknown (provider, model) pairs match no entry, cost $0.00, and emit a single debounced warning per session. Operators can override via CALIBAN_RATES_YAML=/path. We do not fetch rate cards from any third-party API at runtime — the dependency is one PR-with-a-cron-reminder, not a network call.

USD math uses rust_decimal, never f64

Financial accumulation drifts under f64. We compute in Decimal and convert to f64 only at the OTLP emit boundary (the OTel SDK insists).
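A worked example of the per-1M-token arithmetic, done in exact integers. This is a stand-in to stay dependency-free: the ADR uses rust_decimal, while the sketch uses micro-USD in u64, and the prices in the usage note are hypothetical, not entries from rates.yaml.

```rust
/// Prices per one million tokens, in micro-USD (1 USD = 1_000_000 micro-USD).
pub struct Rate {
    pub input_per_mtok: u64,
    pub output_per_mtok: u64,
}

/// Cost of one response in micro-USD. Fractions below one micro-USD are
/// truncated; the u128 intermediate avoids overflow on large token counts.
pub fn cost_micro_usd(rate: &Rate, input_tokens: u64, output_tokens: u64) -> u64 {
    let total = u128::from(input_tokens) * u128::from(rate.input_per_mtok)
        + u128::from(output_tokens) * u128::from(rate.output_per_mtok);
    (total / 1_000_000) as u64
}
```

With hypothetical prices of $3.00 per 1M input tokens and $15.00 per 1M output tokens, a response using 1,200 input and 350 output tokens costs (1200 × 3,000,000 + 350 × 15,000,000) / 1,000,000 = 8,850 micro-USD, or $0.00885. The same sum in f64 cannot be relied on to be exact, for the same reason that `0.1 + 0.2` is not equal to `0.3` in binary floating point.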

Context window is independent of telemetry

ContextWindow is part of caliban-telemetry for code-locality reasons but does not require OTel enabled to work. /usage, /context, and the status-bar percent indicator function for every caliban user regardless of CALIBAN_ENABLE_TELEMETRY. Only OTLP emission is gated.

/compact reuses existing summarization, just adds a slash + metric

RequestPurpose::Summarization already wires through caliban-model-router to a summary-tuned model. The slash command enqueues that purpose at the head of the loop and emits a compact.event log. No new model routing logic is introduced by this ADR.

otel_headers_helper is a per-startup helper script + refresh

Settings field [telemetry].otel_headers_helper points at a path; caliban spawns it at startup and on a configurable interval (telemetry.otel_headers_refresh, default 5m), parses stdout as k=v\n…, merges with OTEL_EXPORTER_OTLP_HEADERS (helper wins on collision). This is how operators put short-lived bearer tokens in front of their collector without checking secrets into env files.
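The merge step might look like the sketch below. `merge_headers` is a hypothetical name; the comma-separated `k=v` form for OTEL_EXPORTER_OTLP_HEADERS follows the OpenTelemetry convention, and splitting on the first `=` keeps values that themselves contain `=` intact.

```rust
use std::collections::BTreeMap;

/// Parses `k=v` items, skipping anything without an `=` or with an empty key.
fn parse_pairs<'a>(items: impl Iterator<Item = &'a str>) -> BTreeMap<String, String> {
    items
        .filter_map(|item| item.trim().split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .filter(|(k, _)| !k.is_empty())
        .collect()
}

/// Merges static headers from the environment with helper output.
/// Helper entries are applied last, so they win on collision.
pub fn merge_headers(env_value: &str, helper_stdout: &str) -> BTreeMap<String, String> {
    let mut merged = parse_pairs(env_value.split(','));
    merged.extend(parse_pairs(helper_stdout.lines()));
    merged
}
```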

Consequences

  • Positive: Closes six 🔴 rows in the parity matrix under K. Observability / cost (/context, /usage, /compact, Cost tracking, OTLP export, Metric set) in one initiative. Reuses the industry-standard OTEL_* env contract so any existing OTLP backend (Honeycomb, Grafana, Datadog, Tempo, Loki) works out-of-the-box. Decoupling cost/context from OTel emission means the operator-visible features (/usage, status-bar percent) work for everyone — including the airgapped offline case.
  • Negative: Adds ~5 transitive deps via opentelemetry-otlp (tonic, h2, prost, etc.). Vendored rate cards need monthly refresh discipline. rust_decimal is yet another money library; we'll need a brief style note on when to use it. Content-logging knobs are a privacy footgun if operators misconfigure their collector — README must call this out prominently.
  • Revisit if: OTel SDK ships a stable currency / cost convention (currently absent), in which case we align metric attribute names. If rust_decimal proves overkill for the precision we need, swap to fixed-point i64 cents. If operators clamor for runtime rate-card fetching (e.g. integration with their FinOps platform), add a RateCardSource::Url variant.

ADR 0034 · Bedrock + Vertex providers

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-bedrock-vertex-providers-design.md

Context

caliban-provider-anthropic already contains feature-gated BedrockTransport and VertexTransport implementations (bedrock and vertex Cargo features), plus the workspace already declares aws-config, aws-sdk-bedrockruntime, aws-smithy-types, and gcp_auth as dependencies in anticipation of this work. What's missing is the top-level Provider-implementing crates that expose these transports as first-class providers with their own name(), their own list_models (which require control-plane APIs the Anthropic crate has no business knowing about), and their own auth refresh policy. Parity with Claude Code's --bedrock / --vertex flags requires both crates.

Decision

Two new crates, both thin wrappers around the existing transports

caliban-provider-bedrock and caliban-provider-vertex each contain ~300 lines of glue:

  1. A Provider-implementing struct wrapping AnthropicProvider<BedrockTransport> or AnthropicProvider<VertexTransport>.
  2. A *Config struct + from_env / from_config constructors.
  3. An AuthRefresh background task.
  4. A list_models that hits the relevant control-plane API (bedrock:ListInferenceProfiles / publishers/anthropic/models), caches the result for the session, and falls back to a vendored list on failure.
  5. A name() returning "bedrock" / "vertex" so the model router and telemetry attribute these correctly.

We do not extend caliban-provider-anthropic to expose Bedrock / Vertex as alternate constructors because (a) it would force the Anthropic crate to depend on aws-sdk-bedrock (control plane) and gain its own non-trivial auth code, and (b) operators have a real mental-model expectation that provider = "bedrock" and provider = "anthropic" are separate provider entries.

Auth refresh is a per-provider tokio task with a 5-minute default

Both crates spawn one background task on construction that calls provider.get_token() (via aws-config's ProvideCredentials or gcp_auth's TokenProvider) on a configurable interval. Settings fields aws_auth_refresh and gcp_auth_refresh (and env CALIBAN_AWS_AUTH_REFRESH / CALIBAN_GCP_AUTH_REFRESH) control the interval; default 5m; 0 disables proactive refresh and relies on inline 401 recovery only. Refresh failures back off exponentially up to the configured interval and surface as tracing::warn! until they succeed; the cached token continues to be served until it expires.
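The backoff rule can be sketched as a pure function of the failure count. `retry_delay` is a hypothetical name, and the one-second base delay and doubling factor are assumptions; only the cap at the configured interval comes from the text above.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based) after a failed refresh.
/// Doubles from an assumed 1 s base and never exceeds `interval`.
pub fn retry_delay(attempt: u32, interval: Duration) -> Duration {
    let base = Duration::from_secs(1);
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(interval).min(interval)
}
```

With the default 5m interval this yields 1 s, 2 s, 4 s, and so on, reaching the 300 s cap from the ninth consecutive failure onward.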

Model-id canonicalization stays in caliban-provider-anthropic

Transport::wire_model_id already lives in the Anthropic crate. The new provider crates expose a small per-base-model release-date table (e.g. ("claude-opus-4-7", "20260423")) consumed by the transport's wire_model_id. The caliban canonical model name (claude-opus-4-7) remains the same across Anthropic / Bedrock / Vertex — only the wire form differs.

Capabilities mirror direct Anthropic per base model

The hyperscalers serve the same Anthropic models with the same context windows, vision support, and tool-use semantics. Until a real discrepancy emerges (e.g. some regions lacking prompt caching), both crates' capabilities() strip the platform suffix and delegate to caliban_provider_anthropic::models::capabilities_for. Any future regional / platform restriction is added as a small subtraction layer on top — not by forking the capabilities table.

list_models is on-demand + per-session-cached, with fallback

We resist the temptation to call list_inference_profiles at provider startup because (a) startup latency is precious and (b) operators with read-restricted IAM principals shouldn't fail startup just because they can't introspect. Both crates call the control-plane API the first time list_models is invoked, cache the result in a tokio::sync::OnceCell, and fall back to a vendored list of well-known models if the API call fails.
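A minimal sketch of the fetch-once-then-cache shape follows. It uses `std::sync::OnceLock` as a synchronous stand-in for `tokio::sync::OnceCell`; `ModelCatalog`, `VENDORED_MODELS`, and the closure-based `fetch` parameter are hypothetical names for illustration. Note one assumption: this sketch also caches the vendored fallback for the rest of the session.

```rust
use std::sync::OnceLock;

/// Hypothetical vendored fallback list.
static VENDORED_MODELS: [&str; 1] = ["claude-opus-4-7"];

pub struct ModelCatalog {
    cached: OnceLock<Vec<String>>,
}

impl ModelCatalog {
    pub fn new() -> Self {
        Self { cached: OnceLock::new() }
    }

    /// Runs `fetch` on the first call only; later calls return the cached list.
    /// A failed fetch falls back to the vendored list instead of erroring.
    pub fn list_models<F>(&self, fetch: F) -> &[String]
    where
        F: FnOnce() -> Result<Vec<String>, String>,
    {
        self.cached.get_or_init(|| {
            fetch().unwrap_or_else(|_| VENDORED_MODELS.iter().map(|m| m.to_string()).collect())
        })
    }
}
```

Because nothing runs until `list_models` is first called, a principal without introspection rights still starts cleanly and only sees the vendored list.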

Request metadata flows through unchanged

RequestMetadata.purpose, user_id, and any future fields pass through both crates untouched into the transport into the wire body. The provider crates own auth + endpoint + list_models — not request shape.

Consequences

  • Positive: Closes two 🔴 rows under I. Model router & providers (Bedrock, Vertex). Enables operators in regulated industries (financial services, healthcare, gov) to use caliban with their contractual cloud provider. Composes cleanly with caliban-model-router so the same operator can route Sonnet via Bedrock for compliance and Haiku via direct Anthropic for cost. Reuses the Anthropic IR adapter so the message-shape correctness surface stays single-sourced.
  • Negative: Adds two new crates to the workspace; the aws-* dependency tree is heavy (~30 transitive crates, mostly hyper/tower stack). Bedrock model-id rotation (Anthropic occasionally re-dates Bedrock models without changing direct-API names) requires per-base-model date-table maintenance. Two new mock-based test surfaces to maintain.
  • Revisit if: AWS or GCP changes the canonical wire format significantly (e.g. Bedrock unifies under inference-profile ARNs exclusively), in which case the canonical→wire mapping simplifies. If caliban-provider-anthropic's embedded bedrock / vertex features turn out to be confusing duplicate paths, deprecate those feature flags in favor of the new crates and route all hyperscaler-served Anthropic through here.

ADR 0035 · Auto-memory (model-written notes)

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-auto-memory-design.md

Context

caliban-memory's third tier (the auto tier, XML tag auto-memory-index) currently bootstraps an empty MEMORY.md with a conventions block and splices it into the prompt — but no machinery exists for writing memory back. Claude Code's auto-memory feature is operator-visible gold: the model accumulates per-project user/feedback/project/reference facts across sessions, and re-loads them as part of the system prompt each turn. Closing this gap is one of the highest-leverage rows in the parity matrix because every other feature (skills, slash commands, hook handlers) gets compounded by long-running memory.

The on-disk layout the user already maintains under ~/.claude/projects/<sanitized-cwd>/memory/ is well-defined: an index MEMORY.md + one markdown file per topic, each with YAML frontmatter declaring name / description / metadata.type. We adopt it verbatim under ~/.caliban/projects/<sanitized-cwd>/memory/.

Decision

Three artifacts together implement auto-memory

  1. Loader extension in caliban-memory — reads MEMORY.md (first 200 lines / 25 KB), strips HTML comments, splices it into the prompt under <auto-memory path="…" topic_count="N">…</auto-memory>.
  2. TopicLoader in caliban-memory::auto — lists / reads / writes / deletes topic files (sibling .md of MEMORY.md); does atomic write + index-line update in a single call so the model can't half-commit.
  3. Built-in auto-memory skill bundled in caliban-skills — its body is the protocol manual (when to read, when to write, the four types, anti-examples). With disable_model_invocation: false, the skill is always available + always loaded into the system prompt.

Two new built-in tools, ReadMemoryTopic and WriteMemoryTopic

We do not reuse Read/Write for memory access because (a) memory paths are sandboxed to the memory dir (path-traversal guard) and (b) writes need to atomically update both the topic file and the index line — that's a single tool call, not two Writes. Both tools live in caliban-tools-builtin under a new memory.* permission category (allowed by default).
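One way to picture the path-traversal guard is to validate the slug before it ever becomes a path. The sketch below is an assumption about shape, not the actual tool code: `topic_path` and the allowed slug alphabet (lowercase ASCII, digits, `-`, `_`) are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Maps a topic slug to a file inside `memory_dir`, or rejects it.
/// The slug alphabet is an illustrative assumption; because it excludes
/// '/', '\\' and '.', a slug can never name a parent or nested directory.
pub fn topic_path(memory_dir: &Path, slug: &str) -> Option<PathBuf> {
    let valid = !slug.is_empty()
        && slug
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-' || c == '_');
    if !valid {
        return None;
    }
    Some(memory_dir.join(format!("{slug}.md")))
}
```

Rejecting by alphabet avoids canonicalizing paths that may not exist yet, which matters for a write tool creating a new topic.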

MEMORY.md is splice-only; topic files are on-demand

The index is small enough to splice every turn (200 lines / 25 KB cap). Topic files can be hundreds of KB collectively; they're pulled by slug on demand via ReadMemoryTopic. [[slug]] cross-references between topics are informational breadcrumbs — the loader does not auto-follow them.
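The cap can be sketched as a prefix cut on whole lines. This is an illustration under assumptions: `cap_index` is a hypothetical name, and 25 KB is taken to mean 25 × 1024 bytes.

```rust
pub const MAX_LINES: usize = 200;
pub const MAX_BYTES: usize = 25 * 1024; // assumes KB means 1024 bytes

/// Returns the longest prefix of whole lines that fits both caps,
/// plus a flag saying whether anything was cut off.
pub fn cap_index(text: &str, max_lines: usize, max_bytes: usize) -> (&str, bool) {
    let mut end = 0;
    for (n, line) in text.split_inclusive('\n').enumerate() {
        if n >= max_lines || end + line.len() > max_bytes {
            return (&text[..end], true);
        }
        end += line.len();
    }
    (text, false)
}
```

Cutting on line boundaries keeps each index entry intact and never splits a multi-byte character; the returned flag is what a truncation warning would key on.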

HTML-comment stripping is done at splice time

<!-- --> blocks in MEMORY.md are stripped from the spliced prompt (but stay on disk). This lets us keep the auto-injected CONVENTIONS_BLOCK HTML-comment-fenced so it doesn't fight with operator-authored content. The strip is greedy (regex), which means a fenced code block containing <!-- --> will lose the comment in the spliced view — documented limitation, low-impact.
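The splice-time strip and its limitation can be shown with a small hand-rolled scanner. This is a stand-in for the regex mentioned above, not the loader's code; `strip_html_comments` is a hypothetical name, and leaving an unterminated comment untouched is an assumption.

```rust
/// Removes every `<!-- ... -->` span from `input`, regardless of context.
/// An opener without a closing `-->` is kept verbatim (assumed behavior).
pub fn strip_html_comments(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("<!--") {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 4..];
        match after_open.find("-->") {
            Some(end) => rest = &after_open[end + 3..],
            None => {
                out.push_str(&rest[start..]);
                return out;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Since the scanner has no notion of Markdown structure, a comment inside a fenced code block is removed from the spliced view as well, which is the documented limitation.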

Four memory types, model decides at write time

user / feedback / project / reference. The skill body documents heuristics + anti-examples; the model classifies inline. We deliberately avoid a typed classifier service — the model is in the best position to judge what to save, and we don't want a hidden ML layer between the user's intent and the on-disk artifact.

No automatic pruning

Memories persist until manually removed. /memory rm <slug> and /memory rebuild-index cover the manual-curation path. Automatic forgetting is a research problem that we explicitly punt on.

CALIBAN_DISABLE_AUTO_MEMORY=1 is both a privacy kill switch and a determinism switch for CI

When set, no <auto-memory> block is spliced and the auto-memory skill is dropped from the system prompt. This guarantees that headless runs and CI workflows produce identical prompts regardless of on-disk memory state.

The on-disk format is the source of truth

We do not invent a database or sqlite layer. Markdown + YAML frontmatter is human-readable, git-friendly, and aligns with how operators already mentally model CLAUDE.md. The trade-off — file locking concurrency, parsing overhead — is acceptable at the scales auto-memory actually sees (tens of topic files, kilobytes each).

Atomic writes via tempfile + rename

WriteMemoryTopic writes to <slug>.md.tmp then renames; index-line update is part of the same operation. Failure mid-write leaves the prior content intact. Failure between topic-write and index-update leaves an orphan topic file — rebuild-index repairs it.
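The tempfile-then-rename step looks roughly like this with the standard library alone. It is a sketch of the topic-file half only: `write_topic_atomic` is a hypothetical name, and the index-line update described above is omitted.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Writes `<slug>.md.tmp`, flushes it, then renames it over `<slug>.md`.
/// If anything fails before the rename, the previous `<slug>.md` is untouched.
pub fn write_topic_atomic(dir: &Path, slug: &str, body: &str) -> std::io::Result<()> {
    let tmp = dir.join(format!("{slug}.md.tmp"));
    let dest = dir.join(format!("{slug}.md"));
    let mut file = fs::File::create(&tmp)?;
    file.write_all(body.as_bytes())?;
    file.sync_all()?; // make sure the bytes are on disk before they become visible
    fs::rename(&tmp, &dest) // replaces the old version in a single step
}
```

The temp file lives next to its destination so the rename stays on one filesystem, which is what makes the replacement a single step.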

Consequences

  • Positive: Closes a tier-5-priority row in the parity matrix that compounds the value of every long-running session. Operators get Claude Code's "wow it remembered" UX out of the box. The on-disk format means operators can manually curate memory with their favorite text editor. Composes with skills (the protocol is a skill) so the system documents itself.
  • Negative: Two new built-in tools to maintain + a new permission category. The auto-memory skill body is a maintenance surface (15 CI test asserts it doesn't drift). HTML-comment stripping is a hidden behavior that may surprise operators. No automatic pruning means MEMORY.md grows unbounded on long-running projects — operator hygiene is required.
  • Revisit if: The 200-line / 25 KB cap turns out to be too small in practice (operators routinely brush against the truncation warning); a richer indexer that summarizes topic files into the splice may be needed. If concurrent writes from background subagents prove racy, add file locks (fs2::FileExt::try_lock_exclusive). If the markdown+frontmatter parsing overhead shows up in startup profiles, add a per-topic cache keyed by mtime.

ADR 0036 · CLAUDE.md ancestor walk + @-imports

  • Status: accepted
  • Date: 2026-05-24
  • Author: john.ford2002@gmail.com
  • Spec: docs/superpowers/specs/2026-05-24-claudemd-ancestry-design.md

Context

caliban-memory's project tier currently loads exactly one file — <workspace_root>/CLAUDE.md. Claude Code instead walks from cwd upward, concatenating every CLAUDE.md (and AGENTS.md and .caliban.md) it finds, supports @path/to/file imports inside any of them (bounded recursion + approval for external paths), loads nested children on demand as the model reads into subdirectories, and honors .claude/rules/<topic>.md files with optional paths: glob frontmatter for scoped activation. The matrix marks this row 🟡 because the single-file loader exists but lacks every other behavior.

We need parity to make caliban usable in monorepos, in deeply-nested project layouts, and in any workflow where contributors share CLAUDE.md fragments via imports.

Decision

Five behaviors, one orchestrator

The new project tier in caliban-memory orchestrates five distinct concerns:

  1. Ancestor walk — start at cwd, walk up to git root (or fs root, configurable via WalkStop), concatenate every CLAUDE.md / AGENTS.md / .caliban.md in broad → narrow order.
  2. @-imports — recursion-bounded (depth ≤5), cycle-detected by canonical path, with an approval dialog for first-time external imports persisted to ~/.caliban/imports-allowlist.json.
  3. Nested-on-demand — Read/Edit/Glob success notifies an AncestryAddendum, which appends any newly-touched directory's CLAUDE.md to the system prompt for the rest of the session.
  4. .caliban/rules/<topic>.md — path-glob-scoped rules with a RulesActivator that lights them up on first matching path touch.
  5. claude_md_excludes — gitignore-style patterns scoped to the workspace root, evaluated during walk.

All five share the existing MemoryPrefix machinery; project slot becomes a richer ProjectTier struct containing four Vec<TierFile> collections (base / imports / rules / nested) instead of one TierFile.
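The ancestor walk (item 1) can be sketched as a pure path computation. This is an illustration only: `candidate_files` is a hypothetical name, the `stop` argument stands in for the git root that `WalkStop` would select, and the real loader would also check which candidates exist and apply excludes.

```rust
use std::path::{Path, PathBuf};

/// Per-directory load order: most-specific file first.
static FILENAMES: [&str; 3] = [".caliban.md", "CLAUDE.md", "AGENTS.md"];

/// Candidate memory files from `stop` down to `cwd`, in broad → narrow order.
/// If `stop` is not an ancestor of `cwd`, the walk runs to the filesystem root.
pub fn candidate_files(cwd: &Path, stop: &Path) -> Vec<PathBuf> {
    // ancestors() yields cwd first, then each parent: narrow → broad.
    let mut dirs: Vec<&Path> = Vec::new();
    for dir in cwd.ancestors() {
        dirs.push(dir);
        if dir == stop {
            break;
        }
    }
    dirs.reverse(); // flip to broad → narrow
    dirs.iter()
        .flat_map(|d| FILENAMES.iter().map(move |f| d.join(f)))
        .collect()
}
```

A caller would filter the result with `Path::is_file` and concatenate what remains, so guidance from nearer directories lands later in the prompt.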

Three filenames, no precedence battles

CLAUDE.md, AGENTS.md, and .caliban.md are all loaded when present in the same directory. Within a directory we load .caliban.md → CLAUDE.md → AGENTS.md (most-specific → most-general). We do not surface "which file overrode which" because they don't override — they concatenate. Operators who need exclusion use claude_md_excludes.

@-import semantics align with Claude Code, minus HTTP

Local paths only. @./foo.md, @~/notes/x.md, @/abs/path.md all work; @http(s)://… is rejected outright. This keeps imports auditable (a static set of filesystem paths) and avoids embedding an HTTP fetcher inside the prompt-assembly path.
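A sketch of how an import target might be classified is shown below. The names `resolve_import` and `ImportError` are hypothetical, and resolving relative targets against the importing file's directory is an assumption; the text above only fixes which forms are accepted and that HTTP(S) is rejected.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum ImportError {
    RemoteUrl,
    Empty,
}

/// Turns an `@target` into a filesystem path, rejecting remote URLs outright.
pub fn resolve_import(spec: &str, importing_dir: &Path, home: &Path) -> Result<PathBuf, ImportError> {
    let target = spec.strip_prefix('@').unwrap_or(spec);
    if target.is_empty() {
        return Err(ImportError::Empty);
    }
    let lower = target.to_ascii_lowercase();
    if lower.starts_with("http://") || lower.starts_with("https://") {
        return Err(ImportError::RemoteUrl);
    }
    if let Some(rest) = target.strip_prefix("~/") {
        return Ok(home.join(rest));
    }
    // Path::join keeps an absolute target as-is and resolves a relative one
    // against `importing_dir`.
    Ok(importing_dir.join(target))
}
```

The approval check for external paths would run on the returned path, after canonicalization, and is not shown here.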

External imports (those outside the workspace root and outside ~/.config/caliban/) require approval. The dialog persists decisions; non-interactive callers (--print, CI, --bare) deny by default but respect CALIBAN_APPROVE_IMPORTS=1 for unattended runs.

Nested-on-demand is one-shot per (path, session)

Once the model Reads a file and we load that directory's CLAUDE.md, we keep it for the rest of the session. We do not detect file changes and reload, we do not unload when the model leaves the subtree. This keeps the system prompt monotone (only grows), which matches how operators reason about it.

Rules use globset, the workspace's existing glob crate

globset is already a workspace dep. Rules build a single GlobSet at startup; path-touch hooks ask "does this path match any unactivated rule?" — O(1). Rules without a paths: frontmatter are always-active (loaded at startup, before any path touch).

claude_md_excludes is gitignore-style with explicit semantics

We adopt the gitignore matching semantics (! negation, last-match wins for a given path). Patterns are evaluated relative to the workspace root, not to the absolute filesystem path — operators write node_modules/**, not /Users/foo/proj/node_modules/**. The workspace root is the start of the ancestor walk (the cwd at startup).
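The "last match wins" rule with `!` negation can be illustrated with a deliberately simplified matcher. This is not gitignore globbing: the sketch only understands exact relative paths and a trailing `/**`, and `is_excluded` and `pattern_matches` are hypothetical names.

```rust
/// Simplified matcher (assumption): exact path, or `dir/**` for anything under `dir`.
fn pattern_matches(pattern: &str, rel_path: &str) -> bool {
    match pattern.strip_suffix("/**") {
        Some(dir) => rel_path.starts_with(dir) && rel_path[dir.len()..].starts_with('/'),
        None => pattern == rel_path,
    }
}

/// Evaluates patterns in order against a workspace-relative path.
/// Each matching pattern overrides the verdict of earlier ones.
pub fn is_excluded(patterns: &[&str], rel_path: &str) -> bool {
    let mut excluded = false;
    for raw in patterns {
        let (negated, pattern) = match raw.strip_prefix('!') {
            Some(p) => (true, p),
            None => (false, *raw),
        };
        if pattern_matches(pattern, rel_path) {
            excluded = !negated;
        }
    }
    excluded
}
```

Order matters: `["vendor/**", "!vendor/ours/**"]` re-includes files under `vendor/ours`, while the reversed list excludes them again.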

--add-dir paths contribute CLAUDE.md only opt-in

Adding a directory to the agent's accessible-paths set should not silently inject another CLAUDE.md into the prompt. Operators who want that behavior set CALIBAN_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1. Each --add-dir then performs its own ancestor walk, concatenated after the cwd walk in declaration order.

Regression escape: CALIBAN_DISABLE_CLAUDE_MD_WALK=1

If the new loader misbehaves in a real-world repo we don't have CI coverage for, operators set this env to fall back to the legacy single-file project tier. This is a maintenance lifeline; we expect it to be unused in steady state.

Consequences

  • Positive: Closes three 🟡 / 🔴 rows under C. Memory & checkpointing in one PR. Caliban becomes deployable in monorepos without prompt-injection workarounds. @-imports unlock content sharing between repos (a single ~/notes/api-conventions.md can be imported from every project's CLAUDE.md). Rules let language/framework-specific guidance be scoped to where it applies instead of polluting the top-level CLAUDE.md.
  • Negative: Project-tier complexity goes up materially — five concerns sharing one orchestrator. The approval-dialog UX adds a new modal flow the TUI must handle. The system prompt grows monotonically during a session, which interacts with the existing memory budget enforcement (truncation logic now runs against a larger surface). Operator authoring of claude_md_excludes gitignore patterns is a known footgun (test #18 covers the common case).
  • Revisit if: A real-world repo demands HTTP imports — we'd revisit the security model (signed manifests? lockfile?). If the approval dialog frequency proves annoying in practice, add [memory] auto_approve_under = ["~/dev/personal/**"]. If the monotone-prompt-growth interacts badly with long sessions, add a rule-level "deactivate after N turns since last match" knob.

ADR 0037 · Sub-agent worktree isolation + background fleet

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-subagent-worktree-and-fleet-design.md
  • Builds on: ADR 0021 (sub-agent primitive), ADR 0024 (hook taxonomy)
  • Author: john.ford2002@gmail.com

Context

ADR 0021 shipped AgentTool as an in-process, foreground, recursion-guarded primitive. That covers the simple "spawn a read-only Grep/Read subagent and inline its summary" use case Claude Code uses for parallel research. It does not cover:

  • Filesystem isolation — a sub-agent that writes files shares the parent's working tree, so Edit/Write side-effects mix into the parent's diff and there is no clean way to discard them.
  • Long-running detached work — the parent's turn budget is the sub-agent's wall-clock budget; nothing survives the parent run ending. Claude Code's --bg, claude agents list / attach / respawn / rm surface a fleet of detachable sub-agents we have no equivalent for.
  • Hook inheritance — deferred from PR #9 / ADR 0024. Child sub-agents currently get a brand-new hook stack; flow-scoped hooks the parent set up are silently dropped.

These three concerns share state (the spawn site, the lifecycle ownership, the working-directory model) and want to be solved together. This ADR records the architectural commitments; mechanics live in the design spec.

Decision

Isolation is opt-in per sub-agent, via frontmatter or call-site

Two modes only — none (today's behavior, default) and worktree. A worktree sub-agent runs in a dedicated git worktree materialized under .caliban/worktrees/<name> with a configurable base_ref (fresh / head / named ref), optional sparse_paths, and optional symlink_directories (so heavy build outputs like target/ and node_modules/ are shared by symlink instead of duplicated).

We pick git-worktree over copy-on-write filesystems or chroots because it works everywhere git works, it is a primitive the user already understands, and it composes with the rest of git (a sub-agent's diff is a real branch tip the user can inspect). Containers and OS sandboxes are orthogonal layers that can wrap a worktree later.

Background sub-agents are owned by a new caliban-supervisor daemon

bg = true (frontmatter or runtime override) detaches the sub-agent from its caller. The detached agent's lifecycle is managed by a per-repo daemon (caliband) auto-spawned on first need. The daemon owns a control Unix socket (list/attach/kill/respawn/rm/spawn/status) and exposes a per-agent socket each sub-agent writes its TurnEvent stream to.

We pick a separate daemon process — not a tokio task inside the main CLI — because (a) the parent CLI process should be free to exit and let background sub-agents keep running, and (b) it cleanly separates short-lived foreground concerns from long-lived fleet concerns. We pick a Unix domain socket over TCP because the fleet is local-only by design; TCP exposure waits for a remote-orchestration ADR.

Per-agent on-disk store is caliban-sessions-compatible

A background sub-agent's <base>/agents/<id>/session.json is a regular caliban session file. caliban agents attach <id> is sugar for caliban resume <id> over the agent's socket. Reusing the format means session tooling (compaction, replay, audit) works on background sub-agents for free.

Ctrl+B is a runtime transition, not a new spawn

A foreground sub-agent can be backgrounded mid-run by snapshotting its state and transferring ownership to the supervisor. The parent's in-flight AgentTool::invoke future is cancelled with a ToolError::Backgrounded(id) and the assistant transcript records the handoff. The sub-agent itself sees no state change — it continues from the next event. This is the operator's escape hatch for "this is taking longer than I thought; let me get my main loop back."

Hook inheritance defaults to true, with an explicit opt-out

Closes the deferred follow-up from ADR 0024 PR #9. Children inherit the parent's Hooks chain by default; inherit_hooks: false in frontmatter resets to the binary's default chain. For background sub-agents, only the serializable portion of the parent chain (HookRouter config + identified in-process hooks) crosses the process boundary; opaque closures are stripped with a loud warning. This trades some correctness for a tractable contract — operators who want full inheritance keep their background sub-agents foreground until their hooks are config-expressible.

Worktree cleanup defaults to true

Foreground worktrees are removed when the sub-agent's WorktreeHandle drops. CALIBAN_KEEP_WORKTREES=1 (and per-call keep_on_exit: true) disable removal for debugging. Background worktrees are owned by the supervisor and removed on caliban agents rm <id> (and on daemon startup, for orphans, when configured). This is deliberately aggressive: worktrees are cheap to recreate and expensive to leak.

Consequences

  • Positive. Closes four 🔴 rows under matrix G — worktree isolation, background sub-agents, subagent-local memory dir, hook inheritance — and adds the supervisor daemon row as a new ✅. Unblocks the "long-running code-review subagent" and "parallel exploratory refactor" workflows that Claude Code uses heavily. Establishes the daemon substrate other features can borrow (notably a future caliban serve HTTP shim for headless use).
  • Negative. Two new crates and a new binary (caliband). The per-repo daemon model means cross-repo agent management requires multiple daemons; we accept this for v1. Hook inheritance for background sub-agents is partial by design (closure hooks dropped). Disk usage grows with sparse + symlink-shared worktrees, but the default fresh-empty base_ref keeps the floor low. Windows symlink requirements (elevation / dev mode) make worktree isolation a best-effort feature there.
  • Revisit if: Disk pressure from worktrees becomes a recurring operator complaint — promote a "shared object store" layout (git worktree --no-checkout + targeted materialization). If background-agent IPC outgrows length-prefixed bincode, swap to gRPC over the same socket. If the no-closure-hook-inheritance compromise for background mode bites real users, sketch a serializable-hook IR.

ADR 0038 · Model router v2 — fallback, hedging, breakers, capabilities, binary wiring

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-model-router-v2-design.md
  • Supersedes scope of: ADR 0022 deferred items
  • Author: john.ford2002@gmail.com

Context

ADR 0022 + PR #12 shipped the model router as a config-driven dispatcher: TOML schema, builder API, purpose-keyed routes, impl Provider, per-route usage tracking. Five capabilities were deferred to v2 because the v1 surface needed to settle before resilience landed on top:

  • Fallback chains — try the next route on a fatal-for-route error.
  • Hedged requests — race a second route after a delay; first wins.
  • Circuit breakers — skip a failing route for a cool-off window.
  • Capability-based pre-routing — auto-route requests that need vision/thinking/parallel-tools to capable models even if the operator put a non-capable route first.
  • caliban.toml discovery + binary wiring — the CLI does not yet construct a ModelRouter from [router]; it falls back to single-provider construction.

Plus the smaller effort and per-route prompt-cache normalization follow-ups. Closing all six in one ADR keeps the router's contract coherent — fallback, hedging, and breakers all consume the same candidate-vec from resolution, and capability filtering changes which candidates appear in the first place.

Decision

Resolution and dispatch are separated, with a candidate vec as the seam

resolve_candidates(...) -> Vec<&RouteEntry> is the single funnel into the dispatch driver. Filters apply in order: purpose → declared requires → request-derived needs → breaker state → explicit fallback re-ordering. Dispatch (fallback or hedging) consumes the vec identically. This means the same diagnostic (/router debug) shows the exact list every dispatch will see, and new filters (e.g. cost-budget) slot in without touching dispatch.
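The filter funnel can be sketched as a chain of iterator filters. This is a reduced illustration: the `RouteEntry` fields and the `Capability` enum are hypothetical, and the declared-requires and fallback re-ordering stages are left out to keep it short.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Capability {
    Vision,
    Thinking,
}

/// Hypothetical, trimmed-down route entry.
pub struct RouteEntry {
    pub id: &'static str,
    pub purpose: &'static str,
    pub capabilities: &'static [Capability],
    pub breaker_open: bool,
}

/// Applies purpose → request-derived needs → breaker state, in that order.
/// The resulting vec is what a dispatch driver would consume unchanged.
pub fn resolve_candidates<'a>(
    routes: &'a [RouteEntry],
    purpose: &str,
    needs: &[Capability],
) -> Vec<&'a RouteEntry> {
    routes
        .iter()
        .filter(|r| r.purpose == purpose)
        .filter(|r| needs.iter().all(|n| r.capabilities.contains(n)))
        .filter(|r| !r.breaker_open)
        .collect()
}
```

Because every stage is just another `filter`, a later stage such as a cost budget would slot into the chain without the dispatch side changing.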

Fallback is sequential by default; hedging is opt-in per route

Sequential fallback handles the cost-conscious common case: try the primary, only spend on the secondary on real failure. Hedging is a spend-for-latency knob the operator opts into per route via hedge = { hedge_after_ms = N, max = K }. We pick this default because hedging silently doubles the bill for the median request; making it opt-in keeps the surprise floor low.

Fatal-for-route is a closed list

ModelUnavailable, RateLimit (post adapter-retry), ContextTooLong, ServerError, NetworkTimeout → fall back. Everything else (Auth, InvalidRequest, ContentPolicy, Cancelled) propagates. The list lives in code (fallback.rs::is_fatal_for_route); tests pin the membership.
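A closed list maps naturally onto an exhaustive `match`. The sketch below assumes a flat `ProviderError` enum with exactly the variants named above; the real error type likely carries payloads and may be named differently.

```rust
/// Hypothetical flat error enum using the variant names from the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProviderError {
    ModelUnavailable,
    RateLimit,
    ContextTooLong,
    ServerError,
    NetworkTimeout,
    Auth,
    InvalidRequest,
    ContentPolicy,
    Cancelled,
}

/// True when the router should try the next candidate route.
pub fn is_fatal_for_route(err: ProviderError) -> bool {
    use ProviderError::*;
    match err {
        ModelUnavailable | RateLimit | ContextTooLong | ServerError | NetworkTimeout => true,
        // Listed explicitly (no wildcard) so a new variant fails to compile
        // until someone decides which side of the list it belongs on.
        Auth | InvalidRequest | ContentPolicy | Cancelled => false,
    }
}
```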

Circuit breaker is per-route id, not per (provider, model)

The breaker's state lives in BreakerRegistry: HashMap<RouteId, ArcSwap<BreakerState>>. We key on the route id (which defaults to {provider}:{model}:{purpose}) so the operator can break a provider on one purpose without disabling it on another. Closed → Tripped → HalfOpen → Closed/Tripped is the standard SRE breaker. Cancelled outcomes do not count toward failure.
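The state machine can be sketched as two pure transition functions. Everything beyond the state names is an assumption for illustration: the consecutive-failure `threshold`, the `cooldown` parameter, the `Outcome` enum, and the function names `next_state` and `probe`.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BreakerState {
    Closed { failures: u32 },
    Tripped { until: Instant },
    HalfOpen,
}

pub enum Outcome {
    Success,
    Failure,
    Cancelled,
}

/// Applies one request outcome to the breaker.
pub fn next_state(state: BreakerState, outcome: Outcome, now: Instant, threshold: u32, cooldown: Duration) -> BreakerState {
    use BreakerState::*;
    match (state, outcome) {
        // Cancelled outcomes never count toward failure.
        (s, Outcome::Cancelled) => s,
        (Closed { .. }, Outcome::Success) | (HalfOpen, Outcome::Success) => Closed { failures: 0 },
        (Closed { failures }, Outcome::Failure) if failures + 1 < threshold => Closed { failures: failures + 1 },
        // Threshold reached, or the half-open trial request failed.
        (Closed { .. }, Outcome::Failure) | (HalfOpen, Outcome::Failure) => Tripped { until: now + cooldown },
        (Tripped { until }, _) => Tripped { until },
    }
}

/// Called before routing: a tripped breaker becomes half-open once the cool-off ends.
pub fn probe(state: BreakerState, now: Instant) -> BreakerState {
    match state {
        BreakerState::Tripped { until } if now >= until => BreakerState::HalfOpen,
        s => s,
    }
}
```

Keeping the transitions pure makes them easy to pin in tests; the registry would then only be responsible for swapping the stored state per route id.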

Capability filtering is pre-routing, not post-failure

Today the router relies on requires blocks to drop incompatible routes. v2 adds request-derived needs (image content → vision; thinking budget → thinking capability) so the operator does not need to mark every route explicitly. This costs one Provider::capabilities(model) call per candidate (already a HashMap lookup in the adapters); we accept the cost because the diagnostic value is large.

caliban.toml discovery uses the CLAUDE.md walk algorithm

Same ancestor-walk-up-to-git-root-or-$HOME as memory tier 0018, with a different filename predicate. Both walks share a caliban-memory::walk_up utility (already small, factored out for this ADR). Layering: CLI flag > env var > caliban.toml > $HOME/.config/caliban/caliban.toml. Unknown providers fail loudly at startup, not lazily on first call.

Effort levels live on RequestMetadata and map per-adapter

RequestMetadata.effort: Option<EffortLevel> is plumbed through to each adapter. Each adapter owns the mapping to its native effort knob (reasoning_effort / extended_thinking.budget / thinkingConfig). Ollama's mapping is a no-op for now. Operators see the table via caliban router debug --effort-table.

Prompt-cache markers are cleared on cross-route hops

When fallback or hedging moves to a different provider mid-session, cache_control markers in the persisted messages are stripped before the new adapter sees them. The cleared count is recorded in router.cache.markers_cleared. This is the cheap, safe behavior; markers are normalization-cost, not correctness-cost.

Metrics are tracing first, OTel-export later

We emit tracing events with structured fields (route_id, purpose, kind, from, to); the OTel cost spec (out of scope for this ADR) maps them to OTLP metric streams. Keeping the in-router emission tracing-only avoids pulling opentelemetry into a Layer-3 crate.

Consequences

  • Positive. Closes six 🔴 rows under matrix I in one PR — fallback, hedging, breakers, capability filtering, caliban.toml wiring, effort levels. The router now earns its keep as a resilience layer: a flaky primary auto-routes to a secondary, a tripped breaker prevents cascade failure, hedging gives operators an explicit spend/latency knob. The binary actually constructs a router from config, removing the awkward "config exists but unwired" state.
  • Negative. Hedging spend can surprise operators who do not read the README. We mitigate with explicit-opt-in and loud per-route hedge_loss metrics, but it remains a footgun. Breaker false positives are real and the cool-off window is fixed (no exponential back-off in v2). Capability auto-routing changes which route a request lands on without the operator's purpose knob; this can be debugged via /router debug but is a behavior change v1 users may not expect — release notes call it out. Prompt-cache marker clearing means cross-route hops lose Anthropic cache savings; with hedging this happens silently on every hedge to a non-Anthropic fallback.
  • Revisit if: Operator demand for adaptive hedge tuning (EWMA, p95 observation) materializes — a v3 sketch already lives in the spec's non-goals. If breaker false-positive complaints recur, add exponential cool-off (cooldown_secs * 2^trip_count up to a cap). If the candidate-vec seam ossifies and we need cost/budget routing, introduce a Budget filter stage before dispatch rather than rewriting dispatch.

ADR 0039 · Image + vision input

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-image-input-design.md
  • Author: john.ford2002@gmail.com

Context

Vision is table-stakes for any modern coding assistant: users want to paste a screenshot of a stack trace, drop a Figma export, or @path a generated chart and get it through to a vision-capable model. Caliban's current ContentBlock IR is text-only; the TUI input layer has no paste handler beyond text; the provider adapters serialize text content exclusively. Closing this requires changes across five crates plus a new ingest crate, but each change is small and the design carries no provider lock-in. Capability filtering (model-router v2) already contemplates a vision predicate — this ADR makes it real.

Decision

ContentBlock IR gains an Image variant; ImageBlock is provider-agnostic

The IR carries { source, mime, sha256, dims, cache_control }. source is Base64 { data } or Url { url }. Provider adapters own the serialization to their native shape (Anthropic image, OpenAI image_url, Google inline_data). This keeps the IR free of provider-specific knobs and lets us add a new provider's image shape by writing exactly the adapter, with no IR churn.

Ingest is a separate crate, caliban-images

Clipboard reads, DnD escape parsing, MIME sniffing, decode validation, size cap enforcement, downscale, and SHA-256 fingerprinting all live in one crate. The TUI and CLI both depend on it; the model-router pulls nothing from it (router only sees the already-built ImageBlock IR). Crate separation matters because the image decoder family is the biggest CVE surface in the dependency tree — keeping it behind a single boundary makes audits and feature-gating tractable.

MIME allowlist is closed: png, jpeg, gif, webp

We explicitly disable bmp/tiff/dds/tga and friends in the image crate feature flags. AVIF/HEIC are tracked but not v1. The list mirrors what all three vision providers (Anthropic, OpenAI, Google) support; expanding later is a config-flag change, not an API change.
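A closed allowlist pairs well with sniffing by magic bytes rather than trusting file extensions. The sketch below checks the well-known leading signatures of the four formats; `sniff_mime` is a hypothetical name and the actual ingest crate may delegate this to the image crate instead.

```rust
/// Returns the MIME type when `bytes` starts with an allowlisted signature.
/// Anything else (bmp, tiff, ...) yields None and would be rejected.
pub fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
        Some("image/webp")
    } else {
        None
    }
}
```

A signature match only identifies the container; the decode validation step described above is still needed before the bytes are trusted.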

Default size cap: 5 MiB pre-base64, downscale to 1568 px on longest edge

5 MiB matches Anthropic's documented limit; 1568 px matches Anthropic's recommended longest edge for cost-efficient inputs. Over-cap images are downscaled (Lanczos3) with a WARN-level trace and a "[downscaled]" badge on the TUI thumbnail. Operators can override via [images] in caliban.toml.
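The target dimensions for a downscale can be computed without touching pixel data. This sketch covers only the arithmetic, not the Lanczos3 resampling; `fit_longest_edge` is a hypothetical name, and rounding to the nearest pixel is an assumption.

```rust
/// Scales (width, height) so the longest edge is at most `max_edge`,
/// preserving aspect ratio. Images already within the cap are unchanged.
pub fn fit_longest_edge(width: u32, height: u32, max_edge: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest <= max_edge || longest == 0 {
        return (width, height);
    }
    // Integer round-to-nearest in u64 to avoid overflow; never collapse a side to 0.
    let scale = |side: u32| -> u32 {
        let scaled = (side as u64 * max_edge as u64 + longest as u64 / 2) / longest as u64;
        scaled.max(1) as u32
    };
    (scale(width), scale(height))
}
```

For example, a 3136 × 1000 screenshot maps to 1568 × 500 under the default cap.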

Capability filtering is mandatory; CALIBAN_STRICT_ROUTING=false opts out

By default, an image-bearing request that has no vision-capable route fails with RouterError::NoCandidate. Operators who want degraded behavior (CI, headless) set CALIBAN_STRICT_ROUTING=false; the router replaces image content with a documented text placeholder and continues. We pick "strict by default" because silent vision drop is a worse failure mode than a clear error pointing at the missing route.

Sessions store images as blob refs, never as inline base64

session.json carries ImageSource::BlobRef { sha256 }; the actual bytes live in <session>/blobs/<sha>.bin. BlobRef has #[serde(skip_…)] guarding against accidental wire serialization. This keeps session files small, makes git history of .caliban/sessions readable, and sets up the future session gc command.

TUI graphics protocol is detected once per session, with a text fallback

We probe kitty/sixel/iTerm2 capability via short escape sequences with a 100ms timeout, cache the result, and fall through to a [image: WxH MIME filename] placeholder otherwise. Probe results are overridable via CALIBAN_GRAPHICS=kitty|sixel|iterm|none. Probes hang on some terminals; the timeout is the safety valve.

Cost accounting reads provider Usage, not local estimates, but estimates surface in /usage for diagnostics

Anthropic and OpenAI return token usage including image tokens. We bill from what providers report. Locally we also compute a labelled estimate (ceil(w * h / 750) for Anthropic-style billing) so the /usage overlay can answer "what's this image roughly worth" before the call returns.
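The local estimate is a one-line ceiling division. `estimate_image_tokens` is a hypothetical name; the formula is the one quoted above, and the result is a diagnostic approximation, not what gets billed.

```rust
/// ceil(width * height / 750), computed in u64 so large images cannot overflow.
pub fn estimate_image_tokens(width: u32, height: u32) -> u64 {
    let pixels = width as u64 * height as u64;
    (pixels + 749) / 750
}
```

An image at the default 1568 × 1568 cap comes out at 3279 estimated tokens.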

Consequences

  • Positive. Closes matrix E "image / vision input" with one PR. The IR change is a small additive variant on ContentBlock; existing handlers are unaffected (default-match arm). Capability filtering in the router (already designed in v2) gets its first real consumer. Pasting a screenshot into caliban "just works" with the right route configured. The caliban-images crate establishes a pattern for future media types (PDF, audio).
  • Negative. Five crates touched. image crate dependency adds decoder CVE surface; we constrain it but cannot eliminate it. The TUI gains a graphics-protocol detection path that has been a recurring source of bugs in other tools — we mitigate with caching + override but accept some carry. Cost surprise is real for large screenshots; the 1568 px downscale default helps but does not eliminate it. Strict-by-default routing will trip operators who configured a non-vision route as their default — clear error message + docs are the mitigation. Session blob storage adds a directory layout we must GC eventually.
  • Revisit if: Output-side vision (image generation) becomes a real capability across providers — extend the IR with ImageGeneration / similar. If the image crate accumulates serious CVEs, sandbox the ingest path in a separate process. If users routinely hit the per-message count cap (20), expose it directly in the TUI rather than via caliban.toml.

ADR 0040 · Slash command registry

  • Status: accepted
  • Date: 2026-05-24
  • Spec: docs/superpowers/specs/2026-05-24-slash-command-coverage-design.md

Context

caliban currently has four hard-coded slash commands (/plan, /memory, /skills, /quit) dispatched from a match in Tui::handle_slash_command. Closing the parity gap with Claude Code adds another ~24 commands at minimum, plus plugin-supplied commands. Continuing the match arm pattern is untenable: it forces every command into one file, prevents plugins from registering commands, and duplicates the typeahead suggester data.

Decision

A SlashCommand trait + central SlashCommandRegistry

Each slash command becomes its own impl SlashCommand in caliban/src/tui/slash/<group>.rs. The registry holds them by name in a HashMap<&'static str, Arc<dyn SlashCommand>> and exposes register, suggest, dispatch. The TUI's input bar consults the suggester for typeahead; the dispatcher routes execution.
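
A minimal sketch of the registry shape, assuming a simplified trait: the ADR names the trait, the map type, and the three operations, but the method signatures below (`name`, `run`, a plain `&str` for arguments instead of the shared context) are hypothetical. Rejecting a duplicate name models a "first registration wins" rule.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait shape; the real method set is not given in this ADR.
trait SlashCommand {
    fn name(&self) -> &'static str;
    fn run(&self, args: &str) -> String;
}

struct Plan;
impl SlashCommand for Plan {
    fn name(&self) -> &'static str { "plan" }
    fn run(&self, args: &str) -> String { format!("planning: {args}") }
}

struct Quit;
impl SlashCommand for Quit {
    fn name(&self) -> &'static str { "quit" }
    fn run(&self, _args: &str) -> String { "bye".to_string() }
}

#[derive(Default)]
struct SlashCommandRegistry {
    commands: HashMap<&'static str, Arc<dyn SlashCommand>>,
}

impl SlashCommandRegistry {
    /// Returns false when the name is already taken (first registration wins).
    fn register(&mut self, cmd: Arc<dyn SlashCommand>) -> bool {
        let name = cmd.name();
        if self.commands.contains_key(name) {
            return false;
        }
        self.commands.insert(name, cmd);
        true
    }

    /// Typeahead: registered names starting with `prefix`, sorted for stable output.
    fn suggest(&self, prefix: &str) -> Vec<&'static str> {
        let mut names: Vec<&'static str> = self
            .commands
            .keys()
            .copied()
            .filter(|n| n.starts_with(prefix))
            .collect();
        names.sort_unstable();
        names
    }

    /// Route execution by name; `None` means no such command.
    fn dispatch(&self, name: &str, args: &str) -> Option<String> {
        self.commands.get(name).map(|c| c.run(args))
    }
}

fn main() {
    let mut registry = SlashCommandRegistry::default();
    registry.register(Arc::new(Plan));
    registry.register(Arc::new(Quit));
    println!("{:?}", registry.suggest("p"));
    println!("{:?}", registry.dispatch("plan", "refactor"));
}
```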

A shared SlashCtx<'a> is passed to every command

Commands need mutable access to the running session and immutable references to long-lived registries (providers, router, MCP manager, skills, hooks, sub-agent fleet, settings). Threading each separately into every command would mean re-plumbing nine call sites every time a new shared resource is added. Instead, SlashCtx is a single borrowing struct constructed per command dispatch. Commands take &mut SlashCtx<'_> and reach in for what they need.

The risk is SlashCtx becoming a god-object. We accept that risk and commit to splitting it if it grows past ~20 fields.

Slash commands are operator UI, not model tools — no permission gating

Slash commands run as the operator's direct action; they are not gated by the permission rule grammar that protects model-initiated tool calls. Commands that wrap destructive operations (/clear, /rewind-restore, /logout) implement their own interactive confirmation in their overlay. This keeps the rule grammar focused on its actual job (constraining the model) and removes a layer of ambiguity ("did /clear get rejected by a Bash rule?").

Hooks fire on slash submission

UserPromptSubmit (from ADR 0024 / Hooks expansion) fires before the slash parser runs. Hook payload includes is_slash: bool, command: str, args: str. A hook can reject or modify the slash command — useful for audit logging or per-operator policy.
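
To make the payload fields concrete, here is a small sketch that derives them from a raw input line. The three field names come from the text; everything else is an assumption for illustration, including the struct name, the use of `String` for `str`, dropping the leading slash from `command`, and leaving both strings empty for non-slash input.

```rust
/// Payload fields named in the ADR; the struct itself is hypothetical.
#[derive(Debug, PartialEq)]
struct PromptSubmitPayload {
    is_slash: bool,
    command: String,
    args: String,
}

/// Split a raw input line into the payload fields (assumed parsing rules).
fn payload_for(input: &str) -> PromptSubmitPayload {
    match input.strip_prefix('/') {
        Some(rest) => {
            // The command is the first whitespace-delimited word; the rest is args.
            let (command, args) = match rest.split_once(char::is_whitespace) {
                Some((c, a)) => (c, a.trim()),
                None => (rest, ""),
            };
            PromptSubmitPayload {
                is_slash: true,
                command: command.to_string(),
                args: args.to_string(),
            }
        }
        None => PromptSubmitPayload {
            is_slash: false,
            command: String::new(),
            args: String::new(),
        },
    }
}

fn main() {
    println!("{:?}", payload_for("/rewind-restore 3"));
    println!("{:?}", payload_for("explain this function"));
}
```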

Stubs are first-class

Several slash commands depend on machinery being designed in sibling specs (settings, MCP v2, plugins, OTel/cost, checkpointing). Rather than wait for everything to land, we register stubs that emit a helpful status message ("cost tracking lands in PR #N — see docs/superpowers/specs/2026-05-24-otel-and-cost-design.md"). The stub files name the in-flight spec so the user can tell what's coming.

Consequences

  • Positive: Clean extension point — adding a command is one file and one registry.register(...) line. Plugins (per ADR 0030) register commands the same way. Typeahead works automatically for every registered command. /help enumerates the live set, so documentation never drifts from reality.
  • Negative: SlashCtx is wide. Stubs can confuse operators if the message isn't clear. Plugin-supplied commands shadowing built-ins need consistent semantics (plugin loses); logged at registration time. Adds ~150 LOC of trait/registry plumbing for ~24 small command impls.
  • Revisit if: Commands begin to need session-specific command registration (e.g. a sub-agent's command appears only when that sub-agent is attached). Today's registry is process-global; a per-session overlay can be added without breaking the trait.

ADR 0041 · TUI redraw tick close-out

  • Status: accepted
  • Date: 2026-05-26
  • Supersedes: portions of 0014 (the "If the underlying cause is something deeper" open question)

Context

ADR 0014 introduced a 50 ms redraw tick into the TUI event loop (caliban/src/tui.rs:180) as a workaround for stalls observed during streaming completions. The same ADR explicitly acknowledged the tick "masks the symptom rather than addressing the root cause" and pointed at a probable missing-waker bug in async_stream::try_stream! as the likely culprit.

Two years on, no follow-up ADR had closed the question — this ADR does.

Decision

The 50 ms redraw tick stays.

The reasoning:

  1. No reported regressions in 18 months of regular use. The tick has been in place since the original ADR 0014 commit; no stall reports have surfaced since.
  2. Modern async-stream 0.3 has sound waker propagation. The original 2024 hypothesis (async_stream::try_stream! failing to register a waker) is unlikely with the current dep. A static read of the TurnEventStream construction (crates/caliban-agent-core/src/stream/mod.rs:263) found no obvious waker bugs.
  3. The tick's cost is negligible. A no-op wake every 50 ms is 20 wakes per second; at ~10 µs of CPU per wake that is ~200 µs per second, or 0.02 % overhead on a single core. The ratatui frame-render path early-returns when state is unchanged (the toast-drop check above the draw call is the only state mutation per tick).
  4. Removing the tick would risk a silent regression for a marginal cleanup gain. The tick is a one-line defensive fallback that costs nothing observable.

Consequences

  • The tick remains in caliban/src/tui.rs.
  • ADR 0014's "If the underlying cause is something deeper" open question is now considered closed.
  • The mention of the tick in ADR 0014 is left as-is for historical context; this ADR is the authoritative current decision.

Revisit if

  • A contributor identifies a reproducible stall under specific conditions (a particular provider, model, or prompt shape).
  • A future async-stream / ratatui / tokio upgrade reintroduces the symptom.
  • A measurable battery-life or thermal regression is attributed to the redraw tick on long-running TUI sessions.

In any of those cases the appropriate response is to re-run the investigation with the debug log enabled (see ADR 0014 §"Debug log"), identify the root cause, and either land a real fix or write a new ADR with the updated reasoning.

References

  • ADR 0014 (original tick decision; §"Stall fix").
  • TUI event loop: caliban/src/tui.rs:180 (interval declaration), caliban/src/tui.rs:241 (tick arm of the select).
  • TurnEventStream construction: crates/caliban-agent-core/src/stream/mod.rs:263 (try_stream! macro invocation).
  • Workspace dep: async-stream = "0.3" (root Cargo.toml).

ADR 0042 · caliband sibling-binary placement

  • Status: accepted
  • Date: 2026-05-26

Context

The workspace declares two binaries:

  • caliban — the primary user-facing TUI/CLI. Source at the workspace root (caliban/src/main.rs).
  • caliband — the supervisor daemon (ADR 0037). Source nested under its owning crate at crates/caliban-supervisor/src/bin/caliband.rs, declared via the [[bin]] entry in crates/caliban-supervisor/Cargo.toml.

ADR 0005 ("Workspace layout") establishes the convention that "primary" binaries live at the workspace root. caliband does not — it lives nested under its owning crate. ADR 0037 introduces the daemon obliquely (its name, its on-disk paths, and its protocol) but does not document the placement choice. This ADR records it.

Decision

caliband stays nested under caliban-supervisor as a secondary binary, with its [[bin]] declaration in the supervisor crate's Cargo.toml.

Consequences

  • Clean process boundary between the user-facing caliban CLI/TUI and the supervisor daemon. The two never share a main entry point; they communicate over a Unix socket per ADR 0037.
  • Direct crate access. caliband consumes caliban-supervisor's modules directly without going through a public API surface — appropriate because they ship together.
  • No accidental dispatch. Launching caliban never accidentally invokes caliband's main (or vice versa); they're distinct binaries from cargo and from the user's $PATH.
  • cargo install requires --bin caliband explicitly. The supervisor crate's README documents this; the caliban agents subcommand spawns caliband from the same install prefix as caliban (per ADR 0037).
  • Workspace-root parsimony. The root stays focused on the primary product (caliban); the daemon is appropriately filed under the crate that owns its implementation.

Why this differs from ADR 0005

ADR 0005's "binaries at root" rule was written assuming a single binary. With two, the rule needs nuance:

  • A binary whose sole purpose is to expose a crate's library functionality as an executable belongs with that crate.
  • A binary that integrates many crates into the product surface belongs at the workspace root.

caliban is the latter; caliband is the former. This ADR amends ADR 0005's rule by adding that nuance.

Revisit if

  • A third sibling binary appears (e.g., a caliban-mcp daemon for remote MCP servers). At that point the workspace should consider a binaries/ subdirectory rather than continuing the case-by-case pattern.
  • caliband outgrows its current sole consumer (the caliban agents subcommand) and starts being launched standalone by other tooling — it might then belong at the root for discoverability.

References

  • ADR 0005 (workspace layout — sets the "binaries at root" convention this ADR refines).
  • ADR 0037 (subagent isolation + fleet — introduces caliband).
  • Source: crates/caliban-supervisor/src/bin/caliband.rs.
  • Declaration: crates/caliban-supervisor/Cargo.toml ([[bin]]).

ADR 0043 · arc-swap as the read-mostly shared-state primitive

  • Status: accepted
  • Date: 2026-05-26

Context

Several read-mostly shared-state surfaces in the workspace use arc_swap::ArcSwap rather than tokio::sync::RwLock:

  • caliban-agent-core::permission_mode::SharedPermissionMode: Arc<ArcSwap<PermissionMode>> for the active permission mode (read on every tool call; written when the user toggles via the TUI overlay or a slash command).
  • caliban-model-router::breaker::CircuitBreaker: ArcSwap<BreakerState> for the per-provider breaker state (read on every routed request; written on rolling-window state transitions).
  • caliban-settings::SettingsHandle: Arc<ArcSwap<Settings>> for the live settings snapshot (read by many subsystems; written when SettingsWatcher fires a reload).

The choice was made per-surface during the parity sweep but never documented at the workspace level until this ADR.

Decision

Prefer arc-swap for shared state when all three apply:

  1. Readers outnumber writers by ≥ 10×. The replacement cost on each write is justified only when reads dominate.
  2. Writers can tolerate full Arc replacement. arc-swap swaps a whole Arc; partial mutation requires a load-modify-store pattern (cheap but susceptible to lost updates without external coordination).
  3. Read latency is on the hot path. A tokio::sync::RwLock is already cheap, but arc_swap.load() is measurably cheaper: it's lock-free, allocation-free, and has no contention even with hundreds of concurrent readers.

Use tokio::sync::RwLock for surfaces with frequent partial mutation (e.g., long-lived per-key state where rewriting the whole Arc would thrash the allocator), or where writer fairness matters more than reader throughput.

Use plain std::sync::Mutex for short critical sections that don't need to await across the lock.

Consequences

  • Lock-free reads. Every load() returns an Arc<T> snapshot via a guard with no contention.
  • No priority inversion under load: readers never block writers, writers never block readers.
  • Slightly higher memory churn on writes: each store allocates a new Arc. Acceptable for the listed surfaces because writes are rare (mode toggle, breaker state transition, settings reload).
  • No fairness guarantees between concurrent writers. Acceptable because writers are rare; if two writers race, the later store wins per the swap's release semantics.
  • Snapshot semantics for readers. A reader sees a single consistent value; subsequent reads may observe a different swapped value. Callers that need a stable snapshot across multiple reads should hoist the load() to a local. (No subsystem in the workspace currently relies on inter-read consistency for arc-swap surfaces.)
  • Cognitive load for new contributors unfamiliar with the semantics: load() returns a snapshot, not a live reference. The module-level comments on each ArcSwap field call this out.
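
The snapshot semantics above can be illustrated without the arc-swap crate. The sketch below is a std-only stand-in built on `RwLock<Arc<T>>`: it is not lock-free and is not how arc-swap is implemented, it only mirrors the whole-Arc replacement and the "load() returns a snapshot" behaviour. `SwapCell` and the `PermissionMode` variants are hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Std-only stand-in for an ArcSwap-like cell. NOT lock-free; it only mirrors
/// whole-Arc replacement and snapshot reads.
struct SwapCell<T> {
    inner: RwLock<Arc<T>>,
}

impl<T> SwapCell<T> {
    fn new(value: T) -> Self {
        Self { inner: RwLock::new(Arc::new(value)) }
    }

    /// Snapshot: the returned Arc keeps pointing at the value current at load time.
    fn load(&self) -> Arc<T> {
        Arc::clone(&self.inner.read().unwrap())
    }

    /// Whole-value replacement: every store allocates a fresh Arc.
    fn store(&self, value: T) {
        *self.inner.write().unwrap() = Arc::new(value);
    }
}

/// Hypothetical variants, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PermissionMode {
    Default,
    Plan,
}

fn main() {
    let mode = SwapCell::new(PermissionMode::Default);

    // Hoist the load to a local when several reads must agree with each other.
    let snapshot = mode.load();
    mode.store(PermissionMode::Plan); // a writer swaps in a new value

    // The hoisted snapshot is unchanged; a fresh load sees the new value.
    println!("snapshot = {:?}, fresh = {:?}", *snapshot, *mode.load());
}
```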

Revisit if

  • A surface using arc-swap grows a need for partial mutation that the swap pattern can't model cleanly — switch to tokio::sync::RwLock at that surface only.
  • The arc-swap crate's maintenance status changes materially (it's small and stable, but watch for unmaintained markers).
  • The workspace adds a surface with writer fairness requirements; do not stretch arc-swap to cover it.

References

  • arc-swap crate: https://crates.io/crates/arc-swap
  • Surfaces:
    • crates/caliban-agent-core/src/permission_mode.rs:124-140
    • crates/caliban-model-router/src/breaker.rs:68-79
    • crates/caliban-settings/src/lib.rs:70-83

ADR 0044 · rmcp 1.7 version pin

  • Status: accepted
  • Date: 2026-05-26

Context

caliban-mcp-client depends on rmcp — the Model Context Protocol Rust SDK. The workspace Cargo.toml pins it at 1.7.x:

rmcp = { version = "1.7", features = [...] }

This is a tighter pin than the typical Rust convention of "compatible with the listed version" (^1.7 allows any 1.x.y where x ≥ 7). The choice was made when adopting rmcp and never recorded as an ADR until now.

Decision

Pin rmcp at the 1.7.x minor.

Bumps to a new minor (1.8, 1.9, etc.) are landed in a single dedicated PR after:

  1. Reading the upstream changelog for breaking changes affecting our MCP transport, OAuth, elicitation, or resource surface (ADRs 0017, 0023).
  2. Verifying our integration tests still pass against the bumped version.
  3. Spot-checking the canonical reference MCP servers (a stdio server, an HTTP+OAuth server) end-to-end.

Patch bumps within 1.7.x (1.7.0 → 1.7.1) are auto-resolved by Cargo and do not require a dedicated PR.
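
One detail worth checking when reading the pin: under Cargo's documented rules a bare "1.7" is shorthand for the caret requirement ^1.7, while a tilde requirement ~1.7 is the form that restricts resolution to 1.7.x. Whether the workspace relies on a tilde requirement, on Cargo.lock, or on something else to hold the 1.7.x line is not stated here. The sketch below hand-rolls the two range checks for a 1.x requirement so the difference is visible; it ignores pre-release versions, and the helper names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Caret `^major.minor` with major >= 1: >= major.minor.0 and < (major + 1).0.0.
fn caret_matches(req_major: u64, req_minor: u64, v: Version) -> bool {
    v.major == req_major && v.minor >= req_minor
}

/// Tilde `~major.minor`: >= major.minor.0 and < major.(minor + 1).0.
fn tilde_matches(req_major: u64, req_minor: u64, v: Version) -> bool {
    v.major == req_major && v.minor == req_minor
}

fn main() {
    let candidates = [
        Version { major: 1, minor: 7, patch: 1 },
        Version { major: 1, minor: 8, patch: 0 },
        Version { major: 2, minor: 0, patch: 0 },
    ];
    for v in candidates {
        println!(
            "{}.{}.{}: ^1.7 = {}, ~1.7 = {}",
            v.major,
            v.minor,
            v.patch,
            caret_matches(1, 7, v),
            tilde_matches(1, 7, v)
        );
    }
}
```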

Consequences

  • Insulation from breaking changes in MCP transport or server APIs between rmcp minor releases. Our surface (crates/caliban-mcp-client/src/{client,transport,oauth,elicitation,resource}.rs) is large enough that an unexpected upstream minor could mean a multi-day debug session.
  • Manual maintenance cost. Each minor bump requires changelog review + integration test pass + a dedicated PR. Estimate: 1-3 hours per bump.
  • Predictable runtime behavior for users running pinned binaries against established MCP servers. The wire protocol is stable across the 1.x line by upstream convention, but rmcp's API surface has reshaped between minors in the past.
  • Risk: lagging behind upstream means missing protocol-level enhancements (e.g., new transport modalities, new elicitation features) until we explicitly bump. Mitigation: a quarterly changelog check is on the project cadence.
  • Risk: security updates in a future minor (e.g., a fix in OAuth validation) require an immediate bump rather than auto-pulling. Mitigation: subscribe to the rmcp release notes / RustSec advisories.

Revisit if

  • rmcp reaches 2.0 — at which point the pin needs to move regardless, and the changelog review is mandatory.
  • A security advisory affecting our usage of rmcp surfaces — bump immediately to the patched minor, write the dedicated PR retrospectively.
  • The maintenance cost of staying current outweighs the insulation benefit (e.g., if upstream stabilizes such that minors stop reshaping the API).

References

  • rmcp crate: https://crates.io/crates/rmcp
  • ADR 0017 (MCP stdio v1) and ADR 0023 (MCP v2 — transports, OAuth, elicitation, resources) — the surfaces that consume rmcp.
  • Workspace pin: root Cargo.toml (rmcp = { version = "1.7", ... }).

ADR 0045 · Permissions v2 — TOML-primary config + richer rule schema

  • Status: accepted
  • Date: 2026-05-31
  • Supersedes (partial): ADR 0026 (settings layering) — refines write format and per-rule schema.

Context

caliban shipped v1 permissions (ADR 0020), permission modes (ADR 0029), and layered settings (ADR 0026) with JSON as the canonical write format. Operator feedback and a security/UX review surfaced four classes of problems: (1) the TUI Ask modal's "always allow / always deny" never persisted, breaking the ADR 0020 promise; (2) the JSON permissions.{allow,ask,deny} form lost source order and comments; (3) JSON is the wrong primary format for a Rust project where operators expect TOML and want hand-edited config that ports between machines; (4) there was no full management surface (CLI or in-TUI editor) for rules.

Decision

  1. Restore TOML as caliban's canonical config write format at every scope; JSON is accepted on read as a legacy/import path (with a WARN). All caliban-owned writes — modal, /permissions editor, caliban perms CLI — emit TOML.
  2. Replace the three-bucket permissions.{allow,ask,deny} form with an ordered [[permissions.rules]] array of objects carrying pattern, action, optional comment, optional reason (deny-only, seen by the model), and reserved expires_at. First match wins. The three-bucket form still loads (legacy compat) but normalizes into the ordered array on load.
  3. Extend pattern grammar: globstar **, path normalization for file-edit tools, Bash:~glob anywhere-match, dotted-key MCP arg accessors.
  4. Modal writeback (P1): y / n opens a sub-prompt with narrow-default suggestions, a scope picker, and an optional comment/reason. Atomic flock-protected TOML append.
  5. Active management surface: /permissions overlay grows full editor capabilities; caliban perms CLI provides headless list / test / explain / add / remove / import / export / audit / lint.
  6. Hardening: permissions.enforce lockdown knob, append-only JSONL decision log under $XDG_STATE_HOME with size-based rotation, always-visible bypass-latch chip with ctrl+shift+b drop keybind.
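
The "first match wins" evaluation in item 2 can be sketched as below. The rule fields mirror the ones listed (pattern, action, optional comment, optional reason); the matcher is a deliberately toy one (exact match, or prefix match when the pattern ends in `*`) and does not implement the real pattern grammar. The example patterns and the `rule`, `toy_matches`, and `decide` helpers are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Allow,
    Ask,
    Deny,
}

#[derive(Debug)]
struct Rule {
    pattern: String,
    action: Action,
    comment: Option<String>,
    reason: Option<String>, // deny-only, seen by the model
}

/// Convenience constructor for the examples (hypothetical helper).
fn rule(pattern: &str, action: Action) -> Rule {
    Rule { pattern: pattern.to_string(), action, comment: None, reason: None }
}

/// Toy matcher, NOT the real grammar: exact match, or prefix match on a trailing '*'.
fn toy_matches(pattern: &str, call: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => call.starts_with(prefix),
        None => pattern == call,
    }
}

/// Walk the rules in source order and return the first one that matches.
fn decide<'a>(rules: &'a [Rule], call: &str) -> Option<&'a Rule> {
    rules.iter().find(|r| toy_matches(&r.pattern, call))
}

fn main() {
    let mut deny_rm = rule("Bash:rm *", Action::Deny);
    deny_rm.comment = Some("no deletes in this repo".to_string());
    deny_rm.reason = Some("deletion is not permitted here".to_string());
    let rules = vec![deny_rm, rule("Bash:*", Action::Allow), rule("Edit:*", Action::Ask)];

    for call in ["Bash:rm -rf build", "Bash:ls", "Write:notes.md"] {
        match decide(&rules, call) {
            Some(r) => println!(
                "{call}: {:?} via {:?} (comment: {:?}, reason: {:?})",
                r.action, r.pattern, r.comment, r.reason
            ),
            None => println!("{call}: no rule matched"),
        }
    }
}
```

Because order decides the outcome, swapping the first two rules would let the broad allow shadow the narrower deny.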

Consequences

  • Positive: matches Rust ecosystem norms; comments and source-order survive; the modal's promise is finally honored; operators have a complete management story (TUI + CLI); enforce + audit log close long-standing security gaps.
  • Negative: doubles the schema surface during the compat window (legacy JSON + TOML buckets + v2 ordered rules coexist on read); the matcher gets a denser grammar (more to document).
  • Compat window: legacy reads continue for two minor releases; writes deprecate immediately. After those two minor releases only the canonical TOML schema loads.

Runtime application semantics

Rules added through the Ask modal's "Always allow/reject" are applied to the running session immediately: the gate and the TUI share one RuntimeRuleStore, so the just-added rule gates the next matching tool call without re-prompting — regardless of which scope it is also persisted to on disk.

Rule removals and out-of-band file edits are intentionally not hot-reloaded into a running session. Deleting a file-scoped rule via the /permissions overlay or caliban perms remove updates the on-disk file but does not retroactively tighten the live gate; the change takes effect on the next session start. Deleting a session (runtime) rule with [d] in the overlay does take effect live, because it mutates the in-memory store directly. This asymmetry keeps the gate cheap — no per-call disk re-read or file watcher — while making the common "allow this now" gesture feel instant.

Revisit if

  • Operators report concrete cases where the ~glob or dotted-key grammars are insufficient — next step would be a richer expression language or a classifier-graded gate (already deferred via ADR 0029 auto-mode).
  • The bypass-latch chip + drop keybind UX proves footgunny — could promote the drop to a confirmation dialog.

ADR 0046 · Two-stage tool surface — lazy MCP schema loading + ToolSearch

Context

ToolRegistry::to_caliban_tools() is invoked once per turn at crates/caliban-agent-core/src/stream/mod.rs:497-523, cloning every registered tool's name + description + JSON Schema into the wire payload. Built-ins are bounded (~14 entries) but MCP tools scale linearly with configured servers — three average MCP servers can add ~20K tokens/turn of dormant tool advertising before history is considered. The problem is structural and will worsen as the MCP/plugin ecosystem grows, which calls for a design doc + ADR + multi-PR sequence; this ADR is that decision.

Decision

  1. Introduce a single new built-in ToolSearch that returns matched MCP tools with their full JSON Schemas and activates them for the rest of the session in a single round-trip. No separate Activate tool; no two-step UX.

  2. Store activation state in a sidecar McpActivationSet held by Agent as Arc<ArcSwap<McpActivationSet>>, following the read-mostly pattern of ADR 0043. ToolRegistry is unchanged; an added to_caliban_tools_filtered(&WireFilter) returns the per-turn wire subset.

  3. Filter MCP tools, never built-ins. The v1 scope is MCP-only laziness; built-ins (Read, Grep, Glob, Edit, Bash, Write, WebFetch, WebSearch, TodoWrite, Skill, AgentTool, EnterPlanMode/ExitPlanMode, memory tools) stay always-present. Plugin-tool laziness is moot today (plugins contribute skill roots, not tools).

  4. Sticky per session, LRU evict at cap. Activations persist for the rest of the session; tools.max_active_schemas (default 24) is a soft cap. New activations beyond the cap evict the least recently used entry, reported in the ToolSearch response text so the model sees what dropped.

  5. Sub-agent inheritance is opt-out via frontmatter. AgentTool frontmatter gains inherit_active_mcp: Option<bool> defaulting to true. When true, install_sub_agent snapshots the parent's McpActivationSet; when false the child starts fresh. The existing tools: [...] allowlist still filters.

  6. Default off; opt-in via tools.lazy_mcp = true. Conservative v1; flip to default-on in v1.1 after validation. Per-server override via mcp.toml ([server.X] lazy = false) pins always-hot servers (e.g. a memory/notes server) to eager mode.

  7. Belt-and-suspenders discovery. When lazy_mcp = true and at least one MCP tool is gated, splice a fixed paragraph into the system prompt explaining ToolSearch plus the deferred count; the ToolSearch tool description itself also names the affordance.

  8. /context surfaces the active set as MCP active: N/cap (a, b, c). /usage is intentionally not touched in v1 (no honest counterfactual reporting yet).
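
The sticky-with-LRU-eviction behaviour in item 4 and the /context line in item 8 can be sketched as follows. This is a simplified single-threaded model under stated assumptions: the real set sits behind an ArcSwap, and the internal layout, the `activate` and `context_line` method names, and the ordering of names in the summary are hypothetical.

```rust
use std::collections::VecDeque;

/// Simplified activation set: front = least recently used, back = most recent.
struct McpActivationSet {
    cap: usize,
    order: VecDeque<String>,
}

impl McpActivationSet {
    fn new(cap: usize) -> Self {
        Self { cap, order: VecDeque::new() }
    }

    /// Activate (or refresh) a tool and return whatever was evicted to stay
    /// within the cap, so the caller can report it back to the model.
    fn activate(&mut self, name: &str) -> Vec<String> {
        if let Some(pos) = self.order.iter().position(|n| n == name) {
            self.order.remove(pos);
        }
        self.order.push_back(name.to_string());

        let mut evicted = Vec::new();
        while self.order.len() > self.cap {
            if let Some(old) = self.order.pop_front() {
                evicted.push(old);
            }
        }
        evicted
    }

    fn is_active(&self, name: &str) -> bool {
        self.order.iter().any(|n| n == name)
    }

    /// Summary in the "MCP active: N/cap (a, b, c)" shape.
    fn context_line(&self) -> String {
        let names: Vec<&str> = self.order.iter().map(String::as_str).collect();
        format!("MCP active: {}/{} ({})", self.order.len(), self.cap, names.join(", "))
    }
}

fn main() {
    // A tiny cap keeps the example short; the documented default is 24.
    let mut set = McpActivationSet::new(2);
    set.activate("a");
    set.activate("b");
    set.activate("a"); // refresh: "b" is now the least recently used
    let evicted = set.activate("c");
    println!("evicted: {evicted:?}");
    println!("{}", set.context_line());
}
```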

Consequences

  • Positive: removes a linear-in-MCP-cardinality token tax from every turn; matches the function-calling pattern many models are trained on; structural readiness for plugin-tool laziness later; no protocol change for the eager path (default behavior is byte-identical).
  • Positive: single read-mostly ArcSwap for activation state fits the existing concurrency model and makes sub-agent snapshot trivial.
  • Negative: introduces a model-facing contract (search-then-call) that requires the model to read system-prompt guidance; some weaker models may not pick up the pattern reliably (mitigation: it is opt-in in v1, and the "model issues tool_use without searching first" path still works via registry dispatch + auto-activation).
  • Negative: tool-list cache prefix is invalidated on each activation; a future split-cache optimisation is sketched in the spec but out of scope for v1.
  • Compat window: default false for v1; v1.1 flips default to true (parity matrix rows F.ToolSearch / F.WaitForMcpServers move 🔴 → 🟡 in v1, 🟡 → ✅ in v1.1).

Revisit if

  • Activation set's read-mostly assumption breaks down (e.g. the model starts calling ToolSearch every turn) — would warrant a finer-grained cache strategy.
  • Built-in tool palette grows substantially (e.g. a wave of new builtins) and the cardinality problem returns for built-ins — would motivate a separate built-in laziness spec.
  • A model is observed to reliably ignore the deferred-block guidance — would motivate a stronger affordance (e.g. forcing an inert ToolSearch tool_use as the first turn under lazy mode).
  • Activation persistence across session restart becomes a hot request — would warrant the v1.1 follow-up sketched in the spec's "Open questions" section.

ADR 0047 · Interactive background sub-agents (idle / await-input)

  • Status: accepted
  • Date: 2026-06-10
  • Spec: docs/superpowers/specs/2026-06-10-interactive-background-subagents-design.md
  • Amends: ADR 0037 (sub-agent worktree isolation + background fleet) — revises one non-goal clause; see "Decision".
  • Builds on: ADR 0009 (agent-core stream-as-primitive), ADR 0024 (hook taxonomy), ADR 0037 (background fleet + per-agent socket).
  • Author: john.ford2002@gmail.com
  • Issue: caliban-ai/caliban#81

Context

ADR 0037 shipped the background fleet: bg = true sub-agents owned by the caliband daemon, each exposing a per-agent socket carrying its TurnEvent stream. Issues #71 / #78 / #79 / #75 / #76 / #77 implemented that runtime — workers launch, stream their transcript live over caliban agents attach, clean up on exit, and run behind a permission gate.

ADR 0037 deliberately scoped inbound interaction out. Its non-goals say:

Re-attaching a stopped sub-agent into the parent's context. Once detached, a background sub-agent runs to completion (or is killed). The parent reads its final summary via caliban agents attach or the /agents overlay.

and its design spec describes the per-agent socket as carrying "TurnEvents and inbound user messages" — a capability that was documented but never built. The result today: an attached operator can watch a background agent but cannot talk to it. When the agent finishes its prompt, it ends; there is no way to say "good, now also do X" without respawn (which loses all context).

Two facts make this worth revisiting now:

  1. It is a small generalization, not a rewrite. The agent loop already has TurnDecision::ContinueWith(Vec<Message>) (ADR 0024 hook taxonomy): an after_turn hook can inject messages and force another turn. That is exactly "resume a finished turn with new input" — capped at MAX_FORCED_CONTINUATIONS = 3 only to stop hook death-spirals.
  2. The fleet UX expects it. AgentStatus::Idle ("awaiting input; no compute pending") is defined in the proto and rendered by agents list but is never set, because nothing awaits input.

This ADR records the architectural commitments for closing that gap. Mechanics live in the companion design spec.

Decision

Revise ADR 0037's "runs to completion" non-goal

ADR 0037's non-goal is narrowed, not deleted:

  • Still a non-goal: re-attaching a sub-agent into the parent agent's automated context. A bg sub-agent never feeds results back into the parent's running loop; the parent reads a final summary out-of-band. This ADR does not change that.
  • Now permitted: an operator (a human at caliban agents attach) may send user messages to a running background sub-agent, which resumes from that input rather than ending. This is interactive operator I/O over the per-agent socket — categorically different from automated parent-context re-attachment.

The distinction matters: the danger ADR 0037 guarded against was automated fan-in (a sub-agent silently resuming the parent). A human typing into an attached session carries no such hazard and is the natural way to steer a long-running background task.

Interactivity is a first-class agent-core run mode, not a hook hack

We add an optional InputProvider to a run (via RunSettings), rather than overloading after_turn + ContinueWith. When the model reaches a natural end-of-run boundary (it stopped and no tool call is pending), the loop — if an InputProvider is configured — awaits the provider for the next user message:

  • Some(messages) → inject into history, mark Idle → Running, take another turn.
  • None → the provider signalled end-of-input; the run ends normally (StopCondition::EndOfTurn, status Done).

We choose a pull-based InputProvider over the existing ContinueWith hook path because:

  • It is not death-spiral-prone, so it is correctly uncapped (a human drives it; MAX_FORCED_CONTINUATIONS stays as the anti-spiral cap for hook-forced continuations only).
  • It models "await input" honestly — the loop blocks on external I/O at a well-defined boundary, which after_turn (fires every turn) does not.
  • It composes with hooks: before_turn/after_turn/permission hooks still run on the resumed turns unchanged.

Foreground and one-shot runs pass no InputProvider and are byte-for-byte unchanged (the boundary check is if let Some(provider)).
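
A minimal synchronous sketch of the end-of-run boundary, assuming a blocking `next_input` method (the real trait is presumably async and lives in agent-core; the method name, the `Message` alias, the `Scripted` provider, and the stubbed model turn are all hypothetical). It shows the two branches above and that a run with no provider takes exactly one pass.

```rust
use std::collections::VecDeque;

type Message = String;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentStatus {
    Running,
    Idle,
    Done,
}

/// Pull-based input source (hypothetical signature).
trait InputProvider {
    /// `Some(messages)` resumes the run; `None` signals end-of-input.
    fn next_input(&mut self) -> Option<Vec<Message>>;
}

/// Test double that replays a fixed script of operator messages.
struct Scripted(VecDeque<Vec<Message>>);

impl InputProvider for Scripted {
    fn next_input(&mut self) -> Option<Vec<Message>> {
        self.0.pop_front()
    }
}

/// Returns the number of turns taken and the status transitions observed.
fn run(prompt: Message, mut provider: Option<&mut dyn InputProvider>) -> (usize, Vec<AgentStatus>) {
    let mut history = vec![prompt];
    let mut statuses = vec![AgentStatus::Running];
    let mut turns = 0;

    loop {
        // One model turn, stubbed: append a reply to the history.
        turns += 1;
        history.push(format!("reply {turns}"));

        // End-of-run boundary: without a provider the run simply ends.
        let Some(p) = provider.as_mut() else { break };

        statuses.push(AgentStatus::Idle);
        match p.next_input() {
            Some(messages) => {
                history.extend(messages);
                statuses.push(AgentStatus::Running);
            }
            None => break,
        }
    }

    statuses.push(AgentStatus::Done);
    (turns, statuses)
}

fn main() {
    let mut script = Scripted(VecDeque::from(vec![vec!["now also do X".to_string()]]));
    let (turns, statuses) = run("refactor module".to_string(), Some(&mut script));
    println!("turns = {turns}, statuses = {statuses:?}");
}
```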

The per-agent socket becomes bidirectional

ADR 0037's per-agent socket carried worker→client TurnEvents only (#79). It becomes bidirectional: the worker continues writing TurnEvent NDJSON outbound, and now reads inbound user-message frames (newline-delimited JSON, a small tagged frame type) from attached clients. The worker's InputProvider is fed by these frames. caliban agents attach gains a send path (stdin → user-message frames). Read-only viewers (e.g. a future /agents overlay tail) simply never send.

Idle is a real, reported lifecycle state

AgentStatus::Idle is wired: the worker reports Running → Idle when it begins awaiting input and Idle → Running when it resumes. Because the daemon — not the worker — owns the registry, this requires a worker → daemon status-report channel (the worker currently only talks to attach clients). The design spec picks the mechanism; the commitment here is that Idle is observable in agents list and the /agents overlay.

Bounded idle: an idle agent must not live forever

An agent awaiting input with no attached clients is a resource leak in waiting. The run ends (Done) when any of:

  • the InputProvider returns None (an attached operator sent an explicit end / detached with end-intent),
  • a configurable idle timeout elapses with no inbound message and no attached client, or
  • caliban agents kill (unchanged).

Default idle timeout is conservative (minutes, configurable per SupervisorConfig); the spec sets the exact default.
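
The three end conditions can be written as a single predicate. The sketch below is illustrative only: `IdleState`, its fields, and `should_end` are hypothetical names, and the ten-minute timeout in `main` is a placeholder rather than the default chosen by the spec.

```rust
use std::time::Duration;

/// Snapshot of what the worker knows while idle (hypothetical struct).
struct IdleState {
    provider_ended: bool,         // the InputProvider returned None
    killed: bool,                 // `caliban agents kill` was issued
    attached_clients: usize,      // currently attached operators
    since_last_inbound: Duration, // time since the last inbound message
}

/// True when any of the three documented end conditions holds.
fn should_end(state: &IdleState, idle_timeout: Duration) -> bool {
    // The timeout only counts while nobody is attached.
    let timed_out = state.attached_clients == 0 && state.since_last_inbound >= idle_timeout;
    state.provider_ended || state.killed || timed_out
}

fn main() {
    // Placeholder value; the spec sets the real default.
    let idle_timeout = Duration::from_secs(10 * 60);
    let state = IdleState {
        provider_ended: false,
        killed: false,
        attached_clients: 0,
        since_last_inbound: Duration::from_secs(11 * 60),
    };
    println!("end run: {}", should_end(&state, idle_timeout));
}
```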

Consequences

  • Positive. Closes the last documented gap in the ADR 0037 per-agent socket ("inbound user messages"). Turns background sub-agents from fire-and-forget into steerable long-running workers — the natural UX for "kick off a background refactor, watch it, nudge it." Reuses the audited permission gate (#75) on resumed turns. Wires the long-dormant Idle state. The InputProvider abstraction is reusable beyond background agents (e.g. a future scripted multi-turn driver).
  • Negative. A new first-class run mode in agent-core (small, but it touches the core loop's end-of-run boundary — the highest-blast-radius file in the codebase). A worker→daemon status channel that did not exist. A bidirectional socket protocol (frame schema, multi-client inbound multiplexing). The idle-timeout adds a timer to the worker. None of these affect foreground/headless runs.
  • Revisit if: multi-client inbound proves confusing (two operators typing at one agent) — may need a single-writer lease. If InputProvider wants richer turns than user text (images, tool results), generalize the inbound frame. If operators want to fork an idle agent's context rather than continue it, that is a separate "branch" primitive, out of scope.

Decomposition (see spec for detail)

This ADR is intentionally larger than one PR. The spec breaks it into independently-shippable tickets:

  1. agent-core InputProvider run mode (+ tests; foreground unaffected).
  2. Bidirectional per-agent socket frame protocol + worker InputProvider backed by the socket.
  3. caliban agents attach send path (stdin → frames; end/detach semantics).
  4. Worker → daemon status reporting + AgentStatus::Idle wiring.
  5. Idle timeout + bounded-lifetime cleanup.