ADR 0018 · Memory tier model (CLAUDE.md ingestion + auto-memory)
- Status: accepted
- Date: 2026-05-23
Context
caliban has no persistent memory across sessions. The default system
prompt is rebuilt from cwd + tool list each invocation (ADR 0014), so
operator preferences, project conventions, and learned facts about
the user have to be re-supplied by hand every time. Claude Code
solves this with a CLAUDE.md mechanism plus an auto-memory tier
the agent can write to via its existing file tools. The user's own
~/.claude/CLAUDE.md already exercises this pattern; that mental
model is the target.
Decision
caliban adopts a three-tier memory model, all of which live on disk as plain Markdown and are read at session start. A fourth MCP-mediated tier slots in later (forward link only; not in this ADR).
Tier 1 — Global
- Path:
~/.config/caliban/CLAUDE.md(XDG$XDG_CONFIG_HOMEhonored). - Owner: the operator. caliban never writes here.
- Contents: cross-project preferences (tool choice, style, persona).
- Read once at startup, optional (missing file is fine).
Tier 2 — Project
- Path:
<workspace_root>/CLAUDE.mdwhereworkspace_rootisWorkspaceRoot::root()(ADR 0010). - Owner: the project / repo. caliban never writes here (operators commit it like any other file).
- Contents: repo-specific conventions, build commands, taboos.
- Read once at startup, optional.
Tier 3 — Auto-memory
- Directory:
~/.local/share/caliban/projects/<sanitized-cwd>/memory/(XDG$XDG_DATA_HOMEhonored). Sanitization replaces/with-and drops the leading dash, so/Users/jf/dev/calibanbecomesUsers-jf-dev-caliban. - Files: one
MEMORY.md(index, ≤ 200 lines) plus arbitrary<slug>.mdtopic pages. - Owner: the agent. Writes go through the existing
Write/Edittools — no special memory tool, no separate trust path. - Only
MEMORY.mdis loaded eagerly. Topic pages are lazily fetched by the agent viaReadwhen the index points it at one.
Composition
All three tiers are concatenated into the system prompt above the auto-generated default (cwd + tool list + conventions, per ADR 0014). Order: global → project → auto-memory index. Each tier is wrapped in explicit delimiters so the model can tell them apart:
<global-claude-md path="…/CLAUDE.md">…</global-claude-md>
<project-claude-md path="…/CLAUDE.md">…</project-claude-md>
<auto-memory-index path="…/MEMORY.md">…</auto-memory-index>
<default system prompt body from system_prompt::build_default …>
Missing tiers are simply omitted (no empty tag block).
Token budget
The combined memory prefix is capped at 8 000 tokens (estimated as
chars / 4 — provider-agnostic and cheap). If the combined size
exceeds the cap, auto-memory is truncated first (with a
[truncated: N bytes] notice appended to its block), then project,
then global. Hitting the global cap is treated as operator error
(loud tracing::warn! plus the truncation marker in the prompt).
Retrieval
None in v1. Memory IS the system prompt prefix. Semantic search
over memory (RAG) is a v2 concern and would slot in as a new tool
(MemorySearch), not as a change to how memory is loaded.
Forward links
- MCP memory tier. Once MCP support ships, an MCP server like the user's SilverBullet integration plugs in as Tier 4: not eagerly loaded, accessed on demand via MCP tool calls. The precondition-check pattern from the user's own CLAUDE.md ("skip if MCP is absent") applies.
/memoryslash command. Shows active tiers + paths + sizes; offers$EDITORopen for the global and project files. Detailed in the spec.
Consequences
- Positive. Matches the user's existing mental model exactly, zero learning curve. Agent maintains its own knowledge using the same Read/Write/Edit it already has — no special memory tool to audit, sandbox, or rate-limit. MCP tier slots in cleanly without reshaping the loader.
- Negative. 8K tokens is real cost on every turn (Anthropic prompt caching recoups most of it). Agent can clutter auto-memory if write conventions aren't well-specified (the spec pins them down). No drift detection between project CLAUDE.md and what the agent "remembers" — by design; the project file wins by splice order, but contradicting auto-memory will sit side by side.
- Revisit if: the 8K cap starts triggering routinely (raise it,
or add summarization); auto-memory becomes a write-only graveyard
(add a v2
MemorySearchtool and stop loading the full index); per-project agent memory grows past what's reasonable to grep (move to SQLite, but keep markdown export).
Crate
New crate caliban-memory owns tier discovery, sanitization, file IO,
splicing, and budget enforcement. caliban-agent-core does not
take a dep on it — the binary (caliban/src/main.rs) calls the
memory crate at startup and passes the assembled string to
system_prompt::resolve as a prefix.
Revised 2026-05-26
Bumped the combined-prefix default from 8,000 to 32,000 tokens. The 8,000-token default was conservative against 2024 context windows and was increasingly punishing in 2026 (1M-token Sonnet, 200K standard on most providers). Truncation-first behavior was at risk of dropping the auto-memory index — exactly the tier that grows.
Added per-scope token caps via three optional [memory] settings keys
(all integer, default unset):
cap_tokens_auto— caps the auto-memory tier independently.cap_tokens_claude_md— caps the combined CLAUDE.md tier (global + project). When binding, truncates project first, then global.cap_tokens_combined— overrides the combined ceiling (max_tokens).
When the sum of both per-scope caps would exceed cap_tokens_combined,
each is scaled down proportionally rather than silently dropping a
tier. Settings.json values override the corresponding env vars
(CALIBAN_MEMORY_BUDGET_TOKENS, CALIBAN_MEMORY_CAP_TOKENS_AUTO,
CALIBAN_MEMORY_CAP_TOKENS_CLAUDE_MD) when both are present.
Truncation order within a tier is unchanged from the original Decision.