Native brain: operator runbook
Istota has three model-invocation backends behind one protocol:
- claude_code (default): wraps the headless claude -p CLI subprocess. Battle-tested; delegates the agentic loop, tool use, and context management to Claude Code.
- native: istota's own in-process agent loop against an OpenAI-compatible provider. Gives istota direct control over the loop, tool execution, context compaction, and model selection.
- tmux_claude: drives the interactive claude TUI in a detached tmux session (keeps traffic on subscription billing), with automatic fallback to claude_code and a circuit breaker. Configured under [brain.tmux]; see config.example.toml for the full block.
All coexist permanently and are switchable per instance or per task. Switching does not touch executor orchestration (memory, skills, sandbox, deferred writes) — only which Brain implementation runs.
This runbook covers the native backend.
Enabling the native brain
Instance-wide:
[brain]
kind = "native"
[brain.native]
provider = "openai_compat" # only provider currently
model = "claude-sonnet-4-6" # explicit id — openai_compat has no aliasing
effort = "" # default reasoning effort (see below)
base_url = "https://api.anthropic.com/v1" # any OpenAI-compatible endpoint
max_turns = 100 # hard cap on assistant turns per task
max_tokens = 16384 # per-completion output cap
# prompt_caching # omit to derive from base_url (see below)
The API key never goes in the TOML file. Set it via the ISTOTA_BRAIN_NATIVE_API_KEY environment variable (loaded from the systemd EnvironmentFile=, direnv, or .env).
Model id format
- openai_compat needs an explicit model id (e.g. claude-sonnet-4-6). It does not understand role aliases (smart) or Claude-CLI short names (opus).
- The claude_code brain (default) keeps Claude Code's aliasing: opus resolves to the latest Opus. The native brain does not; map role names with [models.roles] if you want them.
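The difference between role names and explicit ids can be sketched as a plain lookup. This is an illustration only: the function name, the dictionary shape, and the placeholder id are assumptions, not istota's actual [models.roles] handling.

```python
# Hypothetical role table, standing in for whatever [models.roles] holds.
ROLES = {
    "smart": "claude-sonnet-4-6",
    "fast": "your-fast-model-id",  # placeholder, not a real model id
}


def resolve_model_id(requested: str, roles: dict[str, str]) -> str:
    """Map a role name to an explicit id; pass anything else through verbatim.

    There is no alias expansion in this sketch: a short name such as "opus"
    that is not in the role table goes to the provider exactly as written.
    """
    return roles.get(requested, requested)


print(resolve_model_id("smart", ROLES))
print(resolve_model_id("opus", ROLES))
```

The second call shows why an unmapped short name is a problem under openai_compat: nothing rewrites it before it reaches the endpoint.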
Ansible deployment
The role renders the [brain] block from inventory variables. The [brain.native] and [brain.source_type_overrides] tables are only written when istota_brain_kind is native or istota_brain_source_type_overrides is non-empty, so existing deployments stay byte-identical until you opt in. After templating, files/validate_config.py parses the rendered config and gates the scheduler restart, so a malformed brain block fails the play instead of the running daemon.
Instance-wide native brain:
istota_brain_kind: "native"
istota_brain_native_provider: "openai_compat"
istota_brain_native_model: "claude-sonnet-4-6"
istota_brain_native_base_url: "https://api.anthropic.com/v1"
istota_brain_native_effort: "" # default reasoning effort (thinking models only)
# istota_brain_native_prompt_caching # "" (default) derives from base_url; set true/false to force
istota_brain_native_api_key: "{{ vault_native_api_key }}" # → ISTOTA_BRAIN_NATIVE_API_KEY
Gradual rollout (keep the default brain, move background work to native):
istota_brain_kind: "claude_code"
istota_brain_native_model: "claude-sonnet-4-6"
istota_brain_native_api_key: "{{ vault_native_api_key }}"
istota_brain_source_type_overrides:
scheduled: native
heartbeat: native
The full variable set (all defaulted to the code defaults) is documented in deploy/ansible/defaults/main.yml: istota_brain_native_{provider,model,effort,base_url,extra_headers,context_window,max_turns,max_tokens,prompt_caching,api_key} and istota_brain_source_type_overrides. istota_brain_native_prompt_caching defaults to "" (derive from base_url); set it to true/false only to force.
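The three-state prompt_caching setting (empty means derive, true/false means force) can be sketched as follows. The function name is hypothetical and the host comparison is an assumption about how "derive from base_url" is decided; the on-for-api.anthropic.com rule itself comes from the operational notes in this runbook.

```python
from urllib.parse import urlparse


def resolve_prompt_caching(setting, base_url: str) -> bool:
    """Resolve the tri-state setting: True/False force, anything else derives."""
    if isinstance(setting, bool):
        return setting  # an explicit true/false always wins
    # Empty string or None: derive from the endpoint host.
    host = urlparse(base_url).hostname or ""
    return host == "api.anthropic.com"


print(resolve_prompt_caching("", "https://api.anthropic.com/v1"))
print(resolve_prompt_caching("", "http://localhost:11434/v1"))
print(resolve_prompt_caching(False, "https://api.anthropic.com/v1"))
```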
Key handling:
- istota_brain_native_api_key is never written to config.toml. With istota_use_environment_file: true (the default) it's rendered into the systemd EnvironmentFile as ISTOTA_BRAIN_NATIVE_API_KEY; vault it.
- Per-user keys go through the existing istota_user_secrets mechanism (the native_brain service is in the connected-service schema, flagged cli_only so it's operator-provisioned only and not exposed in the web UI), and overlay the instance key for that user's tasks:
istota_user_secrets:
alice:
- { service: native_brain, key: api_key, value: "{{ vault_alice_native_key }}" }
- istota_brain_native_extra_headers is rendered as a [brain.native.extra_headers] sub-table (a TOML inline table would be mis-emitted by the JSON filter), so header names with dots or dashes (anthropic-beta) are safe.
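The overlay order for keys described above (per-user key first, instance key otherwise) reduces to a short lookup. This is a minimal sketch with a hypothetical function name; how istota actually reads the encrypted secrets table is not shown here and is not implied.

```python
import os


def resolve_api_key(user_key, env=None):
    """Return the per-user key when present, else the instance key from the environment."""
    env = os.environ if env is None else env
    # An empty or missing user key falls through to the instance-wide variable.
    return user_key or env.get("ISTOTA_BRAIN_NATIVE_API_KEY") or None


instance_env = {"ISTOTA_BRAIN_NATIVE_API_KEY": "instance-key"}
print(resolve_api_key("alice-key", instance_env))
print(resolve_api_key(None, instance_env))
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.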
Gradual rollout: per-source-type routing
Rather than flipping the whole instance at once, route specific task types to the native brain while everything else stays on claude_code. This is the recommended rollout path: move low-risk background work first, keep interactive talk/email on the proven backend, watch for regressions, then widen.
[brain]
kind = "claude_code" # default for everything not listed below
[brain.source_type_overrides]
scheduled = "native" # cron jobs
heartbeat = "native" # health checks
source_type values match the task's origin: talk, email, briefing, scheduled, heartbeat, subtask, cli, istota_file. A routing typo (unknown brain kind) is logged and ignored — the task falls back to the instance default rather than failing. Each routed task logs one INFO line (brain routing: task … -> kind=native).
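The fallback behaviour for a routing typo can be sketched as below. The function and logger names are hypothetical, and the warning text is invented for the sketch; it is not the log line istota emits.

```python
import logging

logger = logging.getLogger("brain_routing_sketch")

# Backend names taken from the top of this runbook.
KNOWN_KINDS = {"claude_code", "native", "tmux_claude"}


def pick_brain_kind(source_type: str, default_kind: str, overrides: dict[str, str]) -> str:
    """Choose the brain for a task: a valid override wins, otherwise the instance default."""
    override = overrides.get(source_type)
    if override is None:
        return default_kind
    if override not in KNOWN_KINDS:
        # A typo must not fail the task: log it and use the default.
        logger.warning("ignoring unknown brain kind %r for %s", override, source_type)
        return default_kind
    return override


overrides = {"scheduled": "native", "heartbeat": "natve"}  # second entry is a deliberate typo
print(pick_brain_kind("scheduled", "claude_code", overrides))
print(pick_brain_kind("heartbeat", "claude_code", overrides))
print(pick_brain_kind("talk", "claude_code", overrides))
```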
Local development
Bubblewrap is Linux-only, so on a Mac dev box run with the sandbox off. Keep a gitignored config/config.dev.toml (copy config/config.dev.toml.example):
[brain]
kind = "native"
[brain.native]
provider = "openai_compat"
model = "claude-sonnet-4-6"
base_url = "https://api.anthropic.com/v1"
[security]
sandbox_enabled = false # bwrap is Linux-only
skill_proxy_enabled = false # simplifies the inner loop
[users.dev]
display_name = "Dev"
Sandbox correctness cannot be validated on the Mac. "Works locally" means "logic is correct," not "isolation is correct" — check isolation on a Linux box or in the Docker image.
Dev tiers
Standalone loop runner (scripts/native_repl.py) — runs one prompt through a NativeBrain with no executor/scheduler/Talk/DB. Tools operate in a throwaway temp dir. Prints the streamed events, the BrainResult, and TaskUsage (so cost is visible).
# Offline, deterministic — a scripted mock provider drives the loop.
uv run python scripts/native_repl.py --provider mock \
--script tests/native/fixtures/two_tool_turn.json "write and read a file"
# Replay a recorded SSE session through the real parser (no credits).
uv run python scripts/native_repl.py --provider replay \
--fixture tests/native/fixtures/text_completion.jsonl --tools "" "summarize this repo"
# Live, against whatever the dev config points at (needs a key).
uv run python scripts/native_repl.py -c config/config.dev.toml --provider live "..."
Recorded-SSE replay — ReplayProvider feeds committed JSONL SSE fixtures through the real provider parser (CI default, offline). RecordingProvider regenerates fixtures from the live API (ISTOTA_NATIVE_RECORD=1 + a real key), run rarely.
Full CLI task path — point the existing istota task CLI at the dev config:
uv run istota init -c config/config.dev.toml
uv run istota task "read README and summarize it" -u dev -x -c config/config.dev.toml
Zero-cost live loop: point [brain.native] at a local Ollama model (base_url = "http://localhost:11434/v1"). Quality is lower — small models loop and mis-call tools, which is itself useful for exercising the loop detector and JSON repair — but it validates the whole stack offline.
Shadow compare
Before flipping a task type to native, run the same prompt through both brains and diff the output:
uv run python scripts/brain_shadow.py -c config/config.dev.toml \
"read README and summarize it in one sentence"
It diffs result text (similarity + unified diff), tool-call sequence, and native TaskUsage. Exact parity is not expected — the brains manage context differently and expose different tool schemas — but outcomes should be equivalent. Large text divergence or wildly different tool sequences are the signal to investigate.
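For a feel of what a similarity score plus unified diff looks like, here is a minimal sketch using Python's standard difflib. It is not the code in scripts/brain_shadow.py; the function name and the choice of SequenceMatcher as the similarity measure are assumptions.

```python
import difflib


def compare_outputs(baseline: str, candidate: str) -> tuple[float, str]:
    """Return (similarity ratio in [0, 1], unified diff text) for two result strings."""
    ratio = difflib.SequenceMatcher(None, baseline, candidate).ratio()
    diff_lines = difflib.unified_diff(
        baseline.splitlines(),
        candidate.splitlines(),
        fromfile="claude_code",
        tofile="native",
        lineterm="",
    )
    return ratio, "\n".join(diff_lines)


ratio, diff = compare_outputs(
    "Istota is an assistant.\nIt runs tasks.",
    "Istota is an assistant.\nIt runs scheduled tasks.",
)
print(f"similarity={ratio:.2f}")
print(diff)
```

Tool-call sequences can be compared the same way by joining the tool names one per line before diffing.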
Operational notes
- Cost telemetry. The native brain computes per-task token usage and cost (priced from the bundled model catalog; pinned Anthropic ids ship at price 0.0 until set) and writes it to task_logs (a usage {...} info line) plus a native_usage log line. claude_code leaves usage opaque: the CLI doesn't surface per-call usage.
- Per-user API keys. Beyond the instance-wide [brain.native] api_key / ISTOTA_BRAIN_NATIVE_API_KEY, each user can have their own provider key in the encrypted secrets table: istota secret ensure -u <user> -s native_brain -k api_key -v <key>. This is operator-provisioned only (CLI/Ansible); it's deliberately not in the web UI, since it overrides only the key and not the provider/model/base_url, so a self-serve knob would imply more than it delivers. The per-user key overlays the instance key for that user's tasks.
- Reasoning effort. [brain.native] effort (low/medium/high/xhigh/max, default empty) sets a default reasoning budget; per-task overrides (e.g. !model opus-high, [models.roles]) win. It is sent as the OpenAI-compatible reasoning_effort field only when the target model is thinking-capable (supports_thinking in the bundled catalog); for a non-reasoning endpoint it is dropped silently so the request never 400s. xhigh and max fold to high on the wire (the compat field exposes no finer knob); the original tier still tracks on the task row. Extended-thinking output is parsed but excluded from the visible result.
- Prompt caching. [brain.native] prompt_caching adds cache_control breakpoints covering the tool definitions, the system message, the first user message, and a rolling breakpoint on the latest message each turn (up to Anthropic's 4-breakpoint cap), which is what produces cross-turn cache hits. The default is derived from base_url: on for api.anthropic.com, off for any other endpoint. Set it explicitly to force either way; a plain-OpenAI, LM Studio, Ollama, or vLLM endpoint that doesn't understand the extension needs prompt_caching = false. A per-task cache hit-rate line is logged at task end (native cache hit_rate=… read=… input=…).
- Context-overflow recovery. If a turn exceeds the context window mid-task, the native brain force-compacts the accumulated transcript and continues from the summary instead of failing, with up to two recovery attempts sharing the task's wall-clock deadline. The proactive compaction hook (prepare_next_turn) is the first line of defense; this is the reactive safety net beneath it.
- Image tool results. A tool result carrying image content renders as a follow-up role:"user" block on vision-capable models (supports_vision); on a no-vision model the image is dropped with a text note so the request still validates.
- Model ids. openai_compat needs explicit ids and does not translate Anthropic aliases: opus is sent verbatim, not turned into claude-opus-4-8 (that mapping is the claude_code brain's, not the native brain's). Map role names per deployment with [models.roles] if you want fast/general/smart under native.
- Cancellation / !stop. Works on both brains. The native brain bridges the scheduler's cancel poll into an asyncio.Event threaded through the loop, tools, and retry backoff. A failing cancel poll (e.g. transient SQLite lock) is tolerated rather than silently disabling !stop.
- Task timeout. The native loop runs under a wall-clock deadline of scheduler.task_timeout_minutes (istota_scheduler_task_timeout_minutes, default 30). On expiry it signals abort (killing any in-flight bash subprocess at the next poll), waits a short grace, then hard-cancels, and returns stop_reason="timeout". This matches claude_code and prevents a runaway loop from outliving the scheduler's stuck-task reclaim (which would otherwise double-execute the task). max_turns is a second, coarser backstop.
- Context management. The native brain owns compaction (runs in prepare_next_turn, file-operation aware across cycles). claude_code delegates it to Claude Code. The two are independent.
- Sandboxing. claude_code runs the whole subprocess inside bwrap. The native brain runs the loop in-process and sandboxes each tool execution per-call (the loop itself never runs user-controlled code). Validate the per-tool sandbox on Linux, not on the Mac.
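The reasoning-effort rules in the list above (drop the field for non-thinking models, fold xhigh and max to high) can be sketched as a small mapping. The function name and the ValueError on an unknown tier are choices made for this sketch, not documented istota behaviour.

```python
# Configured tier -> value sent in the OpenAI-compatible reasoning_effort field.
WIRE_EFFORT = {
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "high",  # folded: the compat field exposes no finer knob
    "max": "high",    # folded likewise
}


def reasoning_effort_field(effort: str, supports_thinking: bool) -> dict[str, str]:
    """Return the request fragment to merge into the completion payload (possibly empty)."""
    if not effort or not supports_thinking:
        return {}  # omit the field entirely so a non-reasoning endpoint never sees it
    if effort not in WIRE_EFFORT:
        # Assumption of this sketch: reject unknown tiers loudly.
        raise ValueError(f"unknown effort tier: {effort!r}")
    return {"reasoning_effort": WIRE_EFFORT[effort]}


print(reasoning_effort_field("max", supports_thinking=True))
print(reasoning_effort_field("high", supports_thinking=False))
```

Returning a fragment rather than a value makes the "dropped silently" case a plain dictionary merge with nothing in it.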
Rollback
Set [brain] kind = "claude_code" (or remove the source_type_overrides entry) and restart the scheduler. ClaudeCodeBrain is never removed — rollback is a one-line config change.