Task lifecycle

A task moves through several stages from creation to completion. This document traces the full path, including which process owns each stage and where data is persisted.

Status flow

pending → locked → running → completed
                           → failed
                           → pending_confirmation → completed (on confirm)
                                                  → cancelled (on deny/timeout)
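
The diagram can be read as a small transition table. The sketch below is illustrative only: the `Status` enum, `TRANSITIONS` map, and `can_transition` helper are hypothetical names, not part of the codebase, and the map covers only the arrows drawn above (the retry and stale-lock paths that return a task to pending are described later and left out here).

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    LOCKED = "locked"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PENDING_CONFIRMATION = "pending_confirmation"
    CANCELLED = "cancelled"


# Only the arrows shown in the status-flow diagram.
TRANSITIONS = {
    Status.PENDING: {Status.LOCKED},
    Status.LOCKED: {Status.RUNNING},
    Status.RUNNING: {
        Status.COMPLETED,
        Status.FAILED,
        Status.PENDING_CONFIRMATION,
    },
    Status.PENDING_CONFIRMATION: {Status.COMPLETED, Status.CANCELLED},
}


def can_transition(src: Status, dst: Status) -> bool:
    """Return True if the diagram allows moving from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```

Terminal states (completed, failed, cancelled) have no entry in the map, so every transition out of them is rejected.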

Creation

Tasks enter the queue from multiple sources:

Source	Entry point	`source_type`
Talk message	Talk poller (`transport/talk/inbound.py`)	`talk`
Web chat	Web POST → `ingest_message` (`web_app.py`)	`web`
Email	Email poller (`transport/email/inbound.py`)	`email`
TASKS.md file	File poller (`tasks_file_poller.py`)	`istota_file`
CLI	`istota task` command (`cli.py`)	`cli`
REPL	`istota repl` (inline, `run_task_inline`)	`repl`
Scheduled job	`check_scheduled_jobs()` in scheduler	`scheduled`
Briefing	`check_briefings()` in scheduler	`briefing`
Subtask	Deferred JSON from a parent task	`subtask`

All sources call db.create_task(), which inserts a row with status='pending'.

Claiming and locking

claim_task() runs inside a worker thread. It uses an atomic UPDATE...RETURNING to grab the next pending task, setting status='locked' with the worker ID and timestamp. Before claiming, it runs stale lock cleanup:

Fail tasks that have been locked for more than 30 minutes and are too old to retry
Release more recent stale locks back to pending
Apply the same cleanup to stuck running tasks. "Stuck" is decided by worker liveness, not a flat runtime (ISSUE-112): a running task is reclaimable once its last_heartbeat has been silent for longer than worker_stuck_minutes (default 10) or, if it never sent a heartbeat, once the time since started_at exceeds task_timeout_minutes + grace. The running worker updates last_heartbeat every worker_heartbeat_seconds, so a slow-but-alive worker (e.g. the in-process native brain) is never reclaimed
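
The liveness rule for running tasks can be expressed as a pure function. This is a minimal sketch, not the actual cleanup code: `is_stuck` is a hypothetical name, and the default values for `task_timeout_minutes` and `grace_minutes` are assumptions chosen for illustration (only the `worker_stuck_minutes` default of 10 comes from the text above).

```python
from datetime import datetime, timedelta
from typing import Optional


def is_stuck(
    now: datetime,
    started_at: datetime,
    last_heartbeat: Optional[datetime],
    worker_stuck_minutes: int = 10,
    task_timeout_minutes: int = 30,  # assumed default, for illustration
    grace_minutes: int = 5,          # assumed default, for illustration
) -> bool:
    """Decide whether a running task may be reclaimed."""
    if last_heartbeat is not None:
        # The worker has heartbeated at least once: judge by silence only.
        silence = now - last_heartbeat
        return silence > timedelta(minutes=worker_stuck_minutes)
    # Never heartbeated: fall back to total runtime plus a grace period.
    runtime = now - started_at
    return runtime > timedelta(minutes=task_timeout_minutes + grace_minutes)
```

Note that a task that started hours ago is still not reclaimable as long as its heartbeat is recent, which is the point of keying on liveness rather than runtime.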

Tasks are ordered by priority DESC, created_at ASC. Workers filter by user_id and queue type.
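
An atomic claim of this kind can be sketched with SQLite's UPDATE ... RETURNING (available in SQLite 3.35 and later). The schema, the `locked_by` / `locked_at` column names, and the `claim_next` helper are assumptions made for this example, and the queue-type filter is omitted; the real claim_task() may differ.

```python
import sqlite3
from typing import Optional

# Hypothetical, reduced schema; the real tasks table has more columns.
SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    priority INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL,
    locked_by TEXT,
    locked_at TEXT
)
"""

# Select and lock in one statement so two workers cannot claim the same row.
CLAIM_SQL = """
UPDATE tasks
SET status = 'locked', locked_by = :worker, locked_at = :now
WHERE id = (
    SELECT id FROM tasks
    WHERE status = 'pending' AND user_id = :user
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
)
RETURNING id
"""


def claim_next(
    conn: sqlite3.Connection, worker: str, user: str, now: str
) -> Optional[int]:
    """Lock the next pending task for `user`; return its id or None."""
    params = {"worker": worker, "user": user, "now": now}
    # fetchall() fully steps the statement before the commit.
    rows = conn.execute(CLAIM_SQL, params).fetchall()
    conn.commit()
    return rows[0][0] if rows else None
```

Because the subquery and the update run as a single statement, a second worker calling `claim_next` sees the row as already locked and moves on to the next candidate.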

Execution

After claiming, the worker immediately updates status to running and closes the DB connection. Everything from here until result processing happens outside any DB transaction to avoid long locks.

Command tasks

If the task has a command field (shell scheduled jobs), it runs via _execute_command_task(), which goes through _run_capture: a Popen with start_new_session=True, so that a timeout SIGKILLs the whole process group rather than blocking on an orphaned grandchild. Its environment is build_stripped_env() plus propagated ISTOTA_* vars, ISTOTA_EXPERIMENTAL_FEATURES, and manifest-derived credential and connection vars resolved by build_skill_env + dispatch_setup_env_hooks. There is no skill selection, no Claude, and no prompt assembly.
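
The process-group technique can be illustrated as follows. This is a POSIX-only sketch under stated assumptions, not the project's _run_capture: the `run_capture` name, its return shape, and the merged stdout/stderr are choices made for the example, and environment construction is left out.

```python
import os
import signal
import subprocess
from typing import Sequence, Tuple


def run_capture(cmd: Sequence[str], timeout: float) -> Tuple[int, str, bool]:
    """Run cmd in its own session; return (returncode, output, timed_out)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        # With start_new_session=True the child's pgid equals its pid, so
        # this also kills grandchildren that still hold the output pipe.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        out, _ = proc.communicate()
        return proc.returncode, out, True
```

Killing only the direct child would leave a backgrounded grandchild holding the pipe open, and the final `communicate()` would then block until that grandchild exited on its own.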

Prompt tasks

For all other tasks, execute_task() handles the full pipeline:

Skill selection (single axis): select_skills() runs deterministic matching (always_include / source_types / file_types / sticky / companions, minus exclude_skills) to produce the eager set, whose full bodies go into the prompt. Keyword and resource matching are no longer selectors, and there is no progressive-disclosure partition. Every other eligible skill (eligible_skill_names) becomes a one-line menu entry that the model pulls in on demand
Persist selected skills to DB via save_task_selected_skills()
Load skill docs and resolve env vars
Context loading (Talk message cache or email thread)
Memory loading (USER.md, CHANNEL.md, dated memories, recalled memories)
Prompt assembly (see executor docs for section order)
Brain execution — the executor builds a BrainRequest and calls brain.execute(req). The default ClaudeCodeBrain spawns claude -p - --output-format stream-json and parses the stream. NativeBrain runs an in-process agent loop over HTTP against any OpenAI-compatible model; TmuxClaudeBrain drives the interactive Claude TUI in a detached tmux session (not HTTP). See brain.
Result composition (still in executor) — _compose_full_result(text, trace) handles context-management boundaries and terse-result recovery; both brains produce the same (result_text, execution_trace) shape.

The executor returns (success, result_text, actions_taken_json, execution_trace_json).
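
The skill-selection step at the top of the pipeline can be sketched as a deterministic filter. Everything here is an assumption made for illustration: the `Skill` dataclass, the `select_skills_sketch` signature, the reading of "sticky" as a set of previously selected names, one-level companion expansion, and the choice to keep excluded skills off the menu. The real select_skills() may behave differently on each of these points.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical manifest shape, reduced to the matching fields."""
    name: str
    always_include: bool = False
    source_types: FrozenSet[str] = frozenset()
    file_types: FrozenSet[str] = frozenset()
    companions: FrozenSet[str] = frozenset()


def select_skills_sketch(
    skills: Iterable[Skill],
    source_type: str,
    attachment_types: Set[str],
    sticky: Set[str],
    exclude_skills: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (eager skill names, menu skill names), both sorted."""
    by_name = {s.name: s for s in skills}
    eager: Set[str] = set()
    for s in by_name.values():
        if (
            s.always_include
            or source_type in s.source_types
            or s.file_types & attachment_types
            or s.name in sticky
        ):
            eager.add(s.name)
    # Assumed: companions of a selected skill are pulled in one level deep.
    for name in list(eager):
        eager |= {c for c in by_name[name].companions if c in by_name}
    eager -= exclude_skills
    menu = sorted(
        n for n in by_name if n not in eager and n not in exclude_skills
    )
    return sorted(eager), menu
```

The same inputs always yield the same eager set, which is what makes it safe to persist the selection before execution and read it back afterwards.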

Progress updates

Progress flows through task-event streaming, not per-consumer callbacks. The executor adapts the brain's StreamEvents into typed TaskEvents and writes them to the task_events log via an EventWriter. process_one_task subscribes three in-process consumers to that log:

TalkEventSubscriber: edits the ack message in place with rate-limited progress
LogChannelSubscriber: accumulating edit of the operator's log-channel message
PushNotificationSubscriber: ntfy on long-running tasks

The web SSE endpoint reads the same task_events table directly (the table is the bus, no IPC). The old _make_talk_progress_callback / _make_log_channel_callback / _composite_callback chain is gone.
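
The "table is the bus" pattern can be shown with an append-only table and a per-consumer cursor. This is a sketch under assumptions: the column names, the JSON payload encoding, and the `open_log` / `write_event` / `read_events` helpers are hypothetical and stand in for the real EventWriter and subscribers.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_log() -> sqlite3.Connection:
    """Create an in-memory event log with an assumed, reduced schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " task_id INTEGER NOT NULL,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def write_event(
    conn: sqlite3.Connection, task_id: int, kind: str, payload: Dict[str, Any]
) -> int:
    """Append one event and return its monotonically increasing id."""
    cur = conn.execute(
        "INSERT INTO task_events (task_id, kind, payload) VALUES (?, ?, ?)",
        (task_id, kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def read_events(
    conn: sqlite3.Connection, task_id: int, after_id: int = 0
) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Return events for task_id newer than the consumer's cursor."""
    rows = conn.execute(
        "SELECT id, kind, payload FROM task_events"
        " WHERE task_id = ? AND id > ? ORDER BY id",
        (task_id, after_id),
    ).fetchall()
    return [(i, k, json.loads(p)) for i, k, p in rows]
```

Each consumer keeps only the id of the last event it saw, so an in-process subscriber and an SSE reader can follow the same log at different speeds without coordinating.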

Result processing

Back in the scheduler, process_one_task() handles the result inside a DB transaction:

Success path

API error guard: detect API errors masquerading as success (exit 0 with error text)
Malformed output guard: detect leaked tool-call XML — reclassify as failure
Confirmation check: regex match for confirmation requests → pending_confirmation
Update to completed: stores result, actions_taken, execution_trace
Memory search indexing: index conversation under user and channel namespaces
Delivery routing: transport.routing.resolve_delivery_plan turns output_target into an ordered, channel-resolved destination list (Talk, email, ntfy, TASKS.md write-back, or stream surfaces web/REPL). Stream destinations need no push — the task_events log is the delivery
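
The first three checks in the list above amount to reclassifying a nominally successful result before it is stored. The sketch below shows the ordering only. The three regular expressions and the `classify_result` name are invented for illustration; the patterns the scheduler actually uses are not given in this document.

```python
import re

# Hypothetical patterns, for illustration only.
API_ERROR_RE = re.compile(r"^\s*API Error\b", re.IGNORECASE)
TOOL_XML_RE = re.compile(r"</?tool_call\b")
CONFIRM_RE = re.compile(r"\bplease confirm\b", re.IGNORECASE)


def classify_result(success: bool, text: str) -> str:
    """Map an executor result to the status it should be stored under."""
    if not success:
        return "failed"
    if API_ERROR_RE.search(text):
        return "failed"  # exit 0, but the body is an API error
    if TOOL_XML_RE.search(text):
        return "failed"  # leaked tool-call markup in the output
    if CONFIRM_RE.search(text):
        return "pending_confirmation"
    return "completed"
```

The guards run before the confirmation check so that an error message containing confirmation-like wording is still treated as a failure.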

Failure path

Check if task was cancelled by user (!stop command)
Retry with exponential backoff (1, 4, 16 min) if attempts remain
Mark permanently failed after max_attempts (default 3)
Track scheduled job consecutive failures; auto-disable after threshold
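
The retry decision can be sketched as below. The base-4 formula is inferred from the listed delays (1, 4, 16 minutes) and is not confirmed by the source. The helper names, the 1-based attempt counter, and the statuses returned for the retry and cancel cases are likewise assumptions.

```python
from typing import Optional, Tuple


def retry_delay_minutes(attempt: int) -> int:
    """Delay after the given failed attempt (1-based): 1, 4, 16, ..."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return 4 ** (attempt - 1)


def next_step(
    attempt: int, max_attempts: int = 3, cancelled: bool = False
) -> Tuple[str, Optional[int]]:
    """Return (new status, delay in minutes or None) after a failure."""
    if cancelled:
        return ("cancelled", None)  # a user stop is never retried
    if attempt < max_attempts:
        return ("pending", retry_delay_minutes(attempt))
    return ("failed", None)  # attempts exhausted
```

Under these assumptions and the default max_attempts of 3, a task is retried after 1 and then 4 minutes, and the third failure is permanent.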

Post-completion

After the DB transaction closes:

Deferred operations: process JSON files from the sandbox temp dir (subtasks, transaction tracking, sent emails, KV ops, KG ops, health ops, user alerts, email output)
Briefing digest: save for next-run deduplication
Talk progress finalize: edit ack message with final summary
Log channel finalize: edit/post completion message with skills and tool summary
Result delivery: fan out to every push destination in the resolved plan (Talk, email, ntfy, TASKS.md); stream surfaces (web, REPL) deliver via the task_events SSE log

Log channel messages

When a user has log_channel configured, each task gets a log channel entry showing:

**[#12345]** ✅ Done (3 actions) - #channel-name
Skills: calendar, email, files, memory, sensitive_actions
📅 Listed calendar events
📧 Sent email reply
📄 Read USER.md

The skills line is populated by reading selected_skills from the DB after task completion. Controlled by log_channel_show_skills (default: true) in the [scheduler] config section.
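
A formatter that reproduces the example entry might look like this. It is a sketch only: `format_log_entry` and its parameters are hypothetical, and the `show_skills` flag merely mirrors the log_channel_show_skills setting described above.

```python
from typing import List


def format_log_entry(
    task_id: int,
    channel: str,
    actions: List[str],
    skills: List[str],
    show_skills: bool = True,
) -> str:
    """Render a completion entry in the shape of the example above."""
    header = f"**[#{task_id}]** ✅ Done ({len(actions)} actions) - #{channel}"
    lines = [header]
    if show_skills and skills:
        lines.append("Skills: " + ", ".join(skills))
    lines.extend(actions)
    return "\n".join(lines)
```

With `show_skills=False`, or with an empty skills list as for command tasks, the skills line is simply omitted.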

Data flow gotchas

Column visibility in get_task()

The get_task() function uses an explicit column list in its SELECT, not SELECT *. When adding new columns to the tasks table, you must update three places:

The ALTER TABLE migration in _run_migrations()
The _row_to_task() mapping (with in row.keys() fallback)
The SELECT column list in get_task() — easy to forget, and _row_to_task silently falls back to None
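
The silent fallback is easy to reproduce with sqlite3.Row. The snippet below is a reduced illustration, not the project's code: `row_to_task` stands in for _row_to_task(), and `selected_skills` is used as the example of a newly added column.

```python
import sqlite3


def row_to_task(row: sqlite3.Row) -> dict:
    """Map a row to a task dict, tolerating columns absent from the SELECT."""
    return {
        "id": row["id"],
        "status": row["status"],
        # Fallback pattern: a column left out of the SELECT becomes None
        # instead of raising, which hides the omission.
        "selected_skills": (
            row["selected_skills"] if "selected_skills" in row.keys() else None
        ),
    }


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, selected_skills TEXT)"
)
conn.execute("INSERT INTO tasks VALUES (1, 'completed', 'calendar,email')")

# Explicit column list that was not updated for the new column.
stale = conn.execute("SELECT id, status FROM tasks WHERE id = 1").fetchone()
# Column list that includes it.
fixed = conn.execute(
    "SELECT id, status, selected_skills FROM tasks WHERE id = 1"
).fetchone()
```

Here `row_to_task(stale)` reports no skills even though the row has them, with no error to point at the missing column.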

Skills are saved before execution, read after

save_task_selected_skills() runs early in execute_task(), before the Claude subprocess launches. The log channel finalize reads them back from the DB after the task completes. Any code path that clears or overwrites the row between those points would lose the skills data.

DB connections are short-lived

The scheduler opens and closes DB connections for each phase (claim, execute, result processing, finalize). This is intentional — long-held connections would block other workers via SQLite's write lock. Each with db.get_db() block is a separate transaction.
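
The connection-per-phase pattern can be sketched with a context manager. This is an assumption-laden illustration: the `get_db` shown here takes a path argument and commits on exit, which may not match the signature or behavior of the real db.get_db(), and `run_phases` is a hypothetical stand-in for the scheduler's phases.

```python
import contextlib
import sqlite3
from typing import Iterator, List


@contextlib.contextmanager
def get_db(path: str) -> Iterator[sqlite3.Connection]:
    """Open a connection, commit on success, roll back on error, then close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def run_phases(path: str) -> List[str]:
    """Walk one task through separate, short transactions."""
    with get_db(path) as conn:  # phase 1: claim and mark running
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, status TEXT)"
        )
        conn.execute("INSERT INTO tasks (status) VALUES ('running')")

    # Long-running execution happens here with no connection open,
    # so the SQLite write lock is free for other workers.

    with get_db(path) as conn:  # phase 2: result processing
        conn.execute("UPDATE tasks SET status = 'completed' WHERE status = 'running'")

    with get_db(path) as conn:  # phase 3: read back for finalize
        return [row[0] for row in conn.execute("SELECT status FROM tasks")]
```

Each `with` block commits and closes before the next begins, so no write lock is held while the task itself runs.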

Command tasks skip the executor

Shell command tasks (task.command is set) bypass execute_task() entirely. They have no skill selection, no prompt, no streaming. Their log channel entries will never show skills.