Task lifecycle¶
A task moves through several stages from creation to completion. This document traces the full path, including which process owns each stage and where data is persisted.
Status flow¶
pending → locked → running → completed
                           → failed
                           → pending_confirmation → completed (on confirm)
                                                  → cancelled (on deny/timeout)
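The flow can also be read as a transition table. The following is a minimal sketch, assuming the branches leave running and pending_confirmation respectively and that statuses are stored as plain strings. ALLOWED and can_transition are illustrative names, not part of the codebase, and the table covers only the arrows drawn here (not stale-lock release or retries).

```python
# Illustrative transition table for the status flow; names are hypothetical.
ALLOWED = {
    "pending": {"locked"},
    "locked": {"running"},
    "running": {"completed", "failed", "pending_confirmation"},
    "pending_confirmation": {"completed", "cancelled"},
    # Terminal states have no outgoing arrows in the diagram.
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the diagram draws an arrow from current to new."""
    return new in ALLOWED.get(current, set())
```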
Creation¶
Tasks enter the queue from multiple sources:
| Source | Entry point | source_type |
|---|---|---|
| Talk message | Talk poller (transport/talk/inbound.py) | talk |
| Web chat | Web POST → ingest_message (web_app.py) | web |
| Email | Email poller (transport/email/inbound.py) | email |
| TASKS.md file | File poller (tasks_file_poller.py) | istota_file |
| CLI | istota task command (cli.py) | cli |
| REPL | istota repl (inline, run_task_inline) | repl |
| Scheduled job | check_scheduled_jobs() in scheduler | scheduled |
| Briefing | check_briefings() in scheduler | briefing |
| Subtask | Deferred JSON from a parent task | subtask |
All sources call db.create_task(), which inserts a row with status='pending'.
Claiming and locking¶
claim_task() runs inside a worker thread. It uses an atomic UPDATE...RETURNING to grab the next pending task, setting status='locked' with the worker ID and timestamp. Before claiming, it runs stale lock cleanup:
- Fail tasks locked > 30 min that are too old to retry
- Release recent stale locks back to pending
- Same for stuck running tasks. "Stuck" is decided by worker liveness, not a flat runtime (ISSUE-112): a running task is reclaimable once its last_heartbeat has been silent longer than worker_stuck_minutes (default 10), or, if it never heartbeated, once started_at exceeds task_timeout_minutes + grace. The running worker pings last_heartbeat every worker_heartbeat_seconds, so a slow-but-alive worker (e.g. the in-process native brain) is never reclaimed
Tasks are ordered by priority DESC, created_at ASC. Workers filter by user_id and queue type.
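The liveness rule for stuck running tasks can be sketched as a small predicate. This is an illustration only: RunningTask and is_reclaimable are hypothetical names, and the task_timeout_minutes and grace_minutes defaults below are placeholders, since only the worker_stuck_minutes default (10) is stated above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunningTask:
    # Hypothetical view of the two columns the rule consults.
    started_at: datetime
    last_heartbeat: Optional[datetime] = None


def is_reclaimable(
    task: RunningTask,
    now: datetime,
    worker_stuck_minutes: int = 10,
    task_timeout_minutes: int = 30,  # placeholder value, not the real default
    grace_minutes: int = 5,          # placeholder value, not the real default
) -> bool:
    """Decide whether a running task may be taken back from its worker."""
    if task.last_heartbeat is not None:
        # The worker has pinged at least once: judge it by heartbeat silence only.
        return now - task.last_heartbeat > timedelta(minutes=worker_stuck_minutes)
    # No heartbeat was ever recorded: fall back to elapsed runtime plus grace.
    limit = timedelta(minutes=task_timeout_minutes + grace_minutes)
    return now - task.started_at > limit
```

Under this rule a task that has been running for hours is left alone as long as its heartbeat is recent, which is the slow-but-alive case described above.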
Execution¶
After claiming, the worker immediately updates status to running and closes the DB connection. Everything from here until result processing happens outside any DB transaction to avoid long locks.
Command tasks¶
If the task has a command field (shell scheduled jobs), it runs via _execute_command_task() — through _run_capture (a Popen with start_new_session=True, so a timeout SIGKILLs the whole process group rather than blocking on an orphaned grandchild). Its env is build_stripped_env() plus propagated ISTOTA_* vars, ISTOTA_EXPERIMENTAL_FEATURES, and manifest-derived credential / connection vars resolved by build_skill_env + dispatch_setup_env_hooks. No skill selection, no Claude, no prompt assembly.
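The process-group behaviour can be illustrated with the standard library alone. The sketch below is not the project's _run_capture. The function name, return shape, and error handling are assumptions, and it is POSIX-only because it relies on os.killpg.

```python
import os
import signal
import subprocess


def run_capture_sketch(argv, timeout_s, env=None):
    """Run argv in its own session; on timeout, SIGKILL the whole process group.

    Returns (returncode, combined_output, timed_out).
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        env=env,
        start_new_session=True,  # child leads a new session and process group
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        try:
            # The child leads its own group, so its pid is also the group id.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        # Reap the child and drain the pipe; this returns promptly because
        # every process that held the pipe open has been killed.
        output, _ = proc.communicate()
        return proc.returncode, output, True
```

Killing only the direct child would leave a backgrounded grandchild holding the output pipe open, and the final communicate() call could then block until that grandchild exits.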
Prompt tasks¶
For all other tasks, execute_task() handles the full pipeline:
- Skill selection, single axis: select_skills() runs deterministic matching (always_include / source_types / file_types / sticky / companions, minus exclude_skills) to produce the eager set (full body in the prompt). Keyword and resource matching are no longer selectors, and there is no progressive-disclosure partition. Every other eligible skill (eligible_skill_names) becomes a one-line menu entry the model pulls in on demand
- Persist selected skills to DB via save_task_selected_skills()
- Load skill docs and resolve env vars
- Context loading (Talk message cache or email thread)
- Memory loading (USER.md, CHANNEL.md, dated memories, recalled memories)
- Prompt assembly (see executor docs for section order)
- Brain execution: the executor builds a BrainRequest and calls brain.execute(req). The default ClaudeCodeBrain spawns claude -p - --output-format stream-json and parses the stream. NativeBrain runs an in-process agent loop over HTTP against any OpenAI-compatible model; TmuxClaudeBrain drives the interactive Claude TUI in a detached tmux session (not HTTP). See brain.
- Result composition (still in executor): _compose_full_result(text, trace) handles context-management boundaries and terse-result recovery; both brains produce the same (result_text, execution_trace) shape.
The executor returns (success, result_text, actions_taken_json, execution_trace_json).
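The skill-selection step can be pictured roughly as follows. This is a sketch under several assumptions, not the real select_skills(): the SkillManifest fields, the single pass over companions, and the choice to drop excluded skills from the menu as well are all guesses made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SkillManifest:
    # Hypothetical manifest shape; the real fields may differ.
    name: str
    always_include: bool = False
    source_types: set = field(default_factory=set)
    file_types: set = field(default_factory=set)
    companions: set = field(default_factory=set)


def select_skills_sketch(manifests, source_type, attachment_types,
                         sticky=(), exclude_skills=()):
    """Return (eager, menu) as sorted lists of skill names."""
    by_name = {m.name: m for m in manifests}
    known = set(by_name)
    excluded = set(exclude_skills)
    attachments = set(attachment_types)

    # Sticky skills stay selected if they still exist.
    eager = set(sticky) & known
    for m in manifests:
        if (m.always_include
                or source_type in m.source_types
                or m.file_types & attachments):
            eager.add(m.name)

    # Assumption: companions are pulled in one level deep only.
    for name in list(eager):
        eager |= by_name[name].companions & known

    eager -= excluded
    # Assumption: excluded skills are not offered in the menu either.
    menu = known - eager - excluded
    return sorted(eager), sorted(menu)
```

Because nothing here depends on keywords or model output, the same task metadata always yields the same eager set, which is the sense in which the matching is deterministic.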
Progress updates¶
Progress flows through task-event streaming, not per-consumer callbacks. The executor adapts the brain's StreamEvents into typed TaskEvents and writes them to the task_events log via an EventWriter. process_one_task subscribes three in-process consumers to that log:
- TalkEventSubscriber: edits the ack message in place with rate-limited progress
- LogChannelSubscriber: accumulating edit of the operator's log-channel message
- PushNotificationSubscriber: ntfy on long-running tasks
The web SSE endpoint reads the same task_events table directly (the table is the bus, no IPC). The old _make_talk_progress_callback / _make_log_channel_callback / _composite_callback chain is gone.
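The "table is the bus" idea can be shown with a tiny append-only log that readers poll by id. The schema and the helper names (open_log, write_event, read_events_after) below are assumptions for illustration and do not reflect the real task_events columns or the EventWriter API.

```python
import json
import sqlite3


def open_log():
    """Create an in-memory stand-in for the task_events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " task_id INTEGER NOT NULL,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def write_event(conn, task_id, kind, payload):
    """Append one event; the connection context manager commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO task_events (task_id, kind, payload) VALUES (?, ?, ?)",
            (task_id, kind, json.dumps(payload)),
        )


def read_events_after(conn, task_id, last_id=0):
    """Return events for task_id newer than last_id, oldest first."""
    rows = conn.execute(
        "SELECT id, kind, payload FROM task_events"
        " WHERE task_id = ? AND id > ? ORDER BY id",
        (task_id, last_id),
    ).fetchall()
    return [(event_id, kind, json.loads(payload)) for event_id, kind, payload in rows]
```

Each consumer keeps only the last id it has seen, so an in-process subscriber and an SSE endpoint can read the same rows independently without any direct connection to the writer.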
Result processing¶
Back in the scheduler, process_one_task() handles the result inside a DB transaction:
Success path¶
- API error guard: detect API errors masquerading as success (exit 0 with error text)
- Malformed output guard: detect leaked tool-call XML — reclassify as failure
- Confirmation check: regex match for confirmation requests → pending_confirmation
- Update to completed: stores result, actions_taken, execution_trace
- Memory search indexing: index conversation under user and channel namespaces
- Delivery routing: transport.routing.resolve_delivery_plan turns output_target into an ordered, channel-resolved destination list (Talk, email, ntfy, TASKS.md write-back, or stream surfaces web/REPL). Stream destinations need no push, because the task_events log is the delivery
Failure path¶
- Check if task was cancelled by user (!stop command)
- Retry with exponential backoff (1, 4, 16 min) if attempts remain
- Mark permanently failed after max_attempts (default 3)
- Track scheduled job consecutive failures; auto-disable after threshold
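The retry schedule can be expressed as a one-line formula. The sketch below assumes the delay is 4^(attempt - 1) minutes, which is inferred from the listed values (1, 4, 16) rather than taken from the code; retry_delay_minutes is a hypothetical helper.

```python
from typing import Optional


def retry_delay_minutes(attempt: int, max_attempts: int = 3) -> Optional[int]:
    """Return the delay before the next try, or None when the task is out of attempts.

    attempt is the number of attempts made so far (1-based).
    """
    if attempt >= max_attempts:
        return None  # no attempts remain: the task is marked permanently failed
    # Assumed formula: delays grow by a factor of four per attempt.
    return 4 ** (attempt - 1)
```

Under this reading, the default max_attempts of 3 uses only the 1 and 4 minute delays; the 16 minute delay is reached only when max_attempts is raised to 4 or more.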
Post-completion¶
After the DB transaction closes:
- Deferred operations: process JSON files from the sandbox temp dir (subtasks, transaction tracking, sent emails, KV ops, KG ops, health ops, user alerts, email output)
- Briefing digest: save for next-run deduplication
- Talk progress finalize: edit ack message with final summary
- Log channel finalize: edit/post completion message with skills and tool summary
- Result delivery: fan out to every push destination in the resolved plan (Talk, email, ntfy, TASKS.md); stream surfaces (web, REPL) deliver via the task_events SSE log
Log channel messages¶
When a user has log_channel configured, each task gets a log channel entry showing:
**[#12345]** ✅ Done (3 actions) - #channel-name
Skills: calendar, email, files, memory, sensitive_actions
📅 Listed calendar events
📧 Sent email reply
📄 Read USER.md
The skills line is populated by reading selected_skills from the DB after task completion. Controlled by log_channel_show_skills (default: true) in the [scheduler] config section.
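As an illustration of how such an entry might be assembled, the sketch below rebuilds the example message from its parts. format_log_entry_sketch and its parameters are hypothetical; only the layout and the effect of log_channel_show_skills are taken from the text above.

```python
def format_log_entry_sketch(task_id, channel, action_lines, skills, show_skills=True):
    """Build a log-channel message in the layout shown above."""
    lines = [f"**[#{task_id}]** ✅ Done ({len(action_lines)} actions) - #{channel}"]
    # The skills line is optional, mirroring log_channel_show_skills.
    if show_skills and skills:
        lines.append("Skills: " + ", ".join(skills))
    lines.extend(action_lines)
    return "\n".join(lines)
```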
Data flow gotchas¶
Column visibility in get_task()¶
The get_task() function uses an explicit column list in its SELECT, not SELECT *. When adding new columns to the tasks table, you must update three places:
- The ALTER TABLE migration in _run_migrations()
- The _row_to_task() mapping (with an "in row.keys()" fallback)
- The SELECT column list in get_task(), which is easy to forget because _row_to_task silently falls back to None
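The silent fallback is easy to reproduce with sqlite3.Row. The sketch below uses a hypothetical new_col column and a simplified row_to_task_sketch; it only demonstrates the failure mode and does not mirror the real mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, new_col TEXT)")
conn.execute("INSERT INTO tasks (id, status, new_col) VALUES (1, 'pending', 'hello')")


def row_to_task_sketch(row):
    """Map a row to a dict, tolerating columns missing from the SELECT list."""
    keys = row.keys()
    return {
        "id": row["id"],
        "status": row["status"],
        # Fallback: a column absent from the query becomes None without any error.
        "new_col": row["new_col"] if "new_col" in keys else None,
    }


# Explicit column list that was not updated for the new column.
stale = conn.execute("SELECT id, status FROM tasks WHERE id = 1").fetchone()
# Column list updated to include the new column.
fixed = conn.execute("SELECT id, status, new_col FROM tasks WHERE id = 1").fetchone()

stale_task = row_to_task_sketch(stale)
fixed_task = row_to_task_sketch(fixed)
```

Here stale_task["new_col"] is None even though the row holds a value, while fixed_task["new_col"] carries it, so the omission raises no error and only shows up as missing data.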
Skills are saved before execution, read after¶
save_task_selected_skills() runs early in execute_task(), before the Claude subprocess launches. The log channel finalize reads them back from the DB after the task completes. Any code path that clears or overwrites the row between those points would lose the skills data.
DB connections are short-lived¶
The scheduler opens and closes DB connections for each phase (claim, execute, result processing, finalize). This is intentional: long-held connections would block other workers via SQLite's write lock. Each "with db.get_db()" block is a separate transaction.
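The per-phase pattern can be sketched with a small context manager. get_db_sketch below is a stand-in written for this example, not the project's db.get_db(); commit-on-success and rollback-on-error are assumptions about its behaviour.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def get_db_sketch(path):
    """Open a connection for one phase, then commit (or roll back) and close it."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "tasks.db")

# Setup: one short transaction.
with get_db_sketch(db_path) as conn:
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO tasks (status) VALUES ('locked')")

# Phase: mark the task running, then release the database.
with get_db_sketch(db_path) as conn:
    conn.execute("UPDATE tasks SET status = 'running' WHERE id = 1")

# Long-running execution would happen here with no connection open.

# Phase: result processing in its own transaction.
with get_db_sketch(db_path) as conn:
    conn.execute("UPDATE tasks SET status = 'completed' WHERE id = 1")
    final_status = conn.execute("SELECT status FROM tasks WHERE id = 1").fetchone()[0]
```

Because no connection is held between the blocks, the write lock is taken only for the duration of each short update.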
Command tasks skip the executor¶
Shell command tasks (task.command is set) bypass execute_task() entirely. They have no skill selection, no prompt, no streaming. Their log channel entries will never show skills.