Autonomous mode lets an agent plan its own work, execute steps, adapt when things go wrong, and signal completion — all without human input. It's enabled by the spec.autonomy section and the -a CLI flag.
Reasoning strategies (react, todo_driven, plan_execute, reflexion) orchestrate agent behavior across autonomous turns. See Reasoning Primitives for the full guide.
An autonomous agent follows a plan-execute-adapt loop:
1. **Plan** — The agent creates a structured todo list using the `todo` tool
2. **Execute** — It works through each step using its tools
3. **Adapt** — If a step fails, the agent modifies its plan (add retries, skip, investigate)
4. **Finish** — The agent calls `finish_task` — or the loop auto-completes when all todo items reach terminal status
The finish_task tool is auto-registered when autonomy is enabled. Task tracking comes from the todo tool — add type: todo to spec.tools for structured planning:
| Tool | Source | Description |
| --- | --- | --- |
| `finish_task(status, summary)` | Auto-registered | Signal task completion with an overall status and summary |
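A minimal configuration enabling both pieces might look like this (field names are taken from the full example later on this page; the surrounding spec is abbreviated):

```yaml
spec:
  tools:
    - type: todo          # structured planning: add_todo, get_next_todo, update_todo, ...
      max_items: 12
  autonomy:               # enabling this block auto-registers finish_task
    max_plan_steps: 12
    max_history_messages: 20
```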
Each autonomous run follows a precise iteration sequence:
Iteration 1 — The agent receives the user prompt plus the system prompt. It creates its initial todo list, then begins executing the first step.
Iterations 2+ — The continuation_prompt is injected with the current ReflectionState (todo progress, completed items, failures). The active reasoning strategy shapes these continuation prompts. The agent continues executing, adapting, or re-planning.
Budget visibility — Each continuation prompt includes a BUDGET block showing remaining resources:
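The exact layout of the BUDGET block is runner-defined; a hypothetical rendering, assuming token, iteration, and time budgets are all configured:

```
BUDGET
- tokens: 11,750 used / 30,000 budget
- iterations: 3 of 6 used
- time: 112s elapsed / 600s limit
```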
This gives the agent awareness of its resource constraints at every turn, enabling it to prioritize critical tasks and call finish_task before hitting a hard limit. Fields are omitted when no budget is configured for that dimension.
History trimming and compaction — When conversation messages exceed max_history_messages, the oldest messages are dropped (keeping the system prompt and the most recent messages). Alternatively, enable history compaction to LLM-summarize old messages before trimming, preserving key context. This prevents context window exhaustion on long runs.
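The trim-then-compact behavior can be sketched as follows. This is a simplified model, not initrunner's implementation; `summarize` is a stand-in for the LLM summarization call, which in practice may be routed through `model_override`:

```python
def compact_history(messages, max_history_messages, threshold, tail_messages, summarize):
    """Trim old messages, optionally LLM-summarizing them first (sketch)."""
    if len(messages) <= max_history_messages:
        return messages
    system, rest = messages[0], messages[1:]   # always keep the system prompt
    tail = rest[-tail_messages:]               # newest messages kept verbatim
    old = rest[:-tail_messages]
    if len(messages) > threshold:              # compaction only activates past the threshold
        try:
            summary = summarize(old)           # LLM call to condense old context
            return [system, {"role": "system", "content": f"[summary] {summary}"}] + tail
        except Exception:
            pass                               # fail-open: fall back to plain trimming
    keep = max_history_messages - 1
    return [system] + rest[-keep:]             # plain trimming: drop the oldest messages
```

Note the fail-open path: if summarization raises, the function silently degrades to plain trimming, matching the behavior described below.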
Budget check — Before each iteration, the runner checks autonomous_token_budget, max_iterations, and autonomous_timeout_seconds. If any limit is reached, the loop terminates.
Terminal conditions — The loop ends when:
- The agent calls `finish_task` (status: `completed`)
- Any guardrail limit is hit (status: `max_iterations`, `budget_exceeded`, or `timeout`)
- The agent reports it is stuck (status: `blocked` or `failed`)
- An unrecoverable error occurs (status: `error`)
Rate limiting — If iteration_delay_seconds is set (> 0), the runner sleeps between iterations to avoid API rate limits.
Result — The final ReflectionState is returned with the terminal status, the plan steps (with their statuses), and the agent's summary.
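Putting the iteration sequence together, the runner's control flow can be approximated like this. It is a sketch of the loop described above, not the actual implementation; `agent.start` and `agent.continue_turn` are hypothetical stand-ins for iteration 1 and iterations 2+:

```python
import time

def run_autonomous(agent, prompt, max_iterations=10,
                   token_budget=None, timeout_seconds=None,
                   iteration_delay_seconds=0):
    """Approximate the plan-execute-adapt loop with guardrail checks (sketch)."""
    start, tokens_used = time.monotonic(), 0
    state = agent.start(prompt)                    # iteration 1: plan + first step
    for iteration in range(2, max_iterations + 1):
        if state.completed:                        # agent called finish_task
            return state
        # Budget check before each iteration
        if token_budget is not None and tokens_used >= token_budget:
            return state.finish("budget_exceeded")
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            return state.finish("timeout")
        if iteration_delay_seconds > 0:
            time.sleep(iteration_delay_seconds)    # rate limiting between iterations
        state, spent = agent.continue_turn(state)  # continuation_prompt + ReflectionState
        tokens_used += spent
    return state if state.completed else state.finish("max_iterations")
```

Each terminal status string matches one of the terminal conditions listed above.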
A complete autonomous agent that verifies deployments:
```yaml
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: deployment-checker
  description: Autonomous deployment verification agent
  tags: [devops, autonomous, deployment]
spec:
  role: |
    You are a deployment verification agent. When given one or more URLs
    to check, create a todo list with one item per URL, execute each
    check, and produce a pass/fail report.

    Workflow:
    1. Use batch_add_todos to create a checklist — one item per URL to verify
    2. Use get_next_todo to pick the next item
    3. Run curl -sSL -o /dev/null -w "%{http_code} %{time_total}s" for each URL
    4. Mark each item completed (2xx) or failed (anything else) via update_todo
    5. If a check fails, add a retry item with add_todo
    6. When done, send a Slack summary with pass/fail results per URL
    7. Call finish_task with the overall status
  model:
    provider: openai
    name: gpt-5-mini
    temperature: 0.0
  tools:
    - type: think
    - type: todo
      max_items: 12
    - type: shell
      allowed_commands:
        - curl
      require_confirmation: false
      timeout_seconds: 30
    - type: slack
      webhook_url: "${SLACK_WEBHOOK_URL}"
      default_channel: "#deployments"
      username: Deploy Checker
      icon_emoji: ":white_check_mark:"
  reasoning:
    pattern: todo_driven
    auto_plan: true
  autonomy:
    max_plan_steps: 12
    max_history_messages: 20
    iteration_delay_seconds: 1
    max_scheduled_per_run: 1
  guardrails:
    max_iterations: 6
    autonomous_token_budget: 30000
    max_tokens_per_run: 10000
    max_tool_calls: 15
    session_token_budget: 100000
```
```bash
initrunner run deployment-checker.yaml -a \
  -p "Verify https://api.example.com/health and https://api.example.com/ready"
```
Long-running autonomous agents can lose important context when older messages are dropped by simple history trimming. History compaction solves this by using an LLM call to summarize older messages before they are trimmed, preserving key decisions, tool results, and open tasks.
- **Fail-open** — if the summarization LLM call fails, the original history is kept and trimming proceeds normally. Errors are logged but never crash the loop.
- **Threshold-based** — compaction only activates when message count exceeds `threshold`, avoiding unnecessary LLM calls on short runs.
- **Tail preservation** — the `tail_messages` most recent messages are never summarized, ensuring the agent always has full fidelity on its latest actions.
- **Model flexibility** — use `model_override` to route summarization to a cheaper or faster model (e.g. gpt-4o-mini) to save tokens on the primary model.
See the long-running-analyst example for a complete configuration using compaction.
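Based on the settings described above, a compaction configuration might look like the following. The `spec.autonomy.compaction` path is an assumption; only the `threshold`, `tail_messages`, and `model_override` field names come from this page, so consult the long-running-analyst example for the authoritative layout:

```yaml
spec:
  autonomy:
    max_history_messages: 40
    compaction:                     # path is illustrative; check the example config
      threshold: 30                 # only compact once message count exceeds this
      tail_messages: 10             # newest messages are never summarized
      model_override: gpt-4o-mini   # cheaper model for the summarization call
```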
Autonomous agents need spending limits since they run without human oversight. These fields in spec.guardrails control resource usage:
| Field | Type | Default | Scope | Description |
| --- | --- | --- | --- | --- |
| `max_iterations` | int | 10 | per-run | Maximum plan-execute-adapt cycles |
| `autonomous_token_budget` | int \| null | null | per-run | Token budget for the autonomous run |
| `autonomous_timeout_seconds` | int \| null | null | per-run | Wall-clock timeout for the entire autonomous run |
| `max_tokens_per_run` | int | 50000 | per-iteration | Maximum output tokens consumed per iteration |
| `max_tool_calls` | int | 20 | per-iteration | Maximum tool invocations per iteration |
| `timeout_seconds` | int | 300 | per-iteration | Wall-clock timeout per iteration |
| `max_request_limit` | int \| null | auto | per-iteration | Maximum LLM API round-trips per iteration. Auto-derived as max(max_tool_calls + 10, 30) |
| `session_token_budget` | int \| null | null | session | Cumulative token budget for a REPL session |
| `daemon_token_budget` | int \| null | null | daemon | Lifetime token budget for the daemon process |
| `daemon_daily_token_budget` | int \| null | null | daemon | Daily token budget — resets at UTC midnight |
| `max_scheduled_per_run` | int | 3 | scheduling | Maximum follow-up tasks scheduled per autonomous run |
| `max_scheduled_total` | int | 50 | scheduling | Maximum total scheduled tasks across all runs |
When any limit is hit, the agent stops and reports its progress. See Guardrails for full enforcement behavior, daemon budgets, and all available limits.
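The auto-derived `max_request_limit` default from the table above is simple to express in code:

```python
def default_max_request_limit(max_tool_calls: int) -> int:
    """Auto-derived default: max(max_tool_calls + 10, 30)."""
    return max(max_tool_calls + 10, 30)
```

With the default `max_tool_calls` of 20 this yields 30; the floor of 30 only stops mattering once `max_tool_calls` exceeds 20.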
Autopilot is daemon mode where every trigger runs the full autonomous loop instead of single-shot execution. One flag turns it on:
```bash
initrunner run role.yaml --autopilot
```
A daemon responds. An autopilot thinks, then responds. Someone messages your Telegram bot "find me flights from NYC to London next week." In daemon mode, you get one shot at an answer. In autopilot, the agent searches the web, compares options, checks dates, and sends back something worth reading.
All trigger types support this, including Telegram and Discord.
All existing guardrails apply in autopilot mode: max_iterations, autonomous_token_budget, autonomous_timeout_seconds, max_tool_calls, daemon_token_budget, and daemon_daily_token_budget. The agent stops and reports progress if any limit is hit. See Guardrails for the full list.
Scheduled follow-ups (via schedule_followup / schedule_followup_at) always run in autonomous mode regardless of per-trigger config.
```bash
# Enable autonomous mode
initrunner run role.yaml -a -p "Check all endpoints"

# Autopilot -- all triggers use the autonomous loop
initrunner run role.yaml --autopilot

# Override max iterations
initrunner run role.yaml -a --max-iterations 3 -p "Quick check"
```
At each iteration, the agent's current state is captured as a ReflectionState and injected into the continuation prompt. This gives the agent awareness of what it has accomplished and what remains.
ReflectionState contains:
| Field | Type | Description |
| --- | --- | --- |
| `completed` | bool | Whether the agent has called `finish_task` |
| `summary` | str | Running summary of progress |
| `status` | str | Current status label |
| `todo_list` | TodoList | The current todo list tracking task progress |
Each todo item has description, status, priority, notes, and depends_on fields. See Tools — Todo for the full status and priority reference.
The reflection state — including the formatted todo list — is rendered and appended to the continuation_prompt at the start of each iteration. The active reasoning strategy may customize how this state is presented.
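One way to picture the rendering step (a sketch; the actual strategy-specific formatting will differ, and the dict shapes here are illustrative):

```python
def render_continuation(continuation_prompt, state):
    """Append a formatted ReflectionState to the continuation prompt (sketch)."""
    lines = [continuation_prompt, "",
             f"Status: {state['status']}",
             f"Summary: {state['summary']}",
             "Todo:"]
    for item in state["todo_list"]:
        marker = "x" if item["status"] == "completed" else " "
        lines.append(f"  [{marker}] ({item['priority']}) {item['description']}")
    return "\n".join(lines)
```

The `description`, `status`, and `priority` keys mirror the todo item fields listed above.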
Autonomous mode integrates with the Memory system for persistence and recall:
Session save (--resume) — When memory is configured and the agent is run with --resume, the conversation history (including plan steps and tool outputs) is saved at the end of the run. The next --resume invocation restores context so the agent can pick up where it left off.
finish_task episodic capture — When the agent calls finish_task, the summary is persisted as an episodic memory with category autonomous_run (if episodic memory is enabled). This allows future runs or other agents to recall past outcomes.
recall tool — If memory is enabled, the recall tool is auto-registered. The agent can search all memory types (semantic, episodic, procedural) for past results, patterns, and decisions. Pass memory_types to filter by type. This is useful for agents that run repeatedly (e.g., via cron triggers) and need to avoid repeating past work.
Consolidation on exit — When consolidation.interval is after_autonomous, consolidation runs automatically after the autonomous loop exits, extracting durable semantic facts from episodic records. See Memory: Consolidation.
Cause: The system prompt doesn't instruct the agent to call finish_task, or the agent gets stuck in an adapt loop creating new steps indefinitely.
Fix: Explicitly instruct the agent to call finish_task in spec.role. Set max_iterations and max_plan_steps to enforce hard stops. The max_iterations terminal status is still considered a successful outcome.
Cause: The autonomous token budget is too small for the task, or the agent is producing verbose tool outputs that consume tokens quickly.
Fix: Increase autonomous_token_budget or reduce per-iteration output by lowering model.max_tokens. Check if shell or HTTP tools are returning large outputs — tool output limits (see Guardrails) apply automatically, but the agent may be making too many calls. Reduce max_tool_calls to limit per-iteration tool usage.
Cause: Scheduled follow-ups (via schedule_followup / schedule_followup_at) are stored in-memory only. When the daemon process restarts, all pending scheduled tasks are lost.
Fix: Use cron triggers for recurring tasks instead of schedule_followup. For critical follow-ups, have the agent write the schedule to a file or external system (e.g., a database) and use a cron trigger to check for pending work.
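The file-based workaround can be sketched like this; the file name and helper functions are illustrative, not part of initrunner:

```python
import json
from pathlib import Path

SCHEDULE_FILE = Path("pending_followups.json")   # illustrative location

def persist_followup(task: str, due_at: float) -> None:
    """Append a follow-up to a file so a cron-triggered run can find it."""
    pending = json.loads(SCHEDULE_FILE.read_text()) if SCHEDULE_FILE.exists() else []
    pending.append({"task": task, "due_at": due_at})
    SCHEDULE_FILE.write_text(json.dumps(pending, indent=2))

def due_followups(now: float) -> list[dict]:
    """Called from a cron-triggered agent run: return work that is due."""
    if not SCHEDULE_FILE.exists():
        return []
    return [t for t in json.loads(SCHEDULE_FILE.read_text()) if t["due_at"] <= now]
```

Unlike in-memory `schedule_followup` state, the file survives a daemon restart, and a cron trigger can call `due_followups` on each run to pick up pending work.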
Cause: The model is responding with text-only messages instead of invoking tools. This typically happens when the system prompt is too vague, or when max_tool_calls is set to 0.
Fix: Verify max_tool_calls is greater than 0. Make the system prompt explicit about which tools to use and when. Add example workflows in spec.role that reference tool names directly.
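A role snippet that names tools explicitly might look like this (the tool names match the todo tool's helpers shown elsewhere on this page; adapt to your own toolset):

```yaml
spec:
  role: |
    You are an endpoint checker. Always work through your tools, never
    answer from memory:
    1. Call batch_add_todos with one item per endpoint.
    2. Call get_next_todo, run the check, then update_todo with the result.
    3. Call finish_task with an overall status when every item is terminal.
  guardrails:
    max_tool_calls: 15    # must be > 0 or the agent cannot invoke tools
```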