Guardrails
Guardrails prevent runaway agents by enforcing per-run limits, session budgets, daemon budgets, autonomous budgets, and team budgets. All limits are enforced automatically: an agent stops when a limit is hit, and session and daemon budgets log warnings from 80% consumption onward so operators can act before the hard stop.
Quick Example
```yaml
guardrails:
  max_tokens_per_run: 50000
  max_tool_calls: 20
  timeout_seconds: 300
  session_token_budget: 200000
  run_token_budget: 80000  # cumulative budget for one CLI invocation, including delegations (since v2026.5.1)
  # Team mode guardrails (kind: Team only)
  team_token_budget: 150000  # cumulative budget across all personas
  team_timeout_seconds: 900  # wall-clock limit for entire team run
  # Daemon resilience (since v2026.4.11)
  retry_policy:
    max_attempts: 3
    backoff_base_seconds: 2.0
    backoff_max_seconds: 30.0
  circuit_breaker:
    failure_threshold: 5
    reset_timeout_seconds: 60
```

Per-Run Limits
These limits apply to each individual agent run (a single invocation or trigger execution).
| Field | Type | Default | Description |
|---|---|---|---|
| max_tokens_per_run | int | 50000 | Maximum output tokens consumed per agent run |
| max_tool_calls | int | 20 | Maximum tool invocations per run |
| timeout_seconds | int | 300 | Wall-clock timeout per run (seconds) |
| max_request_limit | int \| null | auto | Maximum LLM API round-trips per run. Auto-derived as max(max_tool_calls + 10, 30) when not set |
| input_tokens_limit | int \| null | null | Per-request input token limit |
| total_tokens_limit | int \| null | null | Per-request combined input+output token limit |
| run_token_budget | int \| null | null | Cumulative token budget for a single one-shot CLI run; counts the parent run plus completed inline-delegated sub-runs. Override per invocation with --token-budget N. Since v2026.5.1. |
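As a quick check of the auto-derivation rule for max_request_limit, the sketch below computes the effective request cap. The function name effective_request_limit is hypothetical; only the max(max_tool_calls + 10, 30) formula comes from the table above.

```python
from typing import Optional


def effective_request_limit(max_tool_calls: int, max_request_limit: Optional[int] = None) -> int:
    """Hypothetical helper: the request cap a run would use.

    An explicit max_request_limit wins; otherwise it is derived from
    max_tool_calls with the documented rule max(max_tool_calls + 10, 30).
    """
    if max_request_limit is not None:
        return max_request_limit
    return max(max_tool_calls + 10, 30)


print(effective_request_limit(20))  # 30 (default max_tool_calls)
print(effective_request_limit(50))  # 60
print(effective_request_limit(50, max_request_limit=40))  # 40 (explicit value wins)
```

With the default max_tool_calls of 20 the floor of 30 applies; above 20 tool calls, the cap grows with max_tool_calls so each tool call still leaves room for the model round-trips around it.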
The per-call limits (max_tokens_per_run, total_tokens_limit, input_tokens_limit, max_request_limit) map to PydanticAI's UsageLimits and bound a single LLM round-trip or a single top-level agent.run. They do not see tokens spent inside delegated sub-agents. run_token_budget is the cumulative cap across the whole invocation, including delegations. See run_token_budget semantics below.
run_token_budget semantics
run_token_budget is a cumulative-completed-run guard with best-effort hard-stop, not a live token meter. Available since v2026.5.1.
- It is checked once before the parent run starts (so a previous over-budget invocation in the same process can short-circuit) and again before every inline delegate sub-run.
- It records actual usage after the parent run and after each completed sub-run, because PydanticAI only exposes usage for an agent.run call once that run finishes.
- It stops a cascading delegate chain as soon as the cumulative count crosses the cap.
- It does not abort a single runaway parent mid-stream when the parent never delegates. In that case the per-call limits (max_tokens_per_run, total_tokens_limit) remain the relevant guard.
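The following Python sketch models these semantics with a simulated invocation. RunTokenBudget, BudgetExceeded, and simulate are hypothetical names, not InitRunner's API; the sketch only illustrates when the budget is checked and when usage is recorded, and it assumes a refused delegation simply ends the chain.

```python
from typing import List, Tuple


class BudgetExceeded(Exception):
    """Raised when the cumulative run budget is already used up."""


class RunTokenBudget:
    """Hypothetical cumulative-completed-run guard (not InitRunner's class)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def check(self) -> None:
        # Called before the parent run and before each inline delegate sub-run.
        if self.used >= self.limit:
            raise BudgetExceeded(f"run_token_budget exhausted: {self.used}/{self.limit}")

    def record(self, tokens: int) -> None:
        # Called only after a run completes, because usage is known only then.
        self.used += tokens


def simulate(limit: int, parent_tokens: int, delegate_tokens: List[int]) -> Tuple[int, int]:
    """Return (delegates_that_ran, total_tokens_recorded)."""
    budget = RunTokenBudget(limit)
    budget.check()  # before the parent run starts
    ran = 0
    for tokens in delegate_tokens:
        try:
            budget.check()  # before each inline delegate sub-run
        except BudgetExceeded:
            break  # in this sketch the cascading chain simply stops here
        budget.record(tokens)  # recorded once the sub-run completes
        ran += 1
    budget.record(parent_tokens)  # parent usage is known only when it finishes
    return ran, budget.used


print(simulate(80_000, 10_000, [30_000] * 4))  # (3, 100000)
print(simulate(80_000, 500_000, []))  # (0, 500000)
```

In the first call the third delegate pushes the total past 80,000 and only the fourth is refused, so the invocation ends at 100,000 tokens; that overshoot is what "best-effort" means. In the second call nothing stops the non-delegating parent, which is why the per-call limits still matter.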
run_token_budget does not apply to --autonomous runs (use autonomous_token_budget for those) or to daemon mode (use the daemon_* budgets).
Session Budgets
```yaml
guardrails:
  session_token_budget: 500000
```

session_token_budget tracks cumulative token usage across interactive REPL turns (-i mode). The agent warns at 80% consumption and stops accepting new prompts at 100%.
This is useful for long-running interactive sessions where you want to cap total spend.
Daemon Budgets
Daemon-mode agents (initrunner run --daemon) can have lifetime and daily budgets:
| Field | Type | Default | Description |
|---|---|---|---|
| daemon_token_budget | int \| null | null | Lifetime token budget for the daemon process |
| daemon_daily_token_budget | int \| null | null | Daily token budget, resets at midnight in budget_timezone |
```yaml
guardrails:
  daemon_token_budget: 1000000
  daemon_daily_token_budget: 100000
```

When a daemon budget is exhausted, triggers are skipped until the budget resets (daily) or the daemon is restarted (lifetime).
USD Cost Budgets
Daemon-mode agents can also enforce USD-based cost limits alongside token budgets. Cost is estimated per run using the genai-prices library.
| Field | Type | Default | Description |
|---|---|---|---|
| daemon_daily_cost_budget | float \| null | null | Maximum USD spend per calendar day |
| daemon_weekly_cost_budget | float \| null | null | Maximum USD spend per ISO week |
| budget_timezone | str | "UTC" | IANA timezone for daily/weekly budget resets (e.g. "America/New_York") |
```yaml
guardrails:
  daemon_daily_cost_budget: 10.00
  daemon_weekly_cost_budget: 50.00
  budget_timezone: "America/New_York"  # resets at midnight Eastern
```

Daily cost resets at midnight in the configured budget_timezone (UTC by default). Weekly cost resets when the ISO week number changes. You can also override the timezone from the CLI:

```bash
initrunner run role.yaml --daemon --budget-timezone America/New_York
```

When a cost budget is exhausted, triggers are skipped just like token budgets. At startup, InitRunner validates that pricing data is available for the role's model. If genai-prices doesn't cover the model, the daemon exits with a clear error.
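The reset rules can be thought of as "period keys": a counter resets when the local date (daily) or the ISO (year, week) pair (weekly) in budget_timezone changes. This is a minimal sketch under that assumption; period_keys is a hypothetical name, not part of InitRunner.

```python
from datetime import date, datetime, timezone
from typing import Tuple
from zoneinfo import ZoneInfo


def period_keys(now_utc: datetime, tz_name: str = "UTC") -> Tuple[date, Tuple[int, int]]:
    """Return (daily_key, weekly_key) for an aware timestamp.

    daily_key is the calendar date in budget_timezone; weekly_key is the
    ISO (year, week) pair in the same zone. A counter resets when its key changes.
    """
    # Use the stdlib UTC object so the default works even without tz data installed.
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    local = now_utc.astimezone(tz)
    iso = local.isocalendar()
    return local.date(), (iso[0], iso[1])


sunday = datetime(2026, 1, 4, 23, 0, tzinfo=timezone.utc)
monday = datetime(2026, 1, 5, 3, 30, tzinfo=timezone.utc)
print(period_keys(sunday))  # (datetime.date(2026, 1, 4), (2026, 1))
print(period_keys(monday))  # (datetime.date(2026, 1, 5), (2026, 2))
# In America/New_York, 03:30 UTC on Monday is still 22:30 EST on Sunday,
# so period_keys(monday, "America/New_York") == (date(2026, 1, 4), (2026, 1)).
```

The comparison between the two zones shows why budget_timezone matters for both resets: the same instant can fall in a different day and a different ISO week depending on the configured zone.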
Budget counters are persisted to the audit database after each run. Restarting a daemon or bot restores the counters, so spend tracking survives process restarts (since v2026.4.11).
Token and cost budgets are enforced independently; either limit being hit will pause the daemon. See Cost Tracking for CLI analytics and dashboard UI.
Daemon Resilience
Since v2026.4.11, daemon-mode agents can retry failed runs and track provider health with a circuit breaker. Both features live under spec.guardrails.
Retry Policy
When a trigger fires and the agent run fails with a transient provider error (rate limit, 5xx, connection failure), the daemon retries the entire run with exponential backoff.
| Field | Type | Default | Range | Description |
|---|---|---|---|---|
| retry_policy.max_attempts | int | 1 | 1-5 | Total attempts per trigger fire (1 = no retry) |
| retry_policy.backoff_base_seconds | float | 2.0 | 0.5-30 | Base delay for exponential backoff (seconds) |
| retry_policy.backoff_max_seconds | float | 30.0 | 1-300 | Maximum backoff delay (seconds) |
Only transient provider errors are retried: HTTP 429 (rate limit), HTTP 5xx (server error), and connection failures. Timeouts, auth errors, content blocks, and usage limits are not retried.
Side effects: retries re-execute the entire agent run, including tool calls. Only enable retry for idempotent roles or when failures happen before tool execution (provider-level errors).
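A minimal sketch of the retry loop, assuming the delay doubles from backoff_base_seconds after each failed attempt and is capped at backoff_max_seconds. The exact backoff formula (and whether InitRunner adds jitter) is an assumption, as are the names ProviderError, is_transient, backoff_delay, and run_with_retry.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ProviderError(Exception):
    """Hypothetical provider error carrying an HTTP status or a connection flag."""

    def __init__(self, status: Optional[int] = None, connection_failed: bool = False) -> None:
        super().__init__(f"status={status} connection_failed={connection_failed}")
        self.status = status
        self.connection_failed = connection_failed


def is_transient(err: Exception) -> bool:
    # Retry only HTTP 429, HTTP 5xx, and connection failures.
    if not isinstance(err, ProviderError):
        return False  # timeouts, content blocks, usage limits, etc.
    if err.connection_failed:
        return True
    return err.status is not None and (err.status == 429 or 500 <= err.status <= 599)


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    # Assumed formula: base, 2*base, 4*base, ... capped at backoff_max_seconds.
    return min(base * 2 ** (attempt - 1), cap)


def run_with_retry(
    run: Callable[[], T],
    max_attempts: int = 3,
    base: float = 2.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return run()  # re-executes the whole run, tool calls included
        except Exception as err:
            if attempt >= max_attempts or not is_transient(err):
                raise  # non-transient error or attempts used up
            sleep(backoff_delay(attempt, base, cap))
            attempt += 1
```

Under this assumption, base 2.0 with the maximum of 5 attempts gives waits of 2, 4, 8, and 16 seconds between attempts, while an auth error (401/403) is raised on the first attempt without any retry.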
```yaml
guardrails:
  retry_policy:
    max_attempts: 3
    backoff_base_seconds: 2.0
    backoff_max_seconds: 30.0
```

Circuit Breaker
The circuit breaker tracks provider health across trigger fires. After enough consecutive failures, it stops dispatching new runs until the provider recovers.
| Field | Type | Default | Range | Description |
|---|---|---|---|---|
| circuit_breaker.failure_threshold | int | 5 | 1-100 | Consecutive failures before the circuit opens |
| circuit_breaker.reset_timeout_seconds | int | 60 | 10-3600 | Seconds the circuit stays open before a half-open probe |
State machine: CLOSED (normal) -> OPEN (all runs skipped) after hitting the failure threshold -> HALF_OPEN (one probe allowed) after the reset timeout -> back to CLOSED on success or OPEN again on failure.
Only provider-health errors trip the breaker: rate limits, server errors, connection failures, and auth errors (401/403). Application-level errors like content blocks and usage limits are ignored.
State transitions are logged as security audit events (circuit_open, circuit_half_open, circuit_closed).
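The state machine above can be sketched in Python as follows. The class and method names are hypothetical, the sketch takes an injectable clock so transitions can be exercised without waiting, and audit logging of transitions is omitted.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Hypothetical breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED or OPEN."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout_seconds
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a trigger run may be dispatched now."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = State.HALF_OPEN  # let exactly one probe through
                return True
            return False  # circuit open: skip the run
        if self.state is State.HALF_OPEN:
            return False  # the single probe is already in flight
        return True

    def record_success(self) -> None:
        self.state = State.CLOSED
        self.failures = 0  # failures must be consecutive to trip the breaker

    def record_failure(self) -> None:
        # Only provider-health errors (429, 5xx, connection, 401/403) go here.
        if self.state is State.HALF_OPEN:
            self._open()  # failed probe: back to OPEN
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
```

Keeping HALF_OPEN closed to further runs until the probe reports back is what limits a recovering provider to one request per reset_timeout_seconds window.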
```yaml
guardrails:
  circuit_breaker:
    failure_threshold: 5
    reset_timeout_seconds: 60
```

Set circuit_breaker: null (the default) to disable.
Autonomous Limits
These fields control resource usage for autonomous mode runs:
| Field | Type | Default | Description |
|---|---|---|---|
| max_iterations | int | 10 | Maximum plan-execute-adapt cycles |
| autonomous_token_budget | int \| null | null | Token budget for the autonomous run |
| autonomous_timeout_seconds | int \| null | null | Wall-clock timeout for the entire autonomous run (seconds) |
```yaml
guardrails:
  max_iterations: 10
  autonomous_token_budget: 50000
  autonomous_timeout_seconds: 600
```

When any autonomous limit is hit, the agent stops and reports its progress via finish_task.
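A minimal sketch of how the three autonomous limits could bound a plan-execute-adapt loop. run_autonomous, AutonomousLimits, and the step callback are hypothetical, the limits are assumed to be checked between cycles, and the real agent reports its progress through finish_task rather than returning a tuple.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AutonomousLimits:
    # Hypothetical mirror of the three guardrail fields.
    max_iterations: int = 10
    autonomous_token_budget: Optional[int] = None
    autonomous_timeout_seconds: Optional[float] = None


def run_autonomous(
    step: Callable[[int], int],
    limits: AutonomousLimits,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, int, str]:
    """Run cycles until a limit is hit.

    step(i) performs one cycle and returns the tokens it used.
    Returns (cycles_completed, tokens_used, limit_that_stopped_the_loop).
    """
    start = clock()
    tokens = 0
    for i in range(limits.max_iterations):
        budget = limits.autonomous_token_budget
        if budget is not None and tokens >= budget:
            return i, tokens, "autonomous_token_budget"
        timeout = limits.autonomous_timeout_seconds
        if timeout is not None and clock() - start >= timeout:
            return i, tokens, "autonomous_timeout_seconds"
        tokens += step(i)
    return limits.max_iterations, tokens, "max_iterations"


limits = AutonomousLimits(max_iterations=5, autonomous_token_budget=30_000)
print(run_autonomous(lambda i: 12_000, limits))  # (3, 36000, 'autonomous_token_budget')
```

Because the check happens between cycles in this sketch, the last completed cycle can overshoot the token budget (36,000 against a 30,000 cap here), much like the cumulative run_token_budget.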
Team Budgets
These fields control resource usage for team mode runs (kind: Team):
| Field | Type | Default | Description |
|---|---|---|---|
| team_token_budget | int \| null | null | Cumulative token budget across all personas in a team run. Pipeline stops if exceeded. Team mode only. |
| team_timeout_seconds | int \| null | null | Wall-clock limit for the entire team run (seconds). Pipeline stops if exceeded. Team mode only. |
```yaml
guardrails:
  team_token_budget: 150000
  team_timeout_seconds: 900
```

Team budgets protect team runs from unbounded spend across personas. Per-run limits (max_tokens_per_run, timeout_seconds) still apply to each individual persona. See Team Mode.
Enforcement Behavior
Each limit type has specific enforcement behavior:
| Limit | What Happens |
|---|---|
| max_tokens_per_run | PydanticAI raises UsageLimitExceeded; the run stops immediately |
| max_tool_calls | PydanticAI raises UsageLimitExceeded; the run stops immediately |
| timeout_seconds | Python raises TimeoutError; the run is cancelled |
| max_request_limit | PydanticAI raises UsageLimitExceeded; no more API round-trips |
| input_tokens_limit | PydanticAI raises UsageLimitExceeded on the next request |
| total_tokens_limit | PydanticAI raises UsageLimitExceeded on the next request |
| session_token_budget | Warns at 80%, stops accepting prompts at 100% |
| daemon_token_budget | Triggers are skipped when exhausted |
| daemon_daily_token_budget | Triggers are skipped until the midnight reset (in budget_timezone) |
| daemon_daily_cost_budget | Triggers are skipped until the midnight reset (in budget_timezone) |
| daemon_weekly_cost_budget | Triggers are skipped until the ISO week rolls over (in budget_timezone) |
| retry_policy | Failed run is retried with exponential backoff (transient errors only) |
| circuit_breaker | All trigger runs are skipped while the circuit is open |
| max_iterations | Autonomous loop terminates, agent reports progress |
| autonomous_token_budget | Autonomous loop terminates, agent reports progress |
| autonomous_timeout_seconds | Autonomous loop terminates, agent reports progress |
| team_token_budget | Team pipeline stops, partial results returned |
| team_timeout_seconds | Team pipeline stops, partial results returned |
Budget warnings apply to session_token_budget, daemon_token_budget, daemon_daily_token_budget, daemon_daily_cost_budget, and daemon_weekly_cost_budget. Warnings are logged at 80% and 95% consumption so operators can take action before the hard stop.
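A minimal sketch of threshold warnings, assuming each threshold is logged once when cumulative usage first crosses it. BudgetTracker is a hypothetical name; the 80% and 95% thresholds come from the paragraph above, and exhausted models the hard stop (for example, the REPL refusing new prompts once session_token_budget is spent).

```python
from typing import List, Sequence


class BudgetTracker:
    """Hypothetical tracker that records each warning threshold once."""

    def __init__(self, limit: float, thresholds: Sequence[float] = (0.80, 0.95)) -> None:
        self.limit = limit
        self.used = 0.0
        self.pending = sorted(thresholds)  # thresholds not yet crossed
        self.warnings: List[str] = []

    @property
    def exhausted(self) -> bool:
        # Hard stop: no further prompts or trigger runs once this is True.
        return self.used >= self.limit

    def add(self, amount: float) -> None:
        self.used += amount
        # One add() can cross several thresholds at once, so loop.
        while self.pending and self.used >= self.pending[0] * self.limit:
            pct = round(self.pending.pop(0) * 100)
            self.warnings.append(f"budget {pct}% consumed ({self.used:g}/{self.limit:g})")


tracker = BudgetTracker(200_000)  # e.g. session_token_budget: 200000
tracker.add(165_000)
print(tracker.warnings)  # ['budget 80% consumed (165000/200000)']
```

Popping thresholds as they are crossed keeps each warning from repeating on every later turn, which matches the intent of warning operators once before the hard stop.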
Visibility
Guardrail status is surfaced across multiple interfaces:
| Surface | What's Shown |
|---|---|
| initrunner validate | Warns if guardrails are missing or misconfigured |
| REPL subtitle | Live token usage and remaining budget |
| Dashboard status bar | Per-run and session budget consumption bars |
| Dashboard API | /api/agents/:id/usage endpoint returns current budget state |
| Audit logs | Every limit hit is recorded with the limit name and value |
Tool Output Limits
Individual tool outputs are capped to prevent a single response from consuming the entire context window:
| Tool | Max Output Size | Behavior When Exceeded |
|---|---|---|
| read_file | 1 MB | Output is truncated with a [truncated] marker |
| http_request | 100 KB | Response body is truncated; headers are preserved |
| shell | 100 KB | stdout/stderr combined output is truncated |
| search_documents | 50 KB | Results are truncated; match count is still reported |
These limits are not configurable — they are hard-coded safety rails to protect context window budget. If you need larger outputs, read files in chunks or paginate HTTP responses.
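To illustrate the truncation behavior, the sketch below caps a string at a byte limit and appends the [truncated] marker. It is not InitRunner's implementation: truncate_output is a hypothetical name, and measuring the limit in UTF-8 bytes while dropping a multi-byte character split at the cut is an assumption.

```python
MARKER = "[truncated]"


def truncate_output(text: str, max_bytes: int) -> str:
    """Cap text at max_bytes of UTF-8 and append a marker when it was cut."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text  # under the cap: returned unchanged
    # errors="ignore" drops a multi-byte character that the cut split in half.
    head = data[:max_bytes].decode("utf-8", errors="ignore")
    return head + "\n" + MARKER


print(truncate_output("a" * 20, 5))
```

The marker is appended after the cut, so the returned string can be slightly longer than max_bytes; the cap bounds the tool's payload, not the marker.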
Example Configurations
Cost-Conscious Development
Tight limits for iterative development where you want fast feedback and low spend:
```yaml
guardrails:
  max_tokens_per_run: 10000
  max_tool_calls: 10
  timeout_seconds: 60
  session_token_budget: 50000
```

Production Daemon
A daemon role with daily budgets and autonomous limits:
```yaml
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: monitor-agent
  description: Monitors infrastructure and auto-remediates issues
spec:
  role: |
    You are an infrastructure monitor. Check system health when triggered,
    diagnose issues, and apply standard remediations.
  model:
    provider: openai
    name: gpt-4o-mini
    temperature: 0.0
  tools:
    - type: shell
      allowed_commands: [curl, systemctl, journalctl]
      require_confirmation: false
      timeout_seconds: 30
  triggers:
    - type: cron
      schedule: "*/5 * * * *"
      prompt: "Run a health check on all services."
      autonomous: true
  autonomy:
    max_plan_steps: 8
    max_history_messages: 20
    iteration_delay_seconds: 2
  guardrails:
    # Per-run limits
    max_tokens_per_run: 15000
    max_tool_calls: 10
    timeout_seconds: 120
    # Daemon budgets
    daemon_token_budget: 5000000
    daemon_daily_token_budget: 500000
    # Cost budgets
    daemon_daily_cost_budget: 10.00
    daemon_weekly_cost_budget: 50.00
    budget_timezone: "UTC"
    # Daemon resilience
    retry_policy:
      max_attempts: 3
      backoff_base_seconds: 2.0
      backoff_max_seconds: 30.0
    circuit_breaker:
      failure_threshold: 5
      reset_timeout_seconds: 60
    # Autonomous limits
    max_iterations: 5
    autonomous_token_budget: 30000
    autonomous_timeout_seconds: 300
```

RAG with Budget
A knowledge-base agent with session budgets to cap interactive usage:
```yaml
guardrails:
  max_tokens_per_run: 30000
  max_tool_calls: 15
  timeout_seconds: 180
  session_token_budget: 200000
  input_tokens_limit: 16000
```

CLI Overrides
```bash
# Override max iterations for autonomous mode
initrunner run role.yaml -a --max-iterations 5
# Override the per-run cumulative token budget for one invocation
initrunner run role.yaml --token-budget 80000
```

The --max-iterations N flag overrides the max_iterations value from the YAML file for that run. The --token-budget N flag (since v2026.5.1) overrides guardrails.run_token_budget for that invocation; it caps the parent run plus any inline-delegated sub-agents.