Testing
InitRunner includes built-in tools for testing agents before deploying them — schema validation, dry-run mode (no API calls), and an eval-style test suite runner.
Validation
Validate a role YAML against the schema without running the agent:
    initrunner validate role.yaml
This checks:
- YAML syntax and structure
- Required fields (apiVersion, kind, metadata.name, spec.role)
- Field types and value ranges (e.g. temperature between 0.0 and 2.0)
- Tool configurations (valid types, required fields per type)
- Skill references (file exists, frontmatter is valid)
- Trigger configurations (valid cron expressions, valid paths)
- Security policy structure
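To make the required-fields check concrete, here is a minimal Python sketch of how dotted paths such as metadata.name could be looked up in a parsed role document. It is an illustration only, not InitRunner's validator: the helper names (lookup, missing_required_fields) and the sample field values are hypothetical.

```python
from typing import Any

# Required dotted paths, taken from the list of checks above.
REQUIRED_FIELDS = ("apiVersion", "kind", "metadata.name", "spec.role")


def lookup(doc: dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if a step is missing."""
    current: Any = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def missing_required_fields(role: dict[str, Any]) -> list[str]:
    """Return the required paths that are absent or empty in a parsed role."""
    return [path for path in REQUIRED_FIELDS if lookup(role, path) in (None, "")]


# Hypothetical parsed role: these values are invented for illustration.
role = {
    "apiVersion": "example/v1",
    "kind": "Agent",
    "metadata": {"name": "support-agent"},
    "spec": {},  # spec.role is missing
}
print(missing_required_fields(role))  # prints ['spec.role']
```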
Validation exits with code 0 on success and non-zero on failure, making it suitable for CI pipelines.
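Because the exit code carries the result, a CI job can validate several role files and report every failure at once. The sketch below assumes only the initrunner validate command shown above; the function names and the injectable run parameter are hypothetical conveniences so the logic can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, Sequence


def run_command(cmd: list[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def validate_roles(
    paths: Sequence[str],
    run: Callable[[list[str]], int] = run_command,
) -> list[str]:
    """Run `initrunner validate` on each path and return the paths that failed."""
    failed = []
    for path in paths:
        # Exit code 0 means the role passed; any other code is a failure.
        if run(["initrunner", "validate", path]) != 0:
            failed.append(path)
    return failed


def main(paths: Sequence[str], run: Callable[[list[str]], int] = run_command) -> int:
    """Return 0 if every role validates, 1 otherwise."""
    failures = validate_roles(paths, run)
    for path in failures:
        print(f"validation failed: {path}")
    return 1 if failures else 0
```

In a script, the return value of main(sys.argv[1:]) can be passed to sys.exit so the CI job fails when any role is invalid.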
Dry-Run Mode
Run an agent without making any LLM API calls:
    initrunner run role.yaml --dry-run -p "Test prompt"
Dry-run mode replaces the configured model with a TestModel that returns deterministic placeholder responses. This lets you verify:
- Tool registration and discovery
- Trigger configuration and startup
- Memory system initialization
- Skill loading and merging
- Guardrail enforcement logic
- Sink configuration
No API keys are required and no tokens are consumed. Use dry-run mode during development to catch configuration errors before spending on API calls.
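The idea behind a dry run can be sketched in a few lines of Python: swap the real model for a deterministic stand-in so the surrounding wiring still executes. Everything here is hypothetical (PlaceholderModel, run_agent, the tool dictionary) and is not InitRunner's TestModel or agent loop; it only illustrates why no API key is needed.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Model(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class PlaceholderModel:
    """Deterministic stand-in: same prompt in, same canned text out, no network."""

    calls: list[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return f"[placeholder response to {len(prompt)} chars]"


def run_agent(model: Model, prompt: str, tools: dict[str, Callable[[str], str]]) -> str:
    """Hypothetical agent run, reduced to: check tool wiring, then ask the model."""
    for name, tool in tools.items():
        if not callable(tool):
            raise TypeError(f"tool {name!r} is not callable")
    return model.complete(prompt)


model = PlaceholderModel()
print(run_agent(model, "Test prompt", {"search_documents": lambda query: query}))
```

A misconfigured tool raises before the model is consulted, which is the kind of error a dry run is meant to surface.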
Test Suites
The initrunner test command runs structured test suites against an agent using an eval framework.
    initrunner test role.yaml -s test_suite.yaml
Test Suite Format
A test suite is a YAML file defining test cases with inputs and expected outcomes:
    name: support-agent-tests
    description: Regression tests for the support agent
    tests:
      - name: answers_product_question
        prompt: "What is the return policy?"
        assertions:
          - type: contains
            value: "30 days"
          - type: contains
            value: "refund"
      - name: rejects_off_topic
        prompt: "What's the weather like?"
        assertions:
          - type: not_contains
            value: "forecast"
          - type: max_tokens
            value: 200
      - name: uses_search_tool
        prompt: "Find articles about shipping delays"
        assertions:
          - type: tool_called
            value: search_documents
          - type: contains
            value: "shipping"
      - name: stays_within_budget
        prompt: "Write a comprehensive guide to our product line"
        assertions:
          - type: max_tokens
            value: 4096
          - type: max_tool_calls
            value: 10
Assertion Types
| Type | Description |
|---|---|
| contains | Output contains the specified string (case-insensitive) |
| not_contains | Output does not contain the specified string |
| regex | Output matches the regex pattern |
| max_tokens | Output token count is within the limit |
| max_tool_calls | Number of tool calls is within the limit |
| tool_called | The specified tool was invoked during the run |
| tool_not_called | The specified tool was not invoked |
| exit_status | Run completed with the expected status (success or error) |
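The table above can be read as a small dispatch over assertion types. The following Python sketch shows one way such checks might be evaluated. It is not InitRunner's implementation: RunResult and its field names are hypothetical, and two behaviours are assumptions marked in the comments (case handling for not_contains, and regex matching anywhere in the output).

```python
import re
from dataclasses import dataclass, field


@dataclass
class RunResult:
    """Hypothetical record of one agent run; field names are illustrative."""

    output: str
    token_count: int = 0
    tool_calls: list[str] = field(default_factory=list)
    status: str = "success"


def check(assertion: dict, result: RunResult) -> bool:
    """Return True if one assertion from a suite holds for the run."""
    kind, value = assertion["type"], assertion["value"]
    if kind == "contains":
        return str(value).lower() in result.output.lower()
    if kind == "not_contains":
        # Assumption: mirrors `contains` and ignores case.
        return str(value).lower() not in result.output.lower()
    if kind == "regex":
        # Assumption: a match anywhere in the output counts.
        return re.search(str(value), result.output) is not None
    if kind == "max_tokens":
        return result.token_count <= int(value)
    if kind == "max_tool_calls":
        return len(result.tool_calls) <= int(value)
    if kind == "tool_called":
        return value in result.tool_calls
    if kind == "tool_not_called":
        return value not in result.tool_calls
    if kind == "exit_status":
        return result.status == value
    raise ValueError(f"unknown assertion type: {kind}")


result = RunResult(
    output="Refunds are accepted within 30 Days.",
    token_count=340,
    tool_calls=["search_documents"],
)
print(check({"type": "contains", "value": "30 days"}, result))  # prints True
```

A test case passes when every assertion in its list returns True, so a single case can combine content checks with budget limits.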
Running Tests
    # Run a test suite
    initrunner test role.yaml -s test_suite.yaml
    # Dry-run tests (no API calls, uses TestModel)
    initrunner test role.yaml -s test_suite.yaml --dry-run
    # Verbose output
    initrunner test role.yaml -s test_suite.yaml -v
| Flag | Type | Default | Description |
|---|---|---|---|
| -s, --suite | str | (required) | Path to the test suite YAML |
| --dry-run | bool | false | Use TestModel instead of real API calls |
| -v, --verbose | bool | false | Show full output for each test case |
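When the test command is launched from a script, building the argument list from the flags in the table keeps the invocation in one place. This is a small sketch; build_test_command is a hypothetical helper, and it uses only the flags documented above.

```python
import shlex


def build_test_command(
    role: str,
    suite: str,
    dry_run: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Assemble an `initrunner test` argument list from the documented flags."""
    cmd = ["initrunner", "test", role, "--suite", suite]
    if dry_run:
        cmd.append("--dry-run")
    if verbose:
        cmd.append("--verbose")
    return cmd


# shlex.join quotes arguments safely for display in logs.
print(shlex.join(build_test_command("role.yaml", "test_suite.yaml", dry_run=True)))
# prints: initrunner test role.yaml --suite test_suite.yaml --dry-run
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.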
Test Output
Running suite: support-agent-tests (4 tests)
✓ answers_product_question (1.2s, 340 tokens)
✓ rejects_off_topic (0.8s, 95 tokens)
✓ uses_search_tool (2.1s, 520 tokens)
✗ stays_within_budget
FAIL: max_tokens — expected ≤4096, got 4301
Results: 3 passed, 1 failed (4.1s total)
Testing Workflow
A practical workflow for developing and testing agents:
- Validate (catch schema errors early):
      initrunner validate role.yaml
- Dry-run (verify tool registration and config without API calls):
      initrunner run role.yaml --dry-run -p "Test prompt"
- Interactive test (manual testing in REPL mode):
      initrunner run role.yaml -i
- Suite test (run automated assertions against real model output):
      initrunner test role.yaml -s tests/regression.yaml
- CI integration (validate and dry-run in CI, suite tests on schedule):
      # In CI pipeline
      initrunner validate role.yaml
      initrunner test role.yaml -s tests/smoke.yaml --dry-run
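The CI step above can also be driven from a short Python script that runs the two commands in order and stops at the first failure. This sketch assumes that initrunner test, like initrunner validate, exits non-zero on failure, which this page does not state explicitly. The names CI_STEPS and run_steps are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# The two commands from the CI step above, in the order they should run.
CI_STEPS: list[list[str]] = [
    ["initrunner", "validate", "role.yaml"],
    ["initrunner", "test", "role.yaml", "-s", "tests/smoke.yaml", "--dry-run"],
]


def run_command(cmd: list[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_steps(
    steps: Sequence[list[str]],
    run: Callable[[list[str]], int] = run_command,
) -> int:
    """Run steps in order; return 0 if all pass, else the first non-zero exit code."""
    for cmd in steps:
        code = run(cmd)
        if code != 0:
            # Assumption: a non-zero exit code signals failure for every step.
            print(f"step failed ({code}): {' '.join(cmd)}")
            return code
    return 0
```

Stopping early keeps the cheaper validation step as a gate, so the dry-run suite only starts once the role file is known to be well formed.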