RAG Patterns & Guide
This guide covers practical patterns for using InitRunner's retrieval-augmented generation (RAG) capabilities. For full configuration reference, see Ingestion and Memory.
RAG vs Memory: When to Use Which
InitRunner has two systems for giving agents access to information beyond their training data:
| Aspect | Ingestion (RAG) | Memory |
|---|---|---|
| Purpose | Search external documents | Remember learned information |
| Data source | Files on disk, URLs | Agent's own observations |
| Who writes | You (via initrunner ingest) | Agent (via remember() tool) |
| Who reads | Agent (via search_documents()) | Agent (via recall()) |
| Best for | Knowledge base Q&A, doc search | Personalization, context carry-over |
| Persistence | Rebuilt on each ingest run | Accumulates across sessions |
You can use both together: ingestion for your docs, memory for user preferences.
spec:
  ingest:
    sources:
      - "./docs/**/*.md"
  memory:
    semantic:
      max_memories: 500
End-to-End Walkthrough
1. Create a role with ingestion
Create role.yaml:
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: docs-agent
  description: Documentation Q&A agent
spec:
  role: |
    You are a documentation assistant. ALWAYS call search_documents
    before answering questions. Cite your sources.
  model:
    provider: openai
    name: gpt-4o-mini
  ingest:
    sources:
      - "./docs/**/*.md"
    chunking:
      strategy: paragraph
      chunk_size: 512
      chunk_overlap: 50
2. Add some documents
Create a docs/ directory with markdown files:
docs/
├── getting-started.md
├── api-reference.md
└── faq.md
3. Ingest documents
$ initrunner ingest role.yaml
Ingesting documents for docs-agent...
✓ Stored 47 chunks from 3 files
4. Run the agent
$ initrunner run role.yaml -p "How do I authenticate?"
The agent calls search_documents("authenticate") behind the scenes, retrieves matching chunks from your docs, and uses them to answer.
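To make the retrieval step concrete, here is a minimal sketch of ranking chunks against a query. It is not InitRunner's implementation: the embed function is a toy bag-of-words stand-in for a real embedding model, and the names embed, cosine, and search are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[tuple[str, str]], top_k: int = 2) -> list[tuple[str, str]]:
    """Return the top_k (source, text) chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c[1])), reverse=True)
    return ranked[:top_k]


# Hypothetical chunk contents; only the file names come from the walkthrough.
chunks = [
    ("./docs/getting-started.md", "authenticate with an api key from settings"),
    ("./docs/api-reference.md", "the default rate limit is 100 requests per minute"),
    ("./docs/faq.md", "reset your password from the login page"),
]
top = search("how do i authenticate", chunks, top_k=1)
```

A real embedding model would also match paraphrases (for example "log in" against "authenticate"), which word overlap cannot do.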
5. Interactive session
$ initrunner run role.yaml -i
docs-agent> How do I get an API key?
I found the answer in your documentation. Per the Getting Started guide
(./docs/getting-started.md), you can generate an API key by navigating to
Settings > API Keys in your dashboard...
docs-agent> What rate limits apply?
According to the API Reference (./docs/api-reference.md), the default rate
limit is 100 requests per minute per API key...
Choosing an Embedding Model
The embedding model determines how well semantic search performs. Different models trade off between dimension size, cost, speed, and quality.
| Model | Provider | Dimensions | Notes |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | Fast and cheap, a good default for most use cases |
| text-embedding-3-large | OpenAI | 3072 | Higher quality at higher cost |
| text-embedding-004 | Google | 768 | Cost-effective; strong multilingual support |
| nomic-embed-text | Ollama | 768 | Fully local, no API key or network needed |
| BAAI/bge-small-en-v1.5 | local (fastembed) | 384 | Runs in-process, no HTTP hop, no API key; needs the local-embeddings extra |
Which model should I use?
- Cost-sensitive: Google text-embedding-004 or Ollama nomic-embed-text
- Precision-critical: OpenAI text-embedding-3-large
- Fully local / no API keys: Ollama nomic-embed-text
- Truly offline / no external API: the local provider (fastembed) runs the model in-process. The default BAAI/bge-small-en-v1.5 (384 dims) is a good start; BAAI/bge-base-en-v1.5 (768 dims) is higher quality but needs a fresh store_path, since changing the embedding dimension is not backward compatible.
- Google ecosystem: Google text-embedding-004
The default (openai:text-embedding-3-small) is a sensible starting point for most projects. See Providers for the full embedding configuration reference and how to override the default.
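The dimension caveat for the local provider can be checked before switching models. The sketch below uses the dimensions listed in this section; the helper name dimension_changed is hypothetical and not part of InitRunner.

```python
# Dimensions as listed in this section.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-004": 768,
    "nomic-embed-text": 768,
    "BAAI/bge-small-en-v1.5": 384,
    "BAAI/bge-base-en-v1.5": 768,
}


def dimension_changed(old_model: str, new_model: str) -> bool:
    """True when switching models changes the stored vector width."""
    return EMBEDDING_DIMS[old_model] != EMBEDDING_DIMS[new_model]


# Moving from bge-small (384) to bge-base (768) changes the width.
small_to_base = dimension_changed("BAAI/bge-small-en-v1.5", "BAAI/bge-base-en-v1.5")
```

Two models with the same dimension still produce vectors in different spaces, so re-ingesting after any model switch is likely the safer choice even when this check returns False.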
Common Patterns
Basic knowledge base
Single format, paragraph chunking for natural document boundaries:
ingest:
  sources:
    - "./knowledge-base/**/*.md"
  chunking:
    strategy: paragraph
    chunk_size: 512
    chunk_overlap: 50
Multi-format knowledge base
Mix HTML, Markdown, and PDF sources. Install initrunner[ingest] for PDF support:
ingest:
  sources:
    - "./docs/**/*.md"
    - "./docs/**/*.html"
    - "./docs/**/*.pdf"
  chunking:
    strategy: fixed
    chunk_size: 1024
    chunk_overlap: 100
URL-based ingestion
Ingest content from remote URLs alongside local files:
ingest:
  sources:
    - "./local-docs/**/*.md"
    - "https://docs.example.com/api/reference"
    - "https://docs.example.com/changelog"
URL content is hashed, so re-running ingest skips unchanged pages.
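The skip-unchanged behaviour can be pictured as a hash comparison per URL. This is a sketch only: the source does not say which hash InitRunner uses, so SHA-256 is an assumption, and the function names and the in-memory stored dict are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Digest of the fetched page body (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def pages_to_reingest(fetched: dict[str, bytes], stored_hashes: dict[str, str]) -> list[str]:
    """Return URLs whose content differs from the last recorded hash, and record the new hash."""
    changed = []
    for url, content in fetched.items():
        digest = content_hash(content)
        if stored_hashes.get(url) != digest:
            changed.append(url)
            stored_hashes[url] = digest
    return changed


url = "https://docs.example.com/changelog"
stored: dict[str, str] = {}
first = pages_to_reingest({url: b"v1"}, stored)   # new page: ingested
second = pages_to_reingest({url: b"v1"}, stored)  # unchanged: skipped
third = pages_to_reingest({url: b"v2"}, stored)   # edited: ingested again
```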
Running on source changes with a file watch trigger
Since v2026.4.10, source changes are detected on every initrunner run automatically, so you don't need a trigger just to keep the index fresh. Reach for a file_watch trigger when you want the agent to actually run on change (for example, to summarize the edit or notify a channel), not just re-ingest:
spec:
  ingest:
    sources:
      - "./knowledge-base/**/*.md"
  triggers:
    - type: file_watch
      paths:
        - ./knowledge-base
      extensions:
        - .md
      prompt_template: "Knowledge base updated: {path}. Re-index."
      debounce_seconds: 1.0
Using source filter to scope searches
When your knowledge base spans multiple topics, use the source parameter to narrow results:
spec:
  role: |
    You are a support agent. When the user asks about billing, search
    only billing docs: search_documents(query, source="*billing*").
    For technical issues, search: search_documents(query, source="*troubleshooting*").
  ingest:
    sources:
      - "./kb/billing/**/*.md"
      - "./kb/troubleshooting/**/*.md"
      - "./kb/general/**/*.md"
Hybrid retrieval (vector + keyword)
Dense vector search matches on meaning, so it can miss exact tokens like identifiers, error codes, version strings, and acronyms that do not have a strong semantic signal. Hybrid retrieval runs both a dense vector search and a BM25 full-text search, then fuses the two result lists with reciprocal rank fusion (RRF).
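As an illustration of the fusion step, here is a minimal sketch of reciprocal rank fusion. Each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The function name rrf_fuse and the chunk ids are hypothetical, k = 60 is a commonly used constant, and this is not the code path InitRunner or LanceDB runs.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of ids into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["chunk-a", "chunk-b", "chunk-c"]  # dense vector ranking
bm25_hits = ["chunk-c", "chunk-a", "chunk-d"]    # keyword (BM25) ranking
fused = rrf_fuse([vector_hits, bm25_hits])
```

Here chunk-a and chunk-c rise to the top because both retrievers returned them, while chunk-b and chunk-d, each found by only one retriever, rank lower.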
Set the strategy on spec.ingest.retriever:
spec:
  ingest:
    sources:
      - "./docs/**/*.md"
    retriever:
      strategy: hybrid
The three strategies:
| Strategy | What it does | Extra dependency |
|---|---|---|
| vector | Dense cosine search only. The default, unchanged behaviour. | None |
| hybrid | RRF fusion of dense vector and BM25 full-text results. | None (RRF ships with LanceDB) |
| hybrid_rerank | Hybrid fusion, then a cross-encoder reranks the fused results. | Optional (see below) |
The BM25 full-text index is built automatically on the next initrunner ingest, so an existing store picks up hybrid search after a re-ingest. No new required dependency is added for vector or hybrid.
hybrid_rerank adds a cross-encoder pass on top of hybrid for higher precision. The cross-encoder backend (sentence-transformers) is optional. When it is not installed, hybrid_rerank falls back to plain hybrid (RRF) instead of failing. Install it with:
$ uv pip install sentence-transformers
Tunable retriever config with verified defaults:
spec:
  ingest:
    sources:
      - "./docs/**/*.md"
    retriever:
      strategy: hybrid_rerank
      reranker_model: cross-encoder/ms-marco-MiniLM-L-6-v2
      rrf_k: 60
An agent can override the configured mode for a single call with the strategy parameter on the search tool: search_documents(query, strategy="hybrid"). The accepted values are the same three: vector, hybrid, and hybrid_rerank.
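The per-call override and the hybrid_rerank fallback described in this section combine roughly as sketched below. The function names are hypothetical and this is not InitRunner's code. The availability check assumes the sentence-transformers package is importable as sentence_transformers.

```python
import importlib.util
from typing import Optional

VALID_STRATEGIES = ("vector", "hybrid", "hybrid_rerank")


def reranker_available() -> bool:
    """True when the optional cross-encoder backend can be imported."""
    return importlib.util.find_spec("sentence_transformers") is not None


def resolve_strategy(
    configured: str,
    override: Optional[str] = None,
    has_reranker: Optional[bool] = None,
) -> str:
    """Pick the effective strategy for one search call."""
    strategy = override or configured  # a per-call override wins over config
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if has_reranker is None:
        has_reranker = reranker_available()
    # Without the cross-encoder, hybrid_rerank degrades to plain hybrid (RRF).
    if strategy == "hybrid_rerank" and not has_reranker:
        return "hybrid"
    return strategy
```

Passing has_reranker explicitly keeps the function easy to test without installing or uninstalling the optional dependency.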
Fully local RAG with Ollama
No external API keys needed. Use Ollama for both the LLM and embeddings:
spec:
  model:
    provider: ollama
    name: llama3.2
  ingest:
    sources:
      - "./docs/**/*.md"
    embeddings:
      provider: ollama
      model: nomic-embed-text
See the Providers page for Ollama setup instructions.
Next Steps
- Ingestion reference: full configuration options, chunking strategies, embedding models
- Memory reference: session persistence and long-term memory (semantic, episodic, procedural)
- Tools reference: built-in and custom tool types