RAG Patterns & Guide

This guide covers practical patterns for using InitRunner's retrieval-augmented generation (RAG) capabilities. For the full configuration reference, see Ingestion and Memory.

RAG vs Memory: When to Use Which

InitRunner has two systems for giving agents access to information beyond their training data:

Aspect	Ingestion (RAG)	Memory
Purpose	Search external documents	Remember learned information
Data source	Files on disk, URLs	Agent's own observations
Who writes	You (via `initrunner ingest`)	Agent (via `remember()` tool)
Who reads	Agent (via `search_documents()`)	Agent (via `recall()`)
Best for	Knowledge base Q&A, doc search	Personalization, context carry-over
Persistence	Rebuilt on each `ingest` run	Accumulates across sessions

You can use both together: ingestion for your docs, memory for user preferences.

spec:
  ingest:
    sources:
      - "./docs/**/*.md"
  memory:
    semantic:
      max_memories: 500

End-to-End Walkthrough

1. Create a role with ingestion

Create role.yaml:

apiVersion: initrunner/v1
kind: Agent
metadata:
  name: docs-agent
  description: Documentation Q&A agent
spec:
  role: |
    You are a documentation assistant. ALWAYS call search_documents
    before answering questions. Cite your sources.
  model:
    provider: openai
    name: gpt-4o-mini
  ingest:
    sources:
      - "./docs/**/*.md"
    chunking:
      strategy: paragraph
      chunk_size: 512
      chunk_overlap: 50

2. Add some documents

Create a docs/ directory with markdown files:

docs/
├── getting-started.md
├── api-reference.md
└── faq.md

3. Ingest documents

$ initrunner ingest role.yaml
Ingesting documents for docs-agent...
✓ Stored 47 chunks from 3 files

4. Run the agent

$ initrunner run role.yaml -p "How do I authenticate?"

The agent calls search_documents("authenticate") behind the scenes, retrieves matching chunks from your docs, and uses them to answer.

5. Interactive session

$ initrunner run role.yaml -i
docs-agent> How do I get an API key?

I found the answer in your documentation. Per the Getting Started guide
(./docs/getting-started.md), you can generate an API key by navigating to
Settings > API Keys in your dashboard...

docs-agent> What rate limits apply?

According to the API Reference (./docs/api-reference.md), the default rate
limit is 100 requests per minute per API key...

Choosing an Embedding Model

The embedding model determines how well semantic search performs. Models trade off embedding dimension, cost, speed, and retrieval quality.

Model	Provider	Dimensions	Notes
`text-embedding-3-small`	OpenAI	1536	Fast and cheap, a good default for most use cases
`text-embedding-3-large`	OpenAI	3072	Higher quality at higher cost
`text-embedding-004`	Google	768	Cost-effective; strong multilingual support
`nomic-embed-text`	Ollama	768	Fully local, no API key or network needed
`BAAI/bge-small-en-v1.5`	`local` (fastembed)	384	Runs in-process, no HTTP hop, no API key; needs the `local-embeddings` extra

Which model should I use?

Cost-sensitive: Google text-embedding-004 or Ollama nomic-embed-text
Precision-critical: OpenAI text-embedding-3-large
Fully local / no API keys: Ollama nomic-embed-text
Truly offline / no external API: the local provider (fastembed) runs the model in-process. The default BAAI/bge-small-en-v1.5 (384 dims) is a good starting point. BAAI/bge-base-en-v1.5 (768 dims) gives higher quality, but switching to it requires a fresh store_path, because a change in embedding dimension is not backward compatible with an existing store.
Google ecosystem: Google text-embedding-004

The default (openai:text-embedding-3-small) is a sensible starting point for most projects. See Providers for the full embedding configuration reference and how to override the default.
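
The dimension caveat above is easier to see with a small example. The following Python sketch is not InitRunner code; `cosine_similarity` is a hypothetical helper that shows why vectors produced by models with different dimensions cannot be compared, which is one plausible reason a store built with 384-dimension embeddings cannot be reused with a 768-dimension model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of the same dimension."""
    if len(a) != len(b):
        # A 384-dim query vector cannot be scored against 768-dim stored vectors.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0
```

Calling it with vectors of unequal length raises `ValueError`, which mirrors what happens conceptually when the query embedding and the stored embeddings come from different models.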

Common Patterns

Basic knowledge base

A single source format with paragraph chunking, which splits on natural document boundaries:

ingest:
  sources:
    - "./knowledge-base/**/*.md"
  chunking:
    strategy: paragraph
    chunk_size: 512
    chunk_overlap: 50
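
To make the paragraph strategy concrete, here is a minimal Python sketch of one way paragraph-based chunking could work. It is not InitRunner's implementation: the name `chunk_paragraphs` is hypothetical, and the sketch assumes `chunk_size` and `chunk_overlap` are measured in characters, which the configuration above does not state.

```python
def chunk_paragraphs(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list:
    """Pack whole paragraphs into chunks of roughly chunk_size characters.

    Assumptions (not confirmed by the docs): sizes are in characters, and a
    paragraph longer than chunk_size is kept whole rather than split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it and start a new one.
            chunks.append(current)
            # Carry the tail of the emitted chunk forward as overlap.
            tail = current[-chunk_overlap:] if chunk_overlap > 0 else ""
            current = f"{tail}\n\n{para}" if tail else para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because chunks break only between paragraphs, a sentence is never cut in half, at the cost of chunks that vary in length.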

Multi-format knowledge base

Mix HTML, Markdown, and PDF sources. Install initrunner[ingest] for PDF support:

ingest:
  sources:
    - "./docs/**/*.md"
    - "./docs/**/*.html"
    - "./docs/**/*.pdf"
  chunking:
    strategy: fixed
    chunk_size: 1024
    chunk_overlap: 100
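
For comparison with paragraph chunking, the sketch below shows what a fixed-size strategy with overlap typically does. It is an illustration under assumptions, not InitRunner's code: `chunk_fixed` is a hypothetical name, and sizes are assumed to be in characters.

```python
def chunk_fixed(text: str, chunk_size: int = 1024, chunk_overlap: int = 100) -> list:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end, so the last chunk is
        # never just a repeat of the previous chunk's overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Fixed windows ignore document structure, which makes them a reasonable fit for mixed formats such as HTML and PDF where paragraph boundaries are less reliable.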

URL-based ingestion

Ingest content from remote URLs alongside local files:

ingest:
  sources:
    - "./local-docs/**/*.md"
    - "https://docs.example.com/api/reference"
    - "https://docs.example.com/changelog"

URL content is hashed, so re-running ingest skips unchanged pages.
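
The skip-if-unchanged behavior can be sketched in a few lines of Python. This is an illustration only: the function names, the use of SHA-256, and the in-memory `seen` dictionary are assumptions, since the docs do not say which hash InitRunner uses or where it stores it.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return a hex digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


def needs_reingest(source: str, content: bytes, seen: dict) -> bool:
    """Return True (and record the new hash) only when content has changed."""
    digest = content_hash(content)
    if seen.get(source) == digest:
        return False  # Same bytes as last time: skip this source.
    seen[source] = digest
    return True
```

With this approach, fetching a page still costs a request, but chunking and embedding are skipped when the bytes are identical.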

Running on source changes with a file watch trigger

Since v2026.4.10, every initrunner run detects source changes automatically, so you don't need a trigger just to keep the index fresh. Use a file_watch trigger when you want the agent itself to run on a change, for example to summarize the edit or notify a channel, rather than only re-ingest:

spec:
  ingest:
    sources:
      - "./knowledge-base/**/*.md"
  triggers:
    - type: file_watch
      paths:
        - ./knowledge-base
      extensions:
        - .md
      prompt_template: "Knowledge base updated: {path}. Summarize what changed."
      debounce_seconds: 1.0
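
The `debounce_seconds` setting is easiest to understand through the usual meaning of debouncing: a burst of events for the same file produces a single action once the file has been quiet for the configured interval. The Python sketch below illustrates that idea. The `Debouncer` class is hypothetical, and whether InitRunner debounces per path or per trigger is an assumption here.

```python
class Debouncer:
    """Collapse bursts of change events into one notification per path."""

    def __init__(self, debounce_seconds: float = 1.0) -> None:
        self.debounce_seconds = debounce_seconds
        self._last_event = {}  # path -> timestamp of the most recent event

    def record(self, path: str, now: float) -> None:
        """Note a change event; a repeated event restarts the quiet period."""
        self._last_event[path] = now

    def due(self, now: float) -> list:
        """Return and clear the paths that have been quiet long enough."""
        ready = [
            path
            for path, last in self._last_event.items()
            if now - last >= self.debounce_seconds
        ]
        for path in ready:
            del self._last_event[path]
        return ready
```

Timestamps are passed in explicitly so the logic can be exercised without sleeping; a real watcher would supply `time.monotonic()`.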

Using `source` filter to scope searches

When your knowledge base spans multiple topics, use the source parameter to narrow results:

spec:
  role: |
    You are a support agent. When the user asks about billing, search
    only billing docs: search_documents(query, source="*billing*").
    For technical issues, search: search_documents(query, source="*troubleshooting*").
  ingest:
    sources:
      - "./kb/billing/**/*.md"
      - "./kb/troubleshooting/**/*.md"
      - "./kb/general/**/*.md"
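
The effect of a pattern such as `*billing*` can be sketched with Python's standard glob matching. This is a sketch under assumptions: `filter_by_source` and the chunk dictionaries are hypothetical, and it assumes the `source` parameter follows shell-style glob semantics as the examples above suggest.

```python
from fnmatch import fnmatchcase
from typing import Optional


def filter_by_source(chunks: list, source: Optional[str] = None) -> list:
    """Keep only chunks whose source path matches the glob pattern."""
    if source is None:
        return chunks  # No filter: search everything.
    # In fnmatch, "*" also matches "/" so "*billing*" matches nested paths.
    return [c for c in chunks if fnmatchcase(c["source"], source)]


chunks = [
    {"source": "./kb/billing/invoices.md", "text": "Invoices are issued monthly."},
    {"source": "./kb/troubleshooting/login.md", "text": "Reset your password."},
    {"source": "./kb/general/about.md", "text": "About the product."},
]
billing_only = filter_by_source(chunks, source="*billing*")
```

Narrowing by source before ranking keeps unrelated topics out of the results even when their wording is similar to the query.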

Hybrid retrieval (vector + keyword)

Dense vector search matches on meaning, so it can miss exact tokens like identifiers, error codes, version strings, and acronyms that do not have a strong semantic signal. Hybrid retrieval runs both a dense vector search and a BM25 full-text search, then fuses the two result lists with reciprocal rank fusion (RRF).
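
Reciprocal rank fusion is simple enough to show in full. The Python sketch below uses the standard RRF formula, where each document scores the sum of 1 / (k + rank) over the lists it appears in. It illustrates the technique only; `reciprocal_rank_fusion` is a hypothetical name, and the fusion InitRunner relies on may differ in details such as tie-breaking.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) across the lists it appears
    in, with rank starting at 1. A larger k flattens rank differences.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["a", "b", "c"]  # ranked by embedding similarity
bm25_hits = ["c", "a", "d"]    # ranked by keyword match
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that appear in both lists ("a" and "c" here) outrank documents found by only one retriever, and RRF needs only ranks, so the incompatible score scales of cosine similarity and BM25 never have to be normalized.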

Set the strategy on spec.ingest.retriever:

spec:
  ingest:
    sources:
      - "./docs/**/*.md"
    retriever:
      strategy: hybrid

The three strategies:

Strategy	What it does	Extra dependency
`vector`	Dense cosine search only. The default, unchanged behaviour.	None
`hybrid`	Fuses dense vector and BM25 full-text results with RRF.	None (RRF ships with LanceDB)
`hybrid_rerank`	Hybrid fusion, then a cross-encoder reranks the fused results.	Optional (see below)

The BM25 full-text index is built automatically on the next initrunner ingest, so an existing store gains hybrid search after one re-ingest. Neither vector nor hybrid adds a required dependency.

hybrid_rerank adds a cross-encoder pass on top of hybrid for higher precision. The cross-encoder backend (sentence-transformers) is optional. When it is not installed, hybrid_rerank falls back to plain hybrid (RRF) instead of failing. Install it with:

$ uv pip install sentence-transformers
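
The fallback described above follows a common optional-dependency pattern, sketched here in Python. This is not InitRunner's code: `resolve_strategy` and its `reranker_available` parameter are hypothetical, and the sketch only checks whether the `sentence_transformers` package can be imported.

```python
from importlib.util import find_spec
from typing import Optional

VALID_STRATEGIES = {"vector", "hybrid", "hybrid_rerank"}


def resolve_strategy(requested: str, reranker_available: Optional[bool] = None) -> str:
    """Return the strategy to run, downgrading hybrid_rerank if needed."""
    if requested not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {requested!r}")
    if reranker_available is None:
        # Probe for the optional package without importing it.
        reranker_available = find_spec("sentence_transformers") is not None
    if requested == "hybrid_rerank" and not reranker_available:
        return "hybrid"  # Degrade gracefully instead of failing.
    return requested
```

Passing `reranker_available` explicitly makes the downgrade easy to test without installing or uninstalling the package.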

Tunable retriever settings, with reranker_model and rrf_k shown at their default values:

spec:
  ingest:
    sources:
      - "./docs/**/*.md"
    retriever:
      strategy: hybrid_rerank
      reranker_model: cross-encoder/ms-marco-MiniLM-L-6-v2
      rrf_k: 60

An agent can override the configured mode for a single call with the strategy parameter on the search tool: search_documents(query, strategy="hybrid"). The accepted values are the same three: vector, hybrid, and hybrid_rerank.

Fully local RAG with Ollama

No external API keys needed. Use Ollama for both the LLM and embeddings:

spec:
  model:
    provider: ollama
    name: llama3.2
  ingest:
    sources:
      - "./docs/**/*.md"
    embeddings:
      provider: ollama
      model: nomic-embed-text

See the Providers page for Ollama setup instructions.

Next Steps

Ingestion reference: full configuration options, chunking strategies, embedding models
Memory reference: session persistence and long-term memory (semantic, episodic, procedural)
Tools reference: built-in and custom tool types
