# Ollama & Local Models
InitRunner supports running agents against local LLMs served by Ollama or any OpenAI-compatible endpoint (vLLM, LiteLLM, llama.cpp server, etc.). This requires zero additional dependencies — it reuses the `openai` SDK already bundled with the core install.
## Quick Start

- Install and start Ollama:

  ```bash
  # macOS / Linux
  curl -fsSL https://ollama.com/install.sh | sh
  ollama serve
  ```

- Pull a model:

  ```bash
  ollama pull llama3.2
  ```

- Scaffold a role:

  ```bash
  initrunner init --template ollama --name my-local-agent --model llama3.2
  ```

- Run the agent:

  ```bash
  initrunner run role.yaml -i
  ```

## How It Works
Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1`. When `provider: ollama` is set (or a `base_url` is specified), InitRunner constructs a PydanticAI `OpenAIProvider` with that endpoint instead of calling the real OpenAI API. A dummy API key (`"ollama"`) is set automatically so the SDK doesn't look for `OPENAI_API_KEY` in the environment.
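Under the hood this is roughly equivalent to the following manual PydanticAI setup. A minimal sketch, assuming the current `pydantic_ai` OpenAI model/provider classes (exact class names may vary between PydanticAI versions):

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Point the OpenAI-compatible client at the local Ollama server
# instead of api.openai.com; the dummy key is never validated.
model = OpenAIModel(
    model_name="llama3.2",
    provider=OpenAIProvider(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
    ),
)
agent = Agent(model)  # use like any other PydanticAI agent
```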
## Configuration
### Minimal Ollama Role
```yaml
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: local-agent
  description: Agent using local Ollama model
spec:
  role: |
    You are a helpful assistant.
  model:
    provider: ollama
    name: llama3.2 # Run: ollama pull llama3.2
```

### Model Config Reference
```yaml
spec:
  model:
    provider: ollama                      # required — triggers local model setup
    name: llama3.2                        # required — model name as known to Ollama
    base_url: http://localhost:11434/v1   # default for ollama; override for remote
    temperature: 0.1                      # default: 0.1
    max_tokens: 4096                      # default: 4096
```

| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | str | — | Set to `"ollama"` for local Ollama models |
| `name` | str | — | Model name (e.g. `llama3.2`, `mistral`, `codellama`) |
| `base_url` | str \| null | `null` | Custom endpoint URL. Defaults to `http://localhost:11434/v1` when provider is `ollama`. |
| `temperature` | float | 0.1 | Sampling temperature (0.0–2.0) |
| `max_tokens` | int | 4096 | Maximum tokens per response (1–128000) |
## Custom OpenAI-Compatible Endpoints
The `base_url` field works with any provider, not just Ollama. Use it to point at vLLM, LiteLLM, llama.cpp, or any other server that exposes an OpenAI-compatible API:
```yaml
spec:
  model:
    provider: openai
    name: my-model
    base_url: http://my-server:8000/v1
```

When `base_url` is set on a non-ollama provider, the API key is set to `"custom-provider"` to avoid environment variable lookups. If your endpoint requires authentication, set `OPENAI_API_KEY` in the environment and omit `base_url` (use the standard `openai` provider flow).
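Independent of InitRunner, you can exercise such an endpoint directly with the `openai` SDK to confirm it really speaks the OpenAI protocol. A minimal sketch; the server URL and model name are placeholders:

```python
from openai import OpenAI

# The dummy key mirrors InitRunner's placeholder; most local servers ignore it.
client = OpenAI(base_url="http://my-server:8000/v1", api_key="custom-provider")

resp = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```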
## Embeddings
Ollama also serves embeddings. When using ingestion or memory with Ollama, configure the embedding model in the `embeddings` section:
```yaml
spec:
  model:
    provider: ollama
    name: llama3.2
  ingest:
    sources:
      - "./docs/**/*.md"
    embeddings:
      provider: ollama
      model: nomic-embed-text # Run: ollama pull nomic-embed-text
      # base_url: http://localhost:11434/v1 # default
```
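To verify embeddings work outside InitRunner, you can call Ollama's OpenAI-compatible embeddings endpoint directly. A minimal sketch, assuming `nomic-embed-text` has already been pulled:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.embeddings.create(
    model="nomic-embed-text",
    input=["InitRunner supports fully local embeddings."],
)
print(len(resp.data[0].embedding))  # prints the vector dimensionality
```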
### Embedding Config Reference

| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | str | `""` | Embedding provider. Set to `"ollama"` for local embeddings. Empty inherits from `spec.model.provider`. |
| `model` | str | `""` | Embedding model name. Empty uses the provider default (`nomic-embed-text` for Ollama). |
| `base_url` | str | `""` | Custom endpoint URL. Defaults to `http://localhost:11434/v1` when provider is `ollama`. |
| `api_key_env` | str | `""` | Env var name holding the embedding API key. Not needed for Ollama. |
### Default Embedding Models

| Provider | Default Model |
|---|---|
| `openai` | `text-embedding-3-small` |
| `ollama` | `nomic-embed-text` |
| `google` | `text-embedding-004` |
| `anthropic` | `text-embedding-3-small` (uses OpenAI) |
## Example: Local RAG Agent
Full local RAG stack — no external API calls or API keys:
```yaml
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: local-rag
  description: Local RAG agent with Ollama
  tags:
    - rag
    - ollama
spec:
  role: |
    You are a knowledge assistant. Use search_documents to find relevant
    content before answering. Always cite your sources.
  model:
    provider: ollama
    name: llama3.2
  ingest:
    sources:
      - "./docs/**/*.md"
      - "./docs/**/*.txt"
    chunking:
      strategy: fixed
      chunk_size: 512
      chunk_overlap: 50
    embeddings:
      provider: ollama
      model: nomic-embed-text
```

```bash
ollama pull llama3.2
ollama pull nomic-embed-text
initrunner ingest role.yaml
initrunner run role.yaml -i
```

## Example: Memory Agent
Long-term memory works fully offline with Ollama:
```yaml
apiVersion: initrunner/v1
kind: Agent
metadata:
  name: local-memory
  description: Local agent with memory
spec:
  role: |
    You are a helpful assistant with long-term memory.
    Use remember() to save important information.
    Use recall() to search your memories.
  model:
    provider: ollama
    name: llama3.2
  memory:
    max_sessions: 10
    max_memories: 1000
    embeddings:
      provider: ollama
      model: nomic-embed-text
```

## Docker
When running InitRunner inside Docker, `localhost` won't reach the host machine. Use `host.docker.internal` instead:
```yaml
spec:
  model:
    provider: ollama
    name: llama3.2
    base_url: http://host.docker.internal:11434/v1
```

InitRunner automatically detects Docker environments (via `/.dockerenv`) and logs a warning if `base_url` contains `localhost` or `127.0.0.1`.
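For illustration, a rough approximation of that check (the actual detection logic in InitRunner may differ):

```python
from pathlib import Path

def warn_if_loopback_in_docker(base_url: str) -> None:
    """Warn when a containerized process points at the container's own loopback."""
    in_docker = Path("/.dockerenv").exists()
    if in_docker and ("localhost" in base_url or "127.0.0.1" in base_url):
        print(
            "warning: base_url targets localhost inside a container; "
            "use host.docker.internal or a Docker network service name instead"
        )
```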
Alternatively, run Ollama in the same Docker network:
```yaml
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
  agent:
    build: .
    environment:
      - OLLAMA_HOST=http://ollama:11434/v1
```

```yaml
spec:
  model:
    provider: ollama
    name: llama3.2
    base_url: http://ollama:11434/v1
```

## CLI
### Scaffold an Ollama Role
```bash
initrunner init --template ollama --name my-agent --model mistral
```

This generates a `role.yaml` pre-configured for `provider: ollama` with the specified model (or `llama3.2` by default). After scaffolding, InitRunner pings `http://localhost:11434/api/tags` and prints a warning if Ollama is not reachable.
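If you want a similar reachability check in your own scripts, here is a minimal standard-library sketch against Ollama's `/api/tags` endpoint (function name and timeout are illustrative):

```python
import json
import urllib.request

def list_local_models(base: str = "http://localhost:11434") -> list[str] | None:
    """Return names of locally pulled models, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=2) as resp:
            data = json.load(resp)
    except OSError:
        return None
    return [m["name"] for m in data.get("models", [])]

if __name__ == "__main__":
    models = list_local_models()
    print("Ollama not reachable" if models is None else models)
```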
### Available Templates
Any template works with `--provider ollama`:
```bash
initrunner init --template basic --provider ollama --model codellama
initrunner init --template rag --provider ollama --model llama3.2
initrunner init --template memory --provider ollama
initrunner init --template daemon --provider ollama
initrunner init --template ollama   # dedicated template with Ollama-specific comments
```

## Troubleshooting
"Ollama does not appear to be running"
Start the Ollama server:
```bash
ollama serve
```

On macOS, you can also launch the Ollama desktop app.
### Connection refused at runtime
Verify Ollama is running and accessible:
```bash
curl http://localhost:11434/api/tags
```

If using a remote Ollama instance, set `base_url` explicitly:
```yaml
spec:
  model:
    provider: ollama
    name: llama3.2
    base_url: http://remote-host:11434/v1
```

### Model not found
Pull the model before running:
```bash
ollama pull llama3.2
```

List available models:

```bash
ollama list
```

### Slow responses
Local models are limited by your hardware. Tips:
- Use smaller models (`llama3.2` 3B is faster than `llama3.1` 70B)
- Increase `timeout_seconds` in guardrails for larger models
- Use GPU acceleration (Ollama auto-detects CUDA/Metal)
### `EmbeddingModelChangedError` on ingestion
You switched embedding models. The CLI will prompt you to confirm wiping the store and re-ingesting. To skip the prompt, use `--force`:
```bash
initrunner ingest role.yaml --force
```

## Popular Ollama Models
| Model | Size | Good For |
|---|---|---|
| `llama3.2` | 3B | General purpose, fast |
| `llama3.1` | 8B/70B | Higher quality, slower |
| `mistral` | 7B | Balanced performance |
| `codellama` | 7B/13B | Code generation |
| `nomic-embed-text` | 137M | Embeddings (for RAG/memory) |
| `mxbai-embed-large` | 335M | Higher-quality embeddings |