LLM Providers
Koldan integrates with Large Language Models (LLMs) for features such as session history summarization and autotitling. Multiple providers can be configured simultaneously, and each task can reference a specific provider.
Configuration Overview
All LLM settings are under the koldan.llm.* prefix in your koldan server properties file:

    koldan:
      llm:
        providers:
          <provider-id>:
            type: <openai | ollama | bedrock | gemini | anthropic | vertex-ai>
            # provider-specific settings below
        session-history-summary:
          provider: <provider-id>
          model: <optional model override>
        session-history-autotitle:
          provider: <provider-id>
          model: <optional model override>

- `providers`: a map of named provider configurations. Each entry has a unique `<provider-id>` and a `type` that selects the backend.
- Task configs (`session-history-summary`, `session-history-autotitle`) reference a provider by its `<provider-id>` and can optionally override the model.
Supported Providers
OpenAI
Uses the OpenAI Chat API (or any OpenAI-compatible endpoint).
| Property | Description | Default |
|---|---|---|
| `type` | Must be `openai` | — |
| `openai.api-key` | Required. OpenAI API key. | — |
| `openai.model` | Model name (e.g., `gpt-4`, `gpt-4o`, `o3-mini`). | — |
| `openai.base-url` | Custom API base URL (for proxies or compatible services). | OpenAI default |
| `openai.temperature` | Sampling temperature (0.0–2.0). | Model default |
| `openai.timeout` | Request timeout (Duration). | `5m` |
| `openai.service-tier` | OpenAI service tier. | — |
| `openai.reasoning-effort` | Reasoning effort for supported models (e.g., `low`, `medium`, `high`). | — |
Example:

    koldan:
      llm:
        providers:
          my-openai:
            type: openai
            openai:
              api-key: ${OPENAI_API_KEY}
              model: gpt-4o
              temperature: 0.3
              timeout: 2m

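Because `openai.base-url` accepts any OpenAI-compatible endpoint, the same provider type can also point at a proxy or a self-hosted gateway. The following is a minimal sketch under stated assumptions: the provider id `my-gateway`, the URL `https://llm-gateway.example.com/v1`, and the `LLM_GATEWAY_API_KEY` variable are hypothetical placeholders, and whether the URL needs a `/v1` suffix or whether `reasoning-effort` has any effect depends on the service behind the endpoint.

```yaml
koldan:
  llm:
    providers:
      my-gateway:                            # hypothetical provider id
        type: openai
        openai:
          api-key: ${LLM_GATEWAY_API_KEY}    # hypothetical environment variable
          base-url: https://llm-gateway.example.com/v1   # placeholder URL, not a real service
          model: o3-mini
          reasoning-effort: low              # only honored by models that support it
          timeout: 1m
```

Properties left out here (`temperature`, `service-tier`) fall back to the defaults listed in the table above.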
Ollama
Connects to a local or remote Ollama instance.
| Property | Description | Default |
|---|---|---|
| `type` | Must be `ollama` | — |
| `ollama.base-url` | Ollama server URL. | `http://localhost:11434` |
| `ollama.model` | Model name (e.g., `llama2`, `gemma3:27b`). | — |
| `ollama.temperature` | Sampling temperature. | Model default |
| `ollama.timeout` | Request timeout (Duration). | `5m` |
Example:

    koldan:
      llm:
        providers:
          local-llm:
            type: ollama
            ollama:
              base-url: http://192.168.0.44:31262
              model: gemma3:27b

Amazon Bedrock
Supports models such as Anthropic Claude, Amazon Titan, Meta Llama, and others available in your AWS region.
| Property | Description | Default |
|---|---|---|
| `type` | Must be `bedrock` | — |
| `bedrock.region` | AWS region (e.g., `us-east-1`, `eu-west-1`). | `us-east-1` |
| `bedrock.model` | Required. Bedrock model ID (e.g., `anthropic.claude-3-sonnet-20240229-v1:0`). | — |
| `bedrock.access-key-id` | AWS access key ID. If omitted, the default AWS credential chain is used. | — |
| `bedrock.secret-access-key` | AWS secret access key. If omitted, the default AWS credential chain is used. | — |
| `bedrock.temperature` | Sampling temperature. | Model default |
| `bedrock.max-tokens` | Maximum number of output tokens. | Model default |
| `bedrock.timeout` | Request timeout (Duration). | `5m` |
Authentication
You can provide AWS credentials in two ways:
- Explicit credentials: set `access-key-id` and `secret-access-key` in the config (use environment variable references to avoid hardcoding secrets).
- Default AWS credential chain: omit the credential properties and let the AWS SDK resolve credentials automatically (e.g., from the environment variables `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`, IAM instance profiles, ECS task roles, or `~/.aws/credentials`).
Example:

    koldan:
      llm:
        providers:
          bedrock-llm:
            type: bedrock
            bedrock:
              region: us-east-1
              model: anthropic.claude-3-sonnet-20240229-v1:0
              access-key-id: ${AWS_ACCESS_KEY_ID}
              secret-access-key: ${AWS_SECRET_ACCESS_KEY}
              temperature: 0.7
              max-tokens: 4096
              timeout: 5m
        session-history-summary:
          provider: bedrock-llm

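The example above sets explicit credentials. For the second authentication option, the default AWS credential chain, a minimal sketch simply leaves both credential properties out. The region and `max-tokens` values here are illustrative, and the sketch assumes the host already exposes credentials through one of the mechanisms listed under Authentication (for example an IAM instance profile).

```yaml
koldan:
  llm:
    providers:
      bedrock-llm:
        type: bedrock
        bedrock:
          region: eu-west-1        # illustrative; defaults to us-east-1 if omitted
          model: anthropic.claude-3-sonnet-20240229-v1:0
          # access-key-id and secret-access-key are intentionally omitted,
          # so the AWS SDK resolves credentials from its default chain.
          max-tokens: 2048         # illustrative value
    session-history-summary:
      provider: bedrock-llm
```

This variant keeps secrets out of the properties file entirely, which is usually preferable when the server runs on AWS infrastructure.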
Google AI Gemini
Connects to Gemini models through the Google AI API, authenticating with an API key.
| Property | Description | Default |
|---|---|---|
| `type` | Must be `gemini` | — |
| `gemini.api-key` | Required. Google AI API key. | — |
| `gemini.model` | Required. Model name (e.g., `gemini-pro`, `gemini-2.5-flash`). | — |
| `gemini.base-url` | Custom API base URL (for proxies or custom endpoints). | Google AI default |
| `gemini.temperature` | Sampling temperature. | Model default |
| `gemini.max-output-tokens` | Maximum number of output tokens. | Model default |
| `gemini.timeout` | Request timeout (Duration). | `5m` |
| `gemini.thinking-config.include-thoughts` | Whether to include thinking/reasoning in the response. | — |
| `gemini.thinking-config.thinking-budget` | Token budget for the thinking/reasoning phase. | — |
| `gemini.thinking-config.thinking-level` | Thinking level (e.g., `low`, `medium`, `high`). | — |
Example:

    koldan:
      llm:
        providers:
          gemini-llm:
            type: gemini
            gemini:
              api-key: ${GEMINI_API_KEY}
              model: gemini-2.5-flash
              temperature: 0.7
              max-output-tokens: 4096
              timeout: 5m
              thinking-config:
                include-thoughts: true
                thinking-budget: 8192
        session-history-summary:
          provider: gemini-llm

Anthropic (Claude)
Anthropic integration for Claude models.
| Property | Description | Default |
|---|---|---|
| `type` | Must be `anthropic` | — |
| `anthropic.api-key` | Required. Anthropic API key. | — |
| `anthropic.model` | Required. Model name (e.g., `claude-sonnet-4-20250514`, `claude-3-5-haiku-20241022`). | — |
| `anthropic.base-url` | Custom API base URL (for proxies or custom endpoints). | Anthropic default |
| `anthropic.temperature` | Sampling temperature (0.0–1.0). | Model default |
| `anthropic.top-p` | Nucleus sampling parameter. | Model default |
| `anthropic.top-k` | Top-k sampling parameter. | Model default |
| `anthropic.max-tokens` | Maximum number of output tokens. | Model default |
| `anthropic.timeout` | Request timeout (Duration). | `5m` |
Example:

    koldan:
      llm:
        providers:
          anthropic-llm:
            type: anthropic
            anthropic:
              api-key: ${ANTHROPIC_API_KEY}
              model: claude-sonnet-4-20250514
              temperature: 0.7
              max-tokens: 4096
              timeout: 5m
        session-history-summary:
          provider: anthropic-llm

Google Cloud Vertex AI (Gemini)
Google Cloud Vertex AI integration using Gemini models. Unlike the Google AI Gemini provider (which uses an API key), Vertex AI authenticates via Google Cloud credentials and requires a GCP project and location.
| Property | Description | Default |
|---|---|---|
| `type` | Must be `vertex-ai` | — |
| `vertex-ai.project` | Required. Google Cloud project ID. | — |
| `vertex-ai.location` | Required. GCP region (e.g., `us-central1`, `europe-west1`). | — |
| `vertex-ai.model` | Required. Model name (e.g., `gemini-2.5-flash`, `gemini-2.5-pro`). | — |
| `vertex-ai.api-endpoint` | Custom API endpoint (for regional endpoints or proxies). | GCP default |
| `vertex-ai.temperature` | Sampling temperature. | Model default |
| `vertex-ai.top-p` | Nucleus sampling parameter. | Model default |
| `vertex-ai.top-k` | Top-k sampling parameter. | Model default |
| `vertex-ai.max-output-tokens` | Maximum number of output tokens. | Model default |
Authentication
Vertex AI uses the Google Cloud Application Default Credentials (ADC) mechanism. Ensure credentials are available via one of the following:
- `GOOGLE_APPLICATION_CREDENTIALS` environment variable pointing to a service account JSON key file.
- gcloud CLI: run `gcloud auth application-default login` on the host.
- GKE Workload Identity or Compute Engine default service account when running on GCP infrastructure.
Example:

    koldan:
      llm:
        providers:
          vertex-llm:
            type: vertex-ai
            vertex-ai:
              project: ${GCP_PROJECT_ID}
              location: us-central1
              model: gemini-2.5-flash
              temperature: 0.7
              top-k: 40
              max-output-tokens: 4096
        session-history-summary:
          provider: vertex-llm

Task Configuration
Tasks such as session-history-summary and session-history-autotitle are linked to a provider and optionally override the model:

    koldan:
      llm:
        session-history-summary:
          provider: my-openai    # references a provider id from the providers map
          model: gpt-4o-mini     # optional: overrides the provider's default model
        session-history-autotitle:
          provider: local-llm

If model is not specified in the task config, the provider's default model is used.
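Putting the pieces together, the sketch below shows one way a complete file might look when two providers are configured side by side and each task points at a different one. It reuses the `my-openai` and `local-llm` entries from the provider examples; the pairing of tasks to providers is only illustrative.

```yaml
koldan:
  llm:
    providers:
      my-openai:
        type: openai
        openai:
          api-key: ${OPENAI_API_KEY}
          model: gpt-4o              # provider default model
      local-llm:
        type: ollama
        ollama:
          base-url: http://localhost:11434
          model: gemma3:27b          # provider default model
    session-history-summary:
      provider: my-openai
      model: gpt-4o-mini             # task-level override of gpt-4o
    session-history-autotitle:
      provider: local-llm            # no override, so gemma3:27b is used
```

With this layout, summaries use `gpt-4o-mini` because the task overrides the model, while autotitling falls back to the Ollama provider's own `gemma3:27b`.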