LLM Providers

Koldan integrates with large language models (LLMs) for features such as session history summarization and autotitling. Multiple providers can be configured at the same time, and each task selects the provider it uses by referencing that provider's ID.

Configuration Overview

All LLM settings live under the koldan.llm.* prefix in your Koldan server properties file:

koldan:
  llm:
    providers:
      <provider-id>:
        type: <openai | ollama | bedrock | gemini | anthropic | vertex-ai>
        # provider-specific settings below
    session-history-summary:
      provider: <provider-id>
      model: <optional model override>
    session-history-autotitle:
      provider: <provider-id>
      model: <optional model override>

providers — a map of named provider configurations. Each entry has a unique <provider-id> and a type that selects the backend.
Task configs (session-history-summary, session-history-autotitle) reference a provider by its <provider-id> and can optionally override the model.
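
As a minimal sketch of how these pieces fit together, the following configuration defines two providers and assigns one to each task. The provider IDs summarizer and titler are hypothetical names chosen for illustration. The property names are taken from the provider sections in this document.

```yaml
koldan:
  llm:
    providers:
      summarizer:                 # hypothetical provider ID
        type: anthropic
        anthropic:
          api-key: ${ANTHROPIC_API_KEY}
          model: claude-sonnet-4-20250514
      titler:                     # hypothetical provider ID
        type: ollama
        ollama:
          # base-url is omitted, so the documented default
          # http://localhost:11434 should apply
          model: gemma3:27b
    session-history-summary:
      provider: summarizer        # must match a key under providers
    session-history-autotitle:
      provider: titler
```

Keeping provider definitions separate from task assignments means a task can be moved to a different backend by changing only its provider value.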

Supported Providers

OpenAI

Uses the OpenAI Chat API (or any OpenAI-compatible endpoint).

Property	Description	Default
`type`	Must be `openai`	—
`openai.api-key`	Required. OpenAI API key.	—
`openai.model`	Model name (e.g., `gpt-4`, `gpt-4o`, `o3-mini`).	—
`openai.base-url`	Custom API base URL (for proxies or compatible services).	OpenAI default
`openai.temperature`	Sampling temperature (0.0–2.0).	Model default
`openai.timeout`	Request timeout (Duration).	`5m`
`openai.service-tier`	OpenAI service tier.	—
`openai.reasoning-effort`	Reasoning effort for supported models (e.g., `low`, `medium`, `high`).	—

Example:

koldan:
  llm:
    providers:
      my-openai:
        type: openai
        openai:
          api-key: ${OPENAI_API_KEY}
          model: gpt-4o
          temperature: 0.3
          timeout: 2m
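
Because this provider also accepts OpenAI-compatible endpoints, base-url can point at a proxy or another compatible service. The sketch below is illustrative only: the host llm-proxy.example.com, the environment variable LLM_PROXY_API_KEY, and both provider IDs are hypothetical, and whether the URL needs a /v1 suffix depends on the service you target. The second entry shows reasoning-effort on a reasoning model.

```yaml
koldan:
  llm:
    providers:
      proxied-openai:             # hypothetical provider ID
        type: openai
        openai:
          api-key: ${LLM_PROXY_API_KEY}                 # hypothetical variable name
          base-url: https://llm-proxy.example.com/v1    # hypothetical endpoint
          model: gpt-4o
      reasoning-openai:           # hypothetical provider ID
        type: openai
        openai:
          api-key: ${OPENAI_API_KEY}
          model: o3-mini
          reasoning-effort: low   # one of the documented example values
```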

Ollama

Connects to a local or remote Ollama instance.

Property	Description	Default
`type`	Must be `ollama`	—
`ollama.base-url`	Ollama server URL.	`http://localhost:11434`
`ollama.model`	Model name (e.g., `llama2`, `gemma3:27b`).	—
`ollama.temperature`	Sampling temperature.	Model default
`ollama.timeout`	Request timeout (Duration).	`5m`

Example:

koldan:
  llm:
    providers:
      local-llm:
        type: ollama
        ollama:
          base-url: http://192.168.0.44:31262
          model: gemma3:27b

Amazon Bedrock

Supports models such as Anthropic Claude, Amazon Titan, Meta Llama, and others available in your AWS region.

Property	Description	Default
`type`	Must be `bedrock`	—
`bedrock.region`	AWS region (e.g., `us-east-1`, `eu-west-1`).	`us-east-1`
`bedrock.model`	Required. Bedrock model ID (e.g., `anthropic.claude-3-sonnet-20240229-v1:0`).	—
`bedrock.access-key-id`	AWS access key ID. If omitted, the default AWS credential chain is used.	—
`bedrock.secret-access-key`	AWS secret access key. If omitted, the default AWS credential chain is used.	—
`bedrock.temperature`	Sampling temperature.	Model default
`bedrock.max-tokens`	Maximum number of output tokens.	Model default
`bedrock.timeout`	Request timeout (Duration).	`5m`

Authentication

You can provide AWS credentials in two ways:

Explicit credentials — set access-key-id and secret-access-key in the config (use environment variable references to avoid hardcoding secrets).
Default AWS credential chain — omit the credential properties and let the AWS SDK resolve credentials automatically (e.g., from environment variables AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, IAM instance profiles, ECS task roles, or ~/.aws/credentials).

Example:

koldan:
  llm:
    providers:
      bedrock-llm:
        type: bedrock
        bedrock:
          region: us-east-1
          model: anthropic.claude-3-sonnet-20240229-v1:0
          access-key-id: ${AWS_ACCESS_KEY_ID}
          secret-access-key: ${AWS_SECRET_ACCESS_KEY}
          temperature: 0.7
          max-tokens: 4096
          timeout: 5m
    session-history-summary:
      provider: bedrock-llm
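
The example above uses explicit credentials. For the default AWS credential chain, a minimal sketch simply leaves both credential properties out. The region value here is only an example.

```yaml
koldan:
  llm:
    providers:
      bedrock-llm:
        type: bedrock
        bedrock:
          region: eu-west-1
          model: anthropic.claude-3-sonnet-20240229-v1:0
          # access-key-id and secret-access-key are intentionally omitted,
          # so the AWS SDK resolves credentials from its default chain
          # (environment variables, instance profile, task role, or
          # ~/.aws/credentials)
    session-history-summary:
      provider: bedrock-llm
```

This form avoids placing secrets in the configuration file when Koldan runs on AWS infrastructure with an attached role.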

Google AI Gemini

Connects to Gemini models through the Google AI API, authenticated with an API key.

Property	Description	Default
`type`	Must be `gemini`	—
`gemini.api-key`	Required. Google AI API key.	—
`gemini.model`	Required. Model name (e.g., `gemini-pro`, `gemini-2.5-flash`).	—
`gemini.base-url`	Custom API base URL (for proxies or custom endpoints).	Google AI default
`gemini.temperature`	Sampling temperature.	Model default
`gemini.max-output-tokens`	Maximum number of output tokens.	Model default
`gemini.timeout`	Request timeout (Duration).	`5m`
`gemini.thinking-config.include-thoughts`	Whether to include thinking/reasoning in the response.	—
`gemini.thinking-config.thinking-budget`	Token budget for the thinking/reasoning phase.	—
`gemini.thinking-config.thinking-level`	Thinking level (e.g., `low`, `medium`, `high`).	—

Example:

koldan:
  llm:
    providers:
      gemini-llm:
        type: gemini
        gemini:
          api-key: ${GEMINI_API_KEY}
          model: gemini-2.5-flash
          temperature: 0.7
          max-output-tokens: 4096
          timeout: 5m
          thinking-config:
            include-thoughts: true
            thinking-budget: 8192
    session-history-summary:
      provider: gemini-llm

Anthropic (Claude)

Connects to Claude models through the Anthropic API, authenticated with an API key.

Property	Description	Default
`type`	Must be `anthropic`	—
`anthropic.api-key`	Required. Anthropic API key.	—
`anthropic.model`	Required. Model name (e.g., `claude-sonnet-4-20250514`, `claude-3-5-haiku-20241022`).	—
`anthropic.base-url`	Custom API base URL (for proxies or custom endpoints).	Anthropic default
`anthropic.temperature`	Sampling temperature (0.0–1.0).	Model default
`anthropic.top-p`	Nucleus sampling parameter.	Model default
`anthropic.top-k`	Top-k sampling parameter.	Model default
`anthropic.max-tokens`	Maximum number of output tokens.	Model default
`anthropic.timeout`	Request timeout (Duration).	`5m`

Example:

koldan:
  llm:
    providers:
      anthropic-llm:
        type: anthropic
        anthropic:
          api-key: ${ANTHROPIC_API_KEY}
          model: claude-sonnet-4-20250514
          temperature: 0.7
          max-tokens: 4096
          timeout: 5m
    session-history-summary:
      provider: anthropic-llm

Google Cloud Vertex AI (Gemini)

Google Cloud Vertex AI integration using Gemini models. Unlike the Google AI Gemini provider (which uses an API key), Vertex AI authenticates via Google Cloud credentials and requires a GCP project and location.

Property	Description	Default
`type`	Must be `vertex-ai`	—
`vertex-ai.project`	Required. Google Cloud project ID.	—
`vertex-ai.location`	Required. GCP region (e.g., `us-central1`, `europe-west1`).	—
`vertex-ai.model`	Required. Model name (e.g., `gemini-2.5-flash`, `gemini-2.5-pro`).	—
`vertex-ai.api-endpoint`	Custom API endpoint (for regional endpoints or proxies).	GCP default
`vertex-ai.temperature`	Sampling temperature.	Model default
`vertex-ai.top-p`	Nucleus sampling parameter.	Model default
`vertex-ai.top-k`	Top-k sampling parameter.	Model default
`vertex-ai.max-output-tokens`	Maximum number of output tokens.	Model default

Authentication

Vertex AI uses the Google Cloud Application Default Credentials (ADC) mechanism. Ensure credentials are available via one of the following:

GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to a service account JSON key file.
gcloud CLI — run gcloud auth application-default login on the host.
GKE Workload Identity or Compute Engine default service account when running on GCP infrastructure.

Example:

koldan:
  llm:
    providers:
      vertex-llm:
        type: vertex-ai
        vertex-ai:
          project: ${GCP_PROJECT_ID}
          location: us-central1
          model: gemini-2.5-flash
          temperature: 0.7
          top-k: 40
          max-output-tokens: 4096
    session-history-summary:
      provider: vertex-llm

Task Configuration

Each task (session-history-summary, session-history-autotitle) names the provider it uses and can optionally override that provider's model:

koldan:
  llm:
    session-history-summary:
      provider: my-openai     # references a provider id from the providers map
      model: gpt-4o-mini      # optional: overrides the provider's default model
    session-history-autotitle:
      provider: local-llm

If model is not specified in the task config, the model configured on the referenced provider is used.
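
The fallback can be illustrated with one provider shared by both tasks. This is a sketch under the assumption that two tasks may reference the same provider ID, which the configuration layout suggests but this page does not state outright.

```yaml
koldan:
  llm:
    providers:
      my-openai:
        type: openai
        openai:
          api-key: ${OPENAI_API_KEY}
          model: gpt-4o           # used by any task that sets no model
    session-history-summary:
      provider: my-openai         # no model here, so gpt-4o is used
    session-history-autotitle:
      provider: my-openai
      model: gpt-4o-mini          # override applies to this task only
```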