
LLM Providers

Koldan integrates with Large Language Models (LLMs) for features such as session history summarization and autotitling. Multiple providers can be configured simultaneously, and each task can reference a specific provider.

Configuration Overview

All LLM settings live under the koldan.llm.* prefix in your Koldan server properties file:

koldan:
  llm:
    providers:
      <provider-id>:
        type: <openai | ollama | bedrock | gemini | anthropic | vertex-ai>
        # provider-specific settings below
    session-history-summary:
      provider: <provider-id>
      model: <optional model override>
    session-history-autotitle:
      provider: <provider-id>
      model: <optional model override>
  • providers — a map of named provider configurations. Each entry has a unique <provider-id> and a type that selects the backend.
  • Task configs (session-history-summary, session-history-autotitle) reference a provider by its <provider-id> and can optionally override the model.
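
As a sketch of how these pieces fit together, the following configuration defines two providers and points each task at a different one. The provider ids (cloud-llm, local-llm) are illustrative names rather than required values; the properties used are the ones documented in the provider sections of this page.

koldan:
  llm:
    providers:
      cloud-llm:              # illustrative provider id
        type: openai
        openai:
          api-key: ${OPENAI_API_KEY}
          model: gpt-4o
      local-llm:              # illustrative provider id
        type: ollama
        ollama:
          model: gemma3:27b   # base-url omitted, so the default http://localhost:11434 applies
    session-history-summary:
      provider: cloud-llm     # summaries go to the OpenAI provider
    session-history-autotitle:
      provider: local-llm     # titles go to the local Ollama provider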

Supported Providers

OpenAI

Uses the OpenAI Chat API (or any OpenAI-compatible endpoint).

Property | Description | Default
type | Must be openai | -
openai.api-key | Required. OpenAI API key. | -
openai.model | Model name (e.g., gpt-4, gpt-4o, o3-mini). | -
openai.base-url | Custom API base URL (for proxies or compatible services). | OpenAI default
openai.temperature | Sampling temperature (0.0–2.0). | Model default
openai.timeout | Request timeout (Duration). | 5m
openai.service-tier | OpenAI service tier. | -
openai.reasoning-effort | Reasoning effort for supported models (e.g., low, medium, high). | -

Example:

koldan:
  llm:
    providers:
      my-openai:
        type: openai
        openai:
          api-key: ${OPENAI_API_KEY}
          model: gpt-4o
          temperature: 0.3
          timeout: 2m
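
For an OpenAI-compatible service or a proxy, base-url can be pointed at the alternative endpoint. The sketch below assumes a hypothetical gateway at https://llm-gateway.example.com/v1; the provider id and the API key variable name are placeholders as well. Whether the URL path should include /v1, and whether reasoning-effort is honored, depends on the service and model you use.

koldan:
  llm:
    providers:
      openai-proxy:                                      # hypothetical provider id
        type: openai
        openai:
          api-key: ${LLM_GATEWAY_API_KEY}                # placeholder variable name
          base-url: https://llm-gateway.example.com/v1   # hypothetical endpoint
          model: o3-mini
          reasoning-effort: low                          # only meaningful for models that support it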

Ollama

Connects to a local or remote Ollama instance.

Property | Description | Default
type | Must be ollama | -
ollama.base-url | Ollama server URL. | http://localhost:11434
ollama.model | Model name (e.g., llama2, gemma3:27b). | -
ollama.temperature | Sampling temperature. | Model default
ollama.timeout | Request timeout (Duration). | 5m

Example:

koldan:
  llm:
    providers:
      local-llm:
        type: ollama
        ollama:
          base-url: http://192.168.0.44:31262
          model: gemma3:27b
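
When Ollama runs on the same host as Koldan, base-url can be left out so that the documented default (http://localhost:11434) applies. A minimal sketch, with an illustrative provider id and tuning values chosen only as examples:

koldan:
  llm:
    providers:
      ollama-default:          # illustrative provider id
        type: ollama
        ollama:
          model: llama2        # base-url omitted: falls back to http://localhost:11434
          temperature: 0.2     # example value, not a recommendation
          timeout: 10m         # longer than the 5m default, e.g. for slow local hardware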

Amazon Bedrock

Supports models such as Anthropic Claude, Amazon Titan, Meta Llama, and others available in your AWS region.

Property | Description | Default
type | Must be bedrock | -
bedrock.region | AWS region (e.g., us-east-1, eu-west-1). | us-east-1
bedrock.model | Required. Bedrock model ID (e.g., anthropic.claude-3-sonnet-20240229-v1:0). | -
bedrock.access-key-id | AWS access key ID. If omitted, the default AWS credential chain is used. | -
bedrock.secret-access-key | AWS secret access key. If omitted, the default AWS credential chain is used. | -
bedrock.temperature | Sampling temperature. | Model default
bedrock.max-tokens | Maximum number of output tokens. | Model default
bedrock.timeout | Request timeout (Duration). | 5m

Authentication

You can provide AWS credentials in two ways:

  1. Explicit credentials — set access-key-id and secret-access-key in the config (use environment variable references to avoid hardcoding secrets).
  2. Default AWS credential chain — omit the credential properties and let the AWS SDK resolve credentials automatically (e.g., from environment variables AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, IAM instance profiles, ECS task roles, or ~/.aws/credentials).

Example:

koldan:
  llm:
    providers:
      bedrock-llm:
        type: bedrock
        bedrock:
          region: us-east-1
          model: anthropic.claude-3-sonnet-20240229-v1:0
          access-key-id: ${AWS_ACCESS_KEY_ID}
          secret-access-key: ${AWS_SECRET_ACCESS_KEY}
          temperature: 0.7
          max-tokens: 4096
          timeout: 5m
    session-history-summary:
      provider: bedrock-llm
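
The second authentication option, relying on the default AWS credential chain, would look roughly like the sketch below. The credential properties are simply left out; the provider id and the region are illustrative.

koldan:
  llm:
    providers:
      bedrock-chain:           # illustrative provider id
        type: bedrock
        bedrock:
          region: eu-west-1
          model: anthropic.claude-3-sonnet-20240229-v1:0
          # access-key-id and secret-access-key are omitted on purpose,
          # so the AWS SDK resolves credentials from its default chain
    session-history-summary:
      provider: bedrock-chain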

Google AI Gemini

Connects to the Google AI Gemini API, authenticating with an API key.

Property | Description | Default
type | Must be gemini | -
gemini.api-key | Required. Google AI API key. | -
gemini.model | Required. Model name (e.g., gemini-pro, gemini-2.5-flash). | -
gemini.base-url | Custom API base URL (for proxies or custom endpoints). | Google AI default
gemini.temperature | Sampling temperature. | Model default
gemini.max-output-tokens | Maximum number of output tokens. | Model default
gemini.timeout | Request timeout (Duration). | 5m
gemini.thinking-config.include-thoughts | Whether to include thinking/reasoning in the response. | -
gemini.thinking-config.thinking-budget | Token budget for the thinking/reasoning phase. | -
gemini.thinking-config.thinking-level | Thinking level (e.g., low, medium, high). | -

Example:

koldan:
  llm:
    providers:
      gemini-llm:
        type: gemini
        gemini:
          api-key: ${GEMINI_API_KEY}
          model: gemini-2.5-flash
          temperature: 0.7
          max-output-tokens: 4096
          timeout: 5m
          thinking-config:
            include-thoughts: true
            thinking-budget: 8192
    session-history-summary:
      provider: gemini-llm
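
The thinking-config properties can also be set by level rather than by token budget. The sketch below uses thinking-level with an illustrative provider id. Which models accept thinking-level, and whether it can be combined with thinking-budget, is not stated on this page, so treat this as an assumption to verify against the model you use.

koldan:
  llm:
    providers:
      gemini-thinking:         # illustrative provider id
        type: gemini
        gemini:
          api-key: ${GEMINI_API_KEY}
          model: gemini-2.5-flash      # assumption: substitute a model that supports thinking-level
          thinking-config:
            include-thoughts: false    # keep reasoning text out of the response
            thinking-level: low        # one of the documented example values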

Anthropic (Claude)

Connects to the Anthropic API to use Claude models.

Property | Description | Default
type | Must be anthropic | -
anthropic.api-key | Required. Anthropic API key. | -
anthropic.model | Required. Model name (e.g., claude-sonnet-4-20250514, claude-3-5-haiku-20241022). | -
anthropic.base-url | Custom API base URL (for proxies or custom endpoints). | Anthropic default
anthropic.temperature | Sampling temperature (0.0–1.0). | Model default
anthropic.top-p | Nucleus sampling parameter. | Model default
anthropic.top-k | Top-k sampling parameter. | Model default
anthropic.max-tokens | Maximum number of output tokens. | Model default
anthropic.timeout | Request timeout (Duration). | 5m

Example:

koldan:
  llm:
    providers:
      anthropic-llm:
        type: anthropic
        anthropic:
          api-key: ${ANTHROPIC_API_KEY}
          model: claude-sonnet-4-20250514
          temperature: 0.7
          max-tokens: 4096
          timeout: 5m
    session-history-summary:
      provider: anthropic-llm
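
The sampling parameters top-p and top-k are set in the same place as temperature. A sketch with illustrative values (not recommendations) for a short-output task such as autotitling; the provider id is a placeholder:

koldan:
  llm:
    providers:
      anthropic-tuned:         # illustrative provider id
        type: anthropic
        anthropic:
          api-key: ${ANTHROPIC_API_KEY}
          model: claude-3-5-haiku-20241022
          top-p: 0.9           # example value
          top-k: 40            # example value
          max-tokens: 1024     # titles are short, so a small output cap is assumed to suffice
    session-history-autotitle:
      provider: anthropic-tuned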

Google Cloud Vertex AI (Gemini)

Google Cloud Vertex AI integration using Gemini models. Unlike the Google AI Gemini provider (which uses an API key), Vertex AI authenticates via Google Cloud credentials and requires a GCP project and location.

Property | Description | Default
type | Must be vertex-ai | -
vertex-ai.project | Required. Google Cloud project ID. | -
vertex-ai.location | Required. GCP region (e.g., us-central1, europe-west1). | -
vertex-ai.model | Required. Model name (e.g., gemini-2.5-flash, gemini-2.5-pro). | -
vertex-ai.api-endpoint | Custom API endpoint (for regional endpoints or proxies). | GCP default
vertex-ai.temperature | Sampling temperature. | Model default
vertex-ai.top-p | Nucleus sampling parameter. | Model default
vertex-ai.top-k | Top-k sampling parameter. | Model default
vertex-ai.max-output-tokens | Maximum number of output tokens. | Model default

Authentication

Vertex AI uses the Google Cloud Application Default Credentials (ADC) mechanism. Ensure credentials are available via one of the following:

  1. GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to a service account JSON key file.
  2. gcloud CLI — run gcloud auth application-default login on the host.
  3. GKE Workload Identity or Compute Engine default service account when running on GCP infrastructure.

Example:

koldan:
  llm:
    providers:
      vertex-llm:
        type: vertex-ai
        vertex-ai:
          project: ${GCP_PROJECT_ID}
          location: us-central1
          model: gemini-2.5-flash
          temperature: 0.7
          top-k: 40
          max-output-tokens: 4096
    session-history-summary:
      provider: vertex-llm
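
Only project, location, and model are marked as required, so a minimal configuration that leaves every sampling parameter at the model default could be as short as the sketch below. The provider id is illustrative, and credentials are assumed to be resolved through ADC as described in the Authentication notes above.

koldan:
  llm:
    providers:
      vertex-minimal:          # illustrative provider id
        type: vertex-ai
        vertex-ai:
          project: ${GCP_PROJECT_ID}
          location: europe-west1
          model: gemini-2.5-pro
    session-history-autotitle:
      provider: vertex-minimal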

Task Configuration

Each task, such as session-history-summary or session-history-autotitle, names the provider it uses and can optionally override that provider's model:

koldan:
  llm:
    session-history-summary:
      provider: my-openai     # references a provider id from the providers map
      model: gpt-4o-mini      # optional: overrides the provider's default model
    session-history-autotitle:
      provider: local-llm

If model is not specified in the task config, the model configured on the referenced provider is used.
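
For instance, both tasks can share one provider while only one of them overrides the model. This sketch assumes a provider with the id my-openai is defined in the providers map with model gpt-4o, as in the OpenAI example:

koldan:
  llm:
    session-history-summary:
      provider: my-openai      # no model given: the provider's gpt-4o is used
    session-history-autotitle:
      provider: my-openai
      model: gpt-4o-mini       # the override applies to this task only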