Speech Models

Speech models are the core of Koldan's transcription engine. Each model is trained for specific languages, use cases, and performance characteristics. When you create a transcription, you select a model - and Koldan handles the rest.


How Models Work

Every model you see in the API represents a stable identifier that resolves to a specific speech recognition engine version on the server. This means:

  • You always reference models by name - e.g., general-v3, medical-en, telephony.
  • The server resolves your model name to the best available engine version behind the scenes.
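  • Model updates are transparent for family and pinned names - when a new engine version is deployed, your existing model name automatically points to the improved version (pinned names stay within their major version). No code changes needed. Concrete names never change; see Model Types.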

Request flow: Your API request (model: general-v3) → Koldan server → resolved engine version → transcription result

Model Properties

Each model exposes the following information:

| Property | Description |
|---|---|
| Name | The stable identifier you use in API requests (e.g., general-v3) |
| Display Name | A human-readable label (e.g., "General v3") |
| Description | What the model is designed for |
| Status | Current availability - see Model Status below |
| Current Version | The engine version this model currently resolves to |
| Capabilities | Supported languages, streaming, and auto-detection - see Capabilities |
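
If you consume these properties in client code, it can help to map each model object onto a typed record. The sketch below assumes a JSON response whose field names are camelCase versions of the properties above (name, displayName, description, status, currentVersion, and a capabilities object with languages, autoDetect, and streaming). These field names are hypothetical; check the REST API Reference for the real ones.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Capabilities:
    languages: list[str] = field(default_factory=list)  # BCP-47 codes, e.g. "en", "he"
    auto_detect: bool = False
    streaming: bool = False


@dataclass
class SpeechModel:
    name: str             # stable identifier used in API requests
    display_name: str     # human-readable label
    description: str      # what the model is designed for
    status: str           # AVAILABLE, UNAVAILABLE, MAINTENANCE or DEPRECATED
    current_version: str  # engine version the name currently resolves to
    capabilities: Capabilities


def parse_model(data: dict[str, Any]) -> SpeechModel:
    """Build a SpeechModel from one JSON object (field names are assumptions)."""
    caps = data.get("capabilities", {})
    return SpeechModel(
        name=data["name"],
        display_name=data.get("displayName", data["name"]),
        description=data.get("description", ""),
        # Treat a missing status as UNAVAILABLE so callers never assume readiness.
        status=data.get("status", "UNAVAILABLE"),
        current_version=data.get("currentVersion", ""),
        capabilities=Capabilities(
            languages=list(caps.get("languages", [])),
            auto_detect=bool(caps.get("autoDetect", False)),
            streaming=bool(caps.get("streaming", False)),
        ),
    )
```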

Model Status

| Status | Meaning |
|---|---|
| AVAILABLE | The model is ready for use |
| UNAVAILABLE | The model is not currently deployed on this server |
| MAINTENANCE | Temporarily offline for updates - try again later |
| DEPRECATED | Still functional but scheduled for removal - migrate to the suggested replacement |
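
A client can turn each status into a next step before sending a transcription. The sketch below uses the four status strings from the table; the action labels it returns ("use", "retry-later", and so on) are illustrative names, not part of the API.

```python
from enum import Enum


class ModelStatus(Enum):
    AVAILABLE = "AVAILABLE"
    UNAVAILABLE = "UNAVAILABLE"
    MAINTENANCE = "MAINTENANCE"
    DEPRECATED = "DEPRECATED"


def next_action(status: str) -> str:
    """Map a model status string to a suggested client action (labels are hypothetical)."""
    s = ModelStatus(status)  # raises ValueError for an unknown status string
    if s is ModelStatus.AVAILABLE:
        return "use"
    if s is ModelStatus.DEPRECATED:
        return "use-and-migrate"  # still works, but plan a migration
    if s is ModelStatus.MAINTENANCE:
        return "retry-later"      # temporarily offline
    return "choose-another"       # UNAVAILABLE: not deployed on this server
```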

Deprecated Models

When a model is deprecated, the API response includes a deprecationDate and a deprecationMessage with migration guidance. Deprecated models continue to work until their sunset date, after which requests return 410 Gone. Plan your migration early.
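
One way to surface deprecations early is to warn whenever a model object reports DEPRECATED, and to translate a 410 Gone response into a clear error. The sketch below reads the deprecationDate and deprecationMessage fields named above; the ModelGoneError class and both function names are hypothetical. Note that Python hides DeprecationWarning by default outside __main__, so you may need a warnings filter to see it.

```python
import warnings
from typing import Any


class ModelGoneError(Exception):
    """Raised when a request for a removed (sunset) model returns 410 Gone."""


def check_deprecation(model: dict[str, Any]) -> bool:
    """Emit a DeprecationWarning for a deprecated model; return True if one was emitted."""
    if model.get("status") != "DEPRECATED":
        return False
    date = model.get("deprecationDate", "unknown date")
    message = model.get("deprecationMessage", "no migration guidance provided")
    warnings.warn(
        f"Model {model.get('name')!r} is deprecated ({date}): {message}",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    return True


def raise_for_gone(status_code: int, model_name: str) -> None:
    """Turn an HTTP 410 response into a descriptive error; other codes pass through."""
    if status_code == 410:
        raise ModelGoneError(f"Model {model_name!r} has been removed; migrate to its replacement.")
```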


Capabilities

Each model declares what it can do. Check capabilities before using a model to ensure it fits your use case.

| Capability | Description |
|---|---|
| Languages | List of supported BCP-47 language codes (e.g., en, he, de, ar) |
| Auto-detect | Whether the model can automatically identify the spoken language |
| Streaming | Whether the model supports real-time streaming transcription |
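
To pick a model that fits a use case, you can filter the model list by its declared capabilities. This sketch assumes each model object carries a capabilities object with languages and streaming fields (hypothetical names). Because BCP-47 tags are case-insensitive, it compares codes in lowercase.

```python
from typing import Any, Iterable, Optional


def matching_models(
    models: Iterable[dict[str, Any]],
    language: Optional[str] = None,
    need_streaming: bool = False,
) -> list[str]:
    """Return the names of models whose declared capabilities fit the request."""
    names = []
    for model in models:
        caps = model.get("capabilities", {})
        if need_streaming and not caps.get("streaming", False):
            continue  # streaming required but not declared by this model
        if language is not None:
            # BCP-47 tags are case-insensitive, so compare lowercased codes.
            supported = {code.lower() for code in caps.get("languages", [])}
            if language.lower() not in supported:
                continue
        names.append(model["name"])
    return names
```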

Check Languages Before Transcribing

If you specify a language that the model doesn't support, the transcription will fail. Use the model languages endpoint to verify supported languages, or enable auto-detection if the model supports it.
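
A client-side check can catch an unsupported language before the transcription fails. The sketch below fails fast when an explicit language is not listed, and falls back to auto-detection only when no language was given and the model supports it. The returned request fields (language, autoDetect) and the UnsupportedLanguageError class are hypothetical; consult the REST API Reference for the real transcription parameters.

```python
from typing import Any, Optional


class UnsupportedLanguageError(ValueError):
    """The model cannot handle the requested language setting."""


def language_setting(capabilities: dict[str, Any], language: Optional[str]) -> dict[str, Any]:
    """Decide which (hypothetical) language fields to send with a transcription."""
    supported = {code.lower() for code in capabilities.get("languages", [])}
    if language is not None:
        if language.lower() in supported:
            return {"language": language}
        # Fail fast instead of sending a request that would fail server-side.
        raise UnsupportedLanguageError(f"language {language!r} is not supported by this model")
    if capabilities.get("autoDetect", False):
        return {"autoDetect": True}
    raise UnsupportedLanguageError("model cannot auto-detect; specify a supported language")
```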


Model Types

Models are organized into three categories that determine how they resolve to engine versions:

| Type | Behavior | Example |
|---|---|---|
| Family | Always resolves to the latest version in the model family. Automatically upgrades when new versions are deployed. | general → currently resolves to general-v3-20240915 |
| Pinned | Points to a specific major version but may receive minor updates (patches, accuracy improvements). | general-v3 → currently resolves to general-v3-20240915 |
| Concrete | Locked to an exact engine version. Never changes. Use when you need deterministic, reproducible results. | general-v3-20240915 → always this exact version |
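
The resolution rules can be illustrated with a toy client-side resolver. Resolution actually happens on the server; this sketch only mimics the documented behavior. It assumes a naming convention inferred from the examples above (family, family-v<major>, family-v<major>-<YYYYMMDD>), which is not guaranteed to hold for every model.

```python
import re
from typing import Iterable

# Assumed naming convention, inferred from the examples in the table above:
#   concrete: <family>-v<major>-<YYYYMMDD>   e.g. general-v3-20240915
#   pinned:   <family>-v<major>              e.g. general-v3
#   family:   <family>                       e.g. general
CONCRETE = re.compile(r"^(?P<family>.+)-v(?P<major>\d+)-(?P<date>\d{8})$")
PINNED = re.compile(r"^(?P<family>.+)-v(?P<major>\d+)$")


def resolve(name: str, deployed: Iterable[str]) -> str:
    """Resolve a family, pinned, or concrete name against deployed concrete versions."""
    versions = []  # (family, major, date, full name) for each deployed version
    for v in deployed:
        m = CONCRETE.match(v)
        if m:
            versions.append((m["family"], int(m["major"]), m["date"], v))

    if CONCRETE.match(name):
        # Concrete: locked to one exact version, which must be deployed.
        if name in {t[3] for t in versions}:
            return name
        raise LookupError(f"{name} is not deployed")

    pinned = PINNED.match(name)
    if pinned:
        # Pinned: newest version within one major version.
        family, major = pinned["family"], int(pinned["major"])
        candidates = [t for t in versions if t[0] == family and t[1] == major]
    else:
        # Family: newest version across all major versions.
        candidates = [t for t in versions if t[0] == name]
    if not candidates:
        raise LookupError(f"no deployed version for {name}")
    # Highest major wins, then latest date (YYYYMMDD strings sort chronologically).
    return max(candidates, key=lambda t: (t[1], t[2]))[3]
```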

Which Type Should I Use?

  • Use Family models for most applications - you'll always get the best available version.
  • Use Pinned models when you want a stable major version but still benefit from patches.
  • Use Concrete models only when reproducibility is critical (e.g., compliance, benchmarking).

Default Model

Each Koldan deployment has a default model. If you create a transcription without specifying a model, the default is used automatically.

To find out which model is the default, call the list models endpoint and look for the model marked as default.
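
Finding the default in a list response is a simple scan. This sketch assumes each model object carries a boolean flag named isDefault; that field name is hypothetical, so check the REST API Reference for the actual marker.

```python
from typing import Any, Iterable, Optional


def default_model(models: Iterable[dict[str, Any]]) -> Optional[str]:
    """Return the name of the model flagged as default, or None if none is flagged.

    "isDefault" is a hypothetical field name used for illustration.
    """
    for model in models:
        if model.get("isDefault"):
            return model["name"]
    return None
```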


Role-Based Model Access

Not all models are available to all users. Administrators can restrict which models each role can access. When you call the list models endpoint, you only see models assigned to your role.

If you need access to a model that isn't listed, contact your administrator.

→ See Roles and Permissions for more on how roles work.


Checking Available Models

Use the Speech Models API to discover what's available to you:

| What You Need | Endpoint |
|---|---|
| List all models you can access | GET /api/v1/speech-services/models |
| Get details for a specific model | GET /api/v1/speech-services/models/{model} |
| Check supported languages for a model | GET /api/v1/speech-services/models/{model}/languages |

→ Full API details in the REST API Reference.
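
The three endpoints share one path prefix, so a small helper can build them and fetch JSON using only the standard library. This is a minimal sketch: the base URL is whatever your deployment uses, and Bearer-token authentication is an assumption; see the REST API Reference for the actual authentication scheme.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

MODELS_PATH = "/api/v1/speech-services/models"


def models_url(base_url: str, model: Optional[str] = None, languages: bool = False) -> str:
    """Build the list, details, or languages URL for the Speech Models API."""
    if languages and model is None:
        raise ValueError("the languages endpoint requires a model name")
    url = base_url.rstrip("/") + MODELS_PATH
    if model is not None:
        url += "/" + urllib.parse.quote(model, safe="")  # escape the path segment
        if languages:
            url += "/languages"
    return url


def get_json(url: str, token: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its JSON body (Bearer auth is an assumption)."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, get_json(models_url(base_url, "general-v3", languages=True), token) would request the supported languages for general-v3. urlopen raises urllib.error.HTTPError for error responses such as 410 Gone, so wrap calls accordingly.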