Transcriptions
Manage speech-to-text transcription jobs. Submit audio files for transcription, monitor job progress, retrieve results in multiple formats, configure webhooks for completion notifications, and manage the job lifecycle.
Base path: /api/v1/speech-services/transcriptions
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/speech-services/transcriptions | List Transcription Jobs |
| POST | /api/v1/speech-services/transcriptions/upload | Upload and Transcribe |
| POST | /api/v1/speech-services/transcriptions | Create Transcription Job |
| GET | /api/v1/speech-services/transcriptions/{id} | Get Transcription Job |
| GET | /api/v1/speech-services/transcriptions/{id}/result | Get Transcription Result |
| PUT | /api/v1/speech-services/transcriptions/{id}/webhook | Update Webhook |
| DELETE | /api/v1/speech-services/transcriptions/{id}/webhook | Delete Webhook |
| POST | /api/v1/speech-services/transcriptions/{id}/cancel | Cancel Job |
| DELETE | /api/v1/speech-services/transcriptions/{id} | Delete Job |
| POST | /api/v1/speech-services/transcriptions/{id}/purge | Purge Job |
List Transcription Jobs
GET /api/v1/speech-services/transcriptions
Requires Authentication - Scopes: speech:transcriptions:read
Retrieve a paginated list of transcription jobs. By default, only the current user's jobs are returned. When fileId is specified, all transcriptions for that file are returned instead. This requires speech:files:read access to the file and makes it possible to discover transcriptions on shared files.
Note
The result field is not included in list responses to reduce payload size. Use Get Transcription Job or Get Transcription Result to retrieve results.
Query Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| page | integer | No | 0 | Page number (zero-based). |
| size | integer | No | 20 | Page size (max 100). |
| status | string[] | No | - | Filter by job status (comma-separated). Values: PENDING, IN_PROGRESS, COMPLETED, FAILED, CANCELLED. |
| fileId | string (UUID) | No | - | Filter by source file ID. When specified for a shared file, returns all transcriptions for that file. |
| isDeleted | boolean | No | false | Filter by deleted state. |
| isPurged | boolean | No | - | Filter by purged state. |
| createdAfter | string (ISO 8601) | No | - | Return jobs created on or after this timestamp. |
| createdBefore | string (ISO 8601) | No | - | Return jobs created on or before this timestamp. |
| completedAfter | string (ISO 8601) | No | - | Return jobs completed on or after this timestamp. |
| completedBefore | string (ISO 8601) | No | - | Return jobs completed on or before this timestamp. |
| sort | string | No | creationDate,desc | Sort field and direction. Allowed fields: creationDate, completionDate, status. Direction: asc or desc. |
SpeechServiceTranscriptionJobsListResponse
| Field | Type | Nullable | Description |
|---|---|---|---|
| jobs | SpeechServiceTranscriptionJobResponse[] | No | Array of transcription job objects (without result). |
| size | integer | No | Number of items returned in this page. |
| total | long | No | Total number of matching items across all pages. |
| filters | object | No | Echo of applied filters. |
| sort | object | No | Applied sort order (field and direction). |
{
  "jobs": [
    { "id": "d1e2f3a4-...", "status": "COMPLETED", ... }
  ],
  "size": 1,
  "total": 1
}
| Status | Description |
|---|---|
| 200 OK | Transcription jobs retrieved successfully. |
| 400 Bad Request | Invalid sort field or query parameters. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope. |
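The paging fields above (page, size, total) can be combined into a simple loop. The following is a minimal sketch, not an official client: `build_list_params` and `iter_jobs` are hypothetical helper names, and `fetch_page` is assumed to be a caller-supplied function that performs the GET request with the given query parameters and returns the parsed JSON body.

```python
from typing import Callable, Dict, Iterator, List, Optional


def build_list_params(page: int = 0, size: int = 20,
                      status: Optional[List[str]] = None,
                      created_after: Optional[str] = None,
                      sort: str = "creationDate,desc") -> Dict[str, str]:
    """Build query parameters for the list endpoint from the table above."""
    if size > 100:
        raise ValueError("size may not exceed 100")
    params = {"page": str(page), "size": str(size), "sort": sort}
    if status:
        params["status"] = ",".join(status)  # comma-separated list
    if created_after:
        params["createdAfter"] = created_after  # ISO 8601 timestamp
    return params


def iter_jobs(fetch_page: Callable[[Dict[str, str]], dict],
              **filters) -> Iterator[dict]:
    """Yield jobs page by page until `total` items have been seen."""
    page, seen = 0, 0
    while True:
        body = fetch_page(build_list_params(page=page, **filters))
        jobs = body.get("jobs", [])
        yield from jobs
        seen += len(jobs)
        # Stop on an empty page or once every matching item was returned.
        if not jobs or seen >= body.get("total", 0):
            return
        page += 1
```

With the requests library, `fetch_page` could be as small as `lambda p: requests.get(url, headers=headers, params=p).json()`, where `url` and `headers` are whatever your integration already uses.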
Upload and Transcribe
POST /api/v1/speech-services/transcriptions/upload
Requires Authentication - Scopes: speech:files:write, speech:transcriptions:write
Rate Limited - This endpoint enforces stricter rate limits
Upload a media file (or import from URI) and immediately start a transcription job in one step. Provide either file (multipart binary) or uri (string), but not both. Optionally provide summary to schedule a summary job that runs after transcription completes.
Content Type
This endpoint requires the multipart/form-data content type. The name and transcription parts are required. Provide exactly one of file or uri.
Multipart Form Parts
| Part | Type | Required | Description |
|---|---|---|---|
| file | binary | Conditional | The media file to upload. Required if uri is not provided. |
| uri | string | Conditional | URI of a remote file to import (HTTP, HTTPS, or S3). Required if file is not provided. |
| name | string | Yes | A display name for the uploaded file. |
| description | string | No | A description of the file content. |
| path | string | No | Virtual directory path. Must start and end with /. Default: /. |
| metadata | string (JSON) | No | Additional key-value metadata as a JSON string. |
| tags | string (JSON) | No | Array of tag display names as a JSON string. |
| generateListeningAudio | boolean | No | Generate a compressed MP3 derivative. Default: true. |
| transcription | string (JSON) | Yes | Transcription options as a JSON string. See SpeechServiceTranscriptionOptions. |
| diarization | string (JSON) | No | Diarization options as a JSON string. See SpeechServiceDiarizationOptions. |
| webhook | string (JSON) | No | Webhook configuration as a JSON string. See SpeechServiceWebhook. |
| summary | string (JSON) | No | Summary options as a JSON string. When provided, a summary job is scheduled to run after transcription completes. Requires promptTemplateId. Requires scope: speech:summaries:write. |
curl -X POST https://koldan.dixilang.com/api/v1/speech-services/transcriptions/upload \
  -H "X-API-Key: $KOLDAN_API_KEY" \
  -F "file=@meeting-2026-04-01.wav" \
  -F "name=meeting-recording" \
  -F 'transcription={"model":"hebrew-general","language":{"defaultLanguageCode":"he"}}' \
  -F 'diarization={"enabled":true,"mode":"SPEAKER","maxSpeakers":4}'
import json
import requests

JWT = "<your-access-token>"  # placeholder: substitute a real bearer token

# Keep the file open for the duration of the upload request.
with open("meeting-2026-04-01.wav", "rb") as f:
    resp = requests.post(
        "https://koldan.dixilang.com/api/v1/speech-services/transcriptions/upload",
        headers={"Authorization": f"Bearer {JWT}"},
        files={"file": ("meeting-2026-04-01.wav", f)},
        data={
            "name": "meeting-recording",
            # JSON-valued parts are sent as JSON strings.
            "transcription": json.dumps({"model": "hebrew-general", "language": {"defaultLanguageCode": "he"}}),
            "diarization": json.dumps({"enabled": True, "mode": "SPEAKER", "maxSpeakers": 4}),
        },
    )
print(resp.json())
SpeechServiceUploadAndTranscribeResponse
| Field | Type | Nullable | Description |
|---|---|---|---|
| file | SpeechServiceFileResponse | No | The uploaded file object. |
| job | SpeechServiceTranscriptionJobResponse | No | The created transcription job. |
| summary | object | Yes | The scheduled summary job. Present only when summary options were provided. |
{
  "file": { "id": "a1b2c3d4-...", "name": "meeting-recording", ... },
  "job": { "id": "d1e2f3a4-...", "status": "PENDING", ... },
  "summary": null
}
| Status | Description |
|---|---|
| 201 Created | File uploaded and transcription job created successfully. |
| 400 Bad Request | Invalid JSON parts, missing required parts, or both file and uri provided (or neither). |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope. |
| 429 Too Many Requests | Rate limit exceeded. |
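The examples above upload a local file. For the uri variant, the same form parts apply, minus the binary file part. The sketch below assembles the text parts and enforces the "exactly one of file or uri" rule on the client side before any request is sent. `build_upload_fields` and its `has_file` flag are hypothetical names introduced for illustration, and the example URL is a placeholder.

```python
import json
from typing import Dict, Optional


def build_upload_fields(name: str, transcription: dict,
                        uri: Optional[str] = None,
                        has_file: bool = False,
                        diarization: Optional[dict] = None,
                        webhook: Optional[dict] = None) -> Dict[str, str]:
    """Return the text parts of the multipart form.

    The binary `file` part, if any, is attached separately by the caller;
    `has_file` only records whether the caller intends to attach one.
    """
    # Reject both-provided and neither-provided before calling the API.
    if has_file == (uri is not None):
        raise ValueError("provide exactly one of file or uri")
    fields = {"name": name, "transcription": json.dumps(transcription)}
    if uri is not None:
        fields["uri"] = uri
    if diarization is not None:
        fields["diarization"] = json.dumps(diarization)
    if webhook is not None:
        fields["webhook"] = json.dumps(webhook)
    return fields


fields = build_upload_fields(
    "meeting-recording",
    {"model": "hebrew-general", "language": {"defaultLanguageCode": "he"}},
    uri="https://example.com/audio/meeting.wav",  # placeholder URL
)
```

Because no file is attached in the uri case, the requests library would otherwise send the form URL-encoded. Passing the fields as `files={k: (None, v) for k, v in fields.items()}` makes it emit multipart/form-data instead.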
Create Transcription Job
POST /api/v1/speech-services/transcriptions
Requires Authentication - Scopes: speech:transcriptions:write
Rate Limited - This endpoint enforces stricter rate limits
Submit a new transcription job for an existing uploaded file. The file must belong to the current user, must not be deleted, and must have completed ingestion (for URI imports).
SpeechServiceTranscriptionJobRequest
| Field | Type | Required | Description |
|---|---|---|---|
| fileId | string (UUID) | Yes | The ID of the uploaded file to process. |
| transcription | SpeechServiceTranscriptionOptions | Yes | Transcription configuration options. |
| diarization | SpeechServiceDiarizationOptions | No | Speaker diarization configuration. |
| webhook | SpeechServiceWebhook | No | Webhook configuration for completion notifications. |
{
  "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "transcription": {
    "model": "hebrew-general",
    "language": { "defaultLanguageCode": "he" },
    "punctuation": true
  },
  "diarization": { "enabled": true, "mode": "SPEAKER" }
}
curl -X POST https://koldan.dixilang.com/api/v1/speech-services/transcriptions \
  -H "X-API-Key: $KOLDAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "transcription": {
      "model": "hebrew-general",
      "language": { "defaultLanguageCode": "he" },
      "punctuation": true
    },
    "diarization": { "enabled": true }
  }'
import requests

JWT = "<your-access-token>"  # placeholder: substitute a real bearer token

resp = requests.post(
    "https://koldan.dixilang.com/api/v1/speech-services/transcriptions",
    headers={
        "Authorization": f"Bearer {JWT}",
        "Content-Type": "application/json"
    },
    json={
        "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "transcription": {
            "model": "hebrew-general",
            "language": {"defaultLanguageCode": "he"},
            "punctuation": True
        },
        "diarization": {"enabled": True}
    }
)
print(resp.json())
SpeechServiceTranscriptionJobResponse
Returns the created transcription job with status set to PENDING. See Data Types for the full response schema.
| Status | Description |
|---|---|
| 201 Created | Transcription job created successfully. |
| 400 Bad Request | Invalid request body or missing required fields. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope. |
| 404 Not Found | File not found or does not belong to the current user. |
| 422 Unprocessable Entity | File is deleted or ingestion is not completed. |
Get Transcription Job
GET /api/v1/speech-services/transcriptions/{id}
Requires Authentication - Scopes: speech:transcriptions:read
Retrieve details of a specific transcription job including the full result when the job is completed and not purged.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
SpeechServiceTranscriptionJobResponse
Returns the full job object including the result field when status is COMPLETED and isPurged is false. See Data Types for the full response schema.
{
  "id": "d1e2f3a4-b5c6-7890-abcd-ef1234567890",
  "status": "COMPLETED",
  "creationDate": "2026-04-01T10:00:00Z",
  "completionDate": "2026-04-01T10:05:32Z",
  "result": { "text": "Hello everyone...", "confidence": 0.94, ... },
  ...
}
| Status | Description |
|---|---|
| 200 OK | Transcription job retrieved successfully. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized to access this job. |
| 404 Not Found | Job not found. |
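When no webhook is configured, a client can poll this endpoint until the job reaches a terminal state. The following is a minimal sketch under stated assumptions: `wait_for_job` is a hypothetical helper, `get_job` is a caller-supplied function that fetches the job and returns the parsed JSON, and the 5-second interval and 10-minute timeout are arbitrary defaults rather than documented recommendations.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(get_job: Callable[[], dict],
                 interval: float = 5.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `get_job` until the job is terminal or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job()
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still {job['status']} after {timeout} seconds")
        sleep(interval)  # injectable so tests need not actually wait
```

A caller would typically check the returned job's status: read result when it is COMPLETED, and inspect errorCode and errors when it is FAILED.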
Get Transcription Result
GET /api/v1/speech-services/transcriptions/{id}/result
Requires Authentication - Scopes: speech:transcriptions:read
Retrieve the transcription result for a completed job. The format parameter controls the export format.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
Query Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| format | string | No | json | Export format. Currently supported: json. |
SpeechServiceTranscriptionJobResult
Returns the transcription result object. See Data Types for the full response schema.
{
  "text": "Hello everyone, welcome to the meeting...",
  "detectedLanguages": ["en"],
  "confidence": 0.94,
  "duration": 420.5,
  "segments": [
    { "start": 0.0, "end": 3.2, "text": "Hello everyone,", "speaker": "SPEAKER_00", ... }
  ]
}
| Status | Description |
|---|---|
| 200 OK | Transcription result returned successfully. |
| 400 Bad Request | Unsupported export format. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized to access this job. |
| 404 Not Found | Job not found or result not available (job not completed). |
| 410 Gone | Job result has been purged and is no longer available. |
Update Webhook
PUT /api/v1/speech-services/transcriptions/{id}/webhook
Requires Authentication - Scopes: speech:transcriptions:write
Set or update the webhook configuration for an existing transcription job. The job must not be in a terminal state (COMPLETED, FAILED, or CANCELLED).
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
SpeechServiceWebhook
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to send the webhook notification to. Must be HTTPS. |
| secret | string | No | A shared secret used to sign webhook payloads with HMAC-SHA256. |
| headers | object | No | Custom HTTP headers to include in the webhook request. |
{
  "url": "https://example.com/webhook/transcriptions",
  "secret": "whsec_abc123",
  "headers": { "X-Custom-Header": "my-value" }
}
import requests

JWT = "<your-access-token>"  # placeholder: substitute a real bearer token
job_id = "d1e2f3a4-b5c6-7890-abcd-ef1234567890"

resp = requests.put(
    f"https://koldan.dixilang.com/api/v1/speech-services/transcriptions/{job_id}/webhook",
    headers={
        "Authorization": f"Bearer {JWT}",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com/webhook/transcriptions",
        "secret": "whsec_abc123"
    }
)
print(resp.json())
SpeechServiceTranscriptionJobResponse
Returns the updated job object. See Data Types for the full response schema.
| Status | Description |
|---|---|
| 200 OK | Webhook updated successfully. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized. |
| 404 Not Found | Job not found. |
| 409 Conflict | Job is in a terminal state and cannot be updated. |
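On the receiving side, the secret lets your webhook handler check that a notification really came from the service. This reference states only that payloads are signed with HMAC-SHA256. It does not name the header that carries the signature or how the signature is encoded. The sketch below therefore assumes a hex-encoded digest computed over the raw request body; confirm both details against an actual delivery before relying on it. `verify_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes,
                     received_signature: str) -> bool:
    """Check an HMAC-SHA256 signature over the raw request body.

    Assumption: the signature is the hex digest of the body keyed with
    the webhook secret. The header it arrives in is not specified here.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body,
                        hashlib.sha256).hexdigest()
    received = received_signature.strip().lower()
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected.encode("ascii"),
                               received.encode("utf-8"))
```

Verify against the bytes exactly as received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.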
Delete Webhook
DELETE /api/v1/speech-services/transcriptions/{id}/webhook
Requires Authentication - Scopes: speech:transcriptions:write
Remove the webhook configuration from an existing transcription job. The job must not be in a terminal state (COMPLETED, FAILED, or CANCELLED).
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
Response
No response body.
| Status | Description |
|---|---|
| 204 No Content | Webhook removed successfully. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized. |
| 404 Not Found | Job not found. |
| 409 Conflict | Job is in a terminal state and cannot be updated. |
Cancel Job
POST /api/v1/speech-services/transcriptions/{id}/cancel
Requires Authentication - Scopes: speech:transcriptions:write
Cancel a pending or in-progress transcription job. Only jobs with PENDING or IN_PROGRESS status can be cancelled.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
Response
No response body.
| Status | Description |
|---|---|
| 204 No Content | Job cancelled successfully. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized. |
| 404 Not Found | Job not found. |
| 409 Conflict | Job is already in a terminal state (COMPLETED, FAILED, or CANCELLED). |
Delete Job
DELETE /api/v1/speech-services/transcriptions/{id}
Requires Authentication - Scopes: speech:transcriptions:delete
Delete a transcription job. Only jobs in a terminal state (COMPLETED, FAILED, or CANCELLED) can be deleted. Use ?purge=true for immediate permanent removal of result and error data.
Purge is Irreversible
When purge=true, the transcription result and error data are permanently deleted from storage and cannot be recovered.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
Query Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| purge | boolean | No | false | Immediately purge result and error data from storage. |
Response
No response body.
| Status | Description |
|---|---|
| 204 No Content | Job deleted (or purged) successfully. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized. |
| 404 Not Found | Job not found. |
| 409 Conflict | Job is not in a terminal state, or is already deleted (when purge=false). |
Purge Job
POST /api/v1/speech-services/transcriptions/{id}/purge
Requires Authentication - Scopes: speech:transcriptions:delete
Permanently purge result and error data of a deleted transcription job. The job must already be deleted and not yet purged. This operation is irreversible.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string (UUID) | Yes | Unique identifier of the transcription job. |
Response
No response body.
| Status | Description |
|---|---|
| 204 No Content | Job data purged successfully. |
| 401 Unauthorized | Missing or invalid authentication. |
| 403 Forbidden | Insufficient scope or not authorized. |
| 404 Not Found | Job not found. |
| 409 Conflict | Job is not deleted, or is already purged. |
Data Types
SpeechServiceTranscriptionJobResponse
| Field | Type | Nullable | Description |
|---|---|---|---|
| id | string (UUID) | No | Unique identifier of the transcription job. |
| status | string | No | Current job status: PENDING, IN_PROGRESS, COMPLETED, FAILED, CANCELLED. |
| isDeleted | boolean | No | Whether the job has been deleted. |
| isPurged | boolean | No | Whether the job's result/error data has been permanently removed. |
| creationDate | string (ISO 8601) | No | Timestamp when the job was created. |
| completionDate | string (ISO 8601) | Yes | Timestamp when the job completed. null if not yet completed. |
| deletedAt | string (ISO 8601) | Yes | Timestamp when the job was deleted. |
| purgedAt | string (ISO 8601) | Yes | Timestamp when the job data was purged. |
| purgeAt | string (ISO 8601) | Yes | Scheduled timestamp for automatic purge. |
| file | SpeechServiceFileResponse | No | The media file associated with this job. |
| transcription | SpeechServiceTranscriptionOptions | No | Transcription configuration used for this job. |
| diarization | SpeechServiceDiarizationOptions | Yes | Diarization configuration used for this job. |
| result | SpeechServiceTranscriptionJobResult | Yes | Transcription result. Present only when status is COMPLETED and isPurged is false. Not included in list responses. |
| errorCode | string | Yes | Top-level error code when the job has failed. See TranscriptionErrorCode. |
| errors | SpeechServiceTranscriptionJobError[] | Yes | Array of errors encountered during processing. Cleared when purged. |
| webhook | SpeechServiceWebhook | Yes | Webhook configuration for job completion notifications. |
SpeechServiceTranscriptionJobResult
| Field | Type | Nullable | Description |
|---|---|---|---|
| text | string | No | The full transcription text. |
| detectedLanguages | string[] | Yes | Set of IETF BCP 47 codes of the detected languages. |
| confidence | double | Yes | Overall confidence score of the transcription (0.0–1.0). |
| duration | double | Yes | Total audio duration in seconds. |
| segments | SpeechServiceTranscriptionSegment[] | Yes | Array of transcription segments with timestamps. |
SpeechServiceTranscriptionSegment
| Field | Type | Nullable | Description |
|---|---|---|---|
| start | double | No | Start time of the segment in seconds. |
| end | double | No | End time of the segment in seconds. |
| text | string | No | The transcribed text for this segment. |
| confidence | double | Yes | Confidence score for this segment (0.0–1.0). |
| language | string | Yes | IETF BCP 47 language code detected for this segment. |
| speaker | string | Yes | Speaker label (SPEAKER_00, SPEAKER_01, …) or custom channel label. |
| channel | integer | Yes | Zero-based audio channel index. Present only when diarization mode is CHANNEL. |
| words | SpeechServiceWordTiming[] | Yes | Word-level timing information. |
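Segments are often easier to read once consecutive segments from the same speaker are merged into turns. The sketch below shows one way to do that using only the start, end, text, and speaker fields from the table above. `speaker_turns` is a hypothetical helper, and the "UNKNOWN" fallback label for segments without a speaker is an assumption of this example, not an API value.

```python
from typing import Dict, List


def speaker_turns(segments: List[dict]) -> List[Dict[str, object]]:
    """Merge consecutive segments that share a speaker into one turn."""
    turns: List[Dict[str, object]] = []
    for seg in segments:
        # `speaker` is nullable, e.g. when diarization was not enabled.
        speaker = seg.get("speaker") or "UNKNOWN"
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] = f"{turns[-1]['text']} {seg['text']}"
        else:
            turns.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns
```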
SpeechServiceWordTiming
| Field | Type | Nullable | Description |
|---|---|---|---|
| word | string | No | The transcribed word. |
| start | double | No | Start time of the word in seconds. |
| end | double | No | End time of the word in seconds. |
| confidence | double | Yes | Confidence score for this word (0.0–1.0). |
| punctuation | string | Yes | Punctuation mark following the word (e.g., ".", ",", "?"). |
| capitalization | string | Yes | Capitalized form of the word as determined by the capitalization service. |
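The word, punctuation, and capitalization fields can be recombined into display text. The sketch below is one possible reading of the table above, and it rests on several assumptions: that word holds the raw token without punctuation, that capitalization (when present) replaces the token, that punctuation (when present) is appended directly after it, and that words are separated by single spaces. `render_words` is a hypothetical helper.

```python
from typing import List


def render_words(words: List[dict]) -> str:
    """Rebuild display text from word-level timing entries."""
    parts = []
    for entry in words:
        # Prefer the capitalized form when the service supplied one.
        token = entry.get("capitalization") or entry["word"]
        # Append the trailing punctuation mark, if any.
        token += entry.get("punctuation") or ""
        parts.append(token)
    return " ".join(parts)
```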
SpeechServiceTranscriptionOptions
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model identifier to use (e.g., hebrew-general). See Models for available models. |
| language | SpeechServiceTranscriptionLanguageOptions | Yes | Language configuration for transcription. |
| punctuation | boolean | No | Enable automatic punctuation insertion in the transcribed text. |
| capitalization | boolean | No | Enable automatic capitalization of the transcribed text. May be ignored depending on the language. |
SpeechServiceTranscriptionLanguageOptions
| Field | Type | Required | Description |
|---|---|---|---|
| defaultLanguageCode | string | No | IETF BCP 47 language code (e.g., en, he, de). When omitted, the model will attempt automatic language detection. |
SpeechServiceDiarizationOptions
| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Whether diarization should be performed. |
| mode | string | No | Diarization mode: SPEAKER (default) or CHANNEL. See SpeechServiceDiarizationMode. |
| maxSpeakers | integer | No | Maximum number of speakers expected. Used only when mode is SPEAKER. |
| channelMapping | SpeechServiceChannelMapping[] | No | Mapping of channel indices to custom speaker labels. Used only when mode is CHANNEL. |
SpeechServiceChannelMapping
| Field | Type | Required | Description |
|---|---|---|---|
| channel | integer | Yes | Zero-based audio channel index. |
| label | string | Yes | Custom speaker label for this channel (e.g., Agent, Customer). |
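The request examples elsewhere in this reference use SPEAKER mode only. For a recording with one speaker per channel, such as a stereo call recording, the diarization object could be built as sketched below. `channel_diarization` is a hypothetical helper, and assigning labels to channels in list order is a convention of this example.

```python
from typing import Dict, List


def channel_diarization(labels: List[str]) -> Dict[str, object]:
    """Build CHANNEL-mode diarization options, one label per channel.

    Labels are assigned to channels 0, 1, ... in the order given.
    """
    if not labels:
        raise ValueError("at least one channel label is required")
    return {
        "enabled": True,
        "mode": "CHANNEL",
        "channelMapping": [
            {"channel": index, "label": label}
            for index, label in enumerate(labels)
        ],
    }


options = channel_diarization(["Agent", "Customer"])
```

The resulting dictionary can be passed as the diarization field of a Create Transcription Job request, or serialized with `json.dumps` for the diarization part of Upload and Transcribe.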
SpeechServiceWebhook
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to send the webhook notification to. Must be HTTPS. |
| secret | string | No | A shared secret used to sign webhook payloads with HMAC-SHA256. |
| headers | object | No | Custom HTTP headers to include in the webhook request. |
SpeechServiceTranscriptionJobError
| Field | Type | Nullable | Description |
|---|---|---|---|
| code | string | No | Machine-readable error code. See TranscriptionErrorCode. |
| message | string | No | Human-readable error message describing what went wrong. |
| timestamp | string (ISO 8601) | No | Timestamp when the error occurred. |
Enumerations
TranscriptionJobStatus
| Value | Description |
|---|---|
| PENDING | Job has been accepted but processing has not started. |
| IN_PROGRESS | Audio is being transcribed. |
| COMPLETED | Transcription finished successfully. Result is available. |
| FAILED | Transcription failed. Check errorCode and errors for details. |
| CANCELLED | Job was cancelled before completion. |
SpeechServiceDiarizationMode
| Value | Description |
|---|---|
| SPEAKER | Automatic speaker separation (default). Speakers are identified by a diarization model. |
| CHANNEL | Channel-based speaker separation. Each audio channel maps to a distinct speaker. |
TranscriptionErrorCode
| Value | Description |
|---|---|
| TRANSCRIPTION_FAILED | The transcription engine encountered an error while processing the audio. |
| DIARIZATION_FAILED | Speaker diarization failed during processing. |
| CHANNEL_DIARIZATION_FAILED | Channel-based diarization failed (e.g., channel extraction error). |
| UNSUPPORTED_FORMAT | The file format or audio codec is not supported for transcription. |
| FILE_CORRUPTED | The file could not be decoded or is corrupted. |
| FILE_NOT_FOUND | The referenced file no longer exists or has been purged. |
| EMPTY_AUDIO | The file contains no audio stream or has zero duration. |
| DURATION_EXCEEDED | The file exceeds the maximum allowed duration for transcription. |
| QUOTA_EXCEEDED | The user's monthly transcription minutes quota has been exceeded. |
| LANGUAGE_NOT_SUPPORTED | The specified or detected language is not supported by the speech model. |
| MODEL_UNAVAILABLE | The requested speech model is offline or unreachable. |
| PUNCTUATION_FAILED | Punctuation and capitalization post-processing failed. |
| INTERNAL_ERROR | An unexpected internal error occurred during processing. |
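A client may want to react differently to different error codes. The sketch below groups a few of the codes above by a plausible follow-up action. The grouping is an assumption made for illustration and is not defined by the API: whether a given failure is worth retrying depends on your workload. `triage_failure` and the action names it returns are hypothetical.

```python
# Codes that describe a transient service condition (assumed retryable).
RETRYABLE = frozenset({"MODEL_UNAVAILABLE", "INTERNAL_ERROR"})

# Codes that point at the input file; resubmitting the same file is
# unlikely to help (assumption based on the descriptions above).
NEEDS_NEW_INPUT = frozenset({
    "UNSUPPORTED_FORMAT", "FILE_CORRUPTED", "FILE_NOT_FOUND",
    "EMPTY_AUDIO", "DURATION_EXCEEDED",
})


def triage_failure(job: dict) -> str:
    """Suggest a follow-up action for a job based on its errorCode."""
    code = job.get("errorCode")
    if job.get("status") != "FAILED" or code is None:
        return "none"
    if code in RETRYABLE:
        return "retry"
    if code in NEEDS_NEW_INPUT:
        return "fix-input"
    # Quota, language, and post-processing failures need a human decision.
    return "investigate"
```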
Job Lifecycle
Transcription jobs follow a lifecycle: pending → in progress → completed/failed/cancelled → deleted → purged. Only terminal-state jobs (COMPLETED, FAILED, CANCELLED) can be deleted. Purging permanently removes the transcription result and error data from storage.
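The lifecycle rules in this reference can be condensed into a small decision function that tells a client which state-changing calls are currently valid for a job, so it can avoid requests that would return 409 Conflict. This is a sketch of the documented preconditions only. `allowed_operations` and the operation labels it returns are hypothetical names, not API identifiers.

```python
from typing import List

TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def allowed_operations(job: dict) -> List[str]:
    """List the state-changing operations valid for a job right now."""
    if job.get("isPurged"):
        return []  # purged is the end of the lifecycle
    if job.get("isDeleted"):
        return ["purge"]  # purge requires deleted and not yet purged
    if job["status"] in TERMINAL_STATES:
        return ["delete"]  # only terminal-state jobs can be deleted
    # PENDING or IN_PROGRESS: cancellable, and the webhook can still change.
    return ["cancel", "update_webhook", "delete_webhook"]
```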
Upload and Transcribe
For the simplest integration, use the Upload and Transcribe endpoint to upload a file and start transcription in a single request. This avoids the two-step process of uploading a file first and then creating a transcription job separately.