POST /v1/audio/transcriptions
Transcribe an audio file to text using Whisper Large v3.
Request
POST https://api.chris.hellotopia.io/v1/audio/transcriptions
Authorization: Bearer <api_key>
Content-Type: multipart/form-data
| Form field | Type | Notes |
|---|---|---|
| file | binary | Audio file. mp3, mp4, mpeg, mpga, m4a, wav, webm, aiff, flac, ogg. |
| model | string | whisper-large-v3 |
| language | string | (optional) ISO-639-1 code (e.g. en). Auto-detected if omitted. |
| prompt | string | (optional) Priming text to bias transcription (names, jargon). |
| temperature | number | (optional) 0–1. Default 0. |
| response_format | string | (optional) json (default), text, srt, verbose_json, vtt. |
Example
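As a minimal sketch of the request described above, the following Python uses only the standard library to build the multipart/form-data body and send it. The helper names `build_multipart` and `transcribe` are hypothetical, not part of the service. The sketch assumes that json and verbose_json responses are JSON objects and that the other formats come back as plain text.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://api.chris.hellotopia.io/v1/audio/transcriptions"


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one 'file' part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, **options):
    """POST one audio file; `options` are the optional form fields from the table."""
    fields = {"model": "whisper-large-v3", **options}
    # Reads the whole file into memory, which is fine for a sketch but
    # costly for uploads near the size limit.
    body, content_type = build_multipart(
        fields, Path(path).name, Path(path).read_bytes()
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read().decode("utf-8")
    # Assumption: json and verbose_json are JSON; text, srt and vtt are plain text.
    if options.get("response_format", "json") in ("json", "verbose_json"):
        return json.loads(raw)
    return raw
```

A call might look like `transcribe("meeting.mp3", api_key, language="en", response_format="verbose_json")`, where `meeting.mp3` is a placeholder file name.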
Response (JSON)
Response (verbose_json)
Returns per-segment timestamps and confidence-related scores (avg_logprob, no_speech_prob):
{
  "task": "transcribe",
  "language": "en",
  "duration": 312.48,
  "text": "Okay so today...",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 4.72,
      "text": " Okay so today we're going to talk about...",
      "tokens": [50364, 1033, 370, ...],
      "temperature": 0.0,
      "avg_logprob": -0.31,
      "compression_ratio": 1.45,
      "no_speech_prob": 0.02
    }
  ]
}
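One way to use the per-segment scores is to flag segments that may need manual review. The sketch below is illustrative only: the function name `flag_uncertain_segments` is hypothetical, and the thresholds mirror the defaults of the open-source Whisper reference implementation. Whether those values suit this service is an assumption.

```python
def flag_uncertain_segments(
    result,
    min_avg_logprob=-1.0,       # assumption: open-source Whisper default
    max_no_speech_prob=0.6,     # assumption: open-source Whisper default
    max_compression_ratio=2.4,  # assumption: open-source Whisper default
):
    """Return (id, start, end, reasons) for each segment that looks unreliable.

    `result` is a parsed verbose_json response (a dict with a "segments" list).
    """
    flagged = []
    for seg in result.get("segments", []):
        reasons = []
        if seg["avg_logprob"] < min_avg_logprob:
            reasons.append("low avg_logprob")
        if seg["no_speech_prob"] > max_no_speech_prob:
            reasons.append("high no_speech_prob")
        if seg["compression_ratio"] > max_compression_ratio:
            reasons.append("high compression_ratio")
        if reasons:
            flagged.append((seg["id"], seg["start"], seg["end"], reasons))
    return flagged
```

With the sample response above, segment 0 (avg_logprob -0.31, no_speech_prob 0.02, compression_ratio 1.45) passes all three checks and is not flagged.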
Notes
- GPU-accelerated (CUDA) — 1-hour audio typically transcribes in under a minute.
- Single-stream service. Concurrent requests queue serially — plan for it in batch workloads.
- Maximum file size: ~1 GB. Split larger files before uploading.
- No live/streaming transcription yet — post-hoc only.
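Since requests are handled one at a time and uploads are capped at roughly 1 GB, a batch client can check file sizes up front and submit files sequentially. The following is a rough sketch under those assumptions. `plan_batch` and `run_batch` are hypothetical helpers, `MAX_BYTES` is an approximation of the "~1 GB" limit, and `transcribe_one` stands for whatever function performs the actual upload.

```python
import os

# Assumption: "~1 GB" taken as 10**9 bytes; the service's exact limit is not documented here.
MAX_BYTES = 1_000_000_000


def plan_batch(paths, max_bytes=MAX_BYTES):
    """Partition paths into (uploadable, too_large) by on-disk size."""
    uploadable, too_large = [], []
    for path in paths:
        if os.path.getsize(path) <= max_bytes:
            uploadable.append(path)
        else:
            too_large.append(path)
    return uploadable, too_large


def run_batch(paths, transcribe_one):
    """Submit files one at a time and collect results keyed by path.

    Parallel uploads would only wait in the server-side queue, so a plain
    loop is used. A failed file stores its exception instead of stopping the batch.
    """
    results = {}
    for path in paths:
        try:
            results[path] = transcribe_one(path)
        except Exception as exc:
            results[path] = exc
    return results
```

Files returned in `too_large` would need to be split with an external audio tool before submission.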