POST /v1/audio/transcriptions

Transcribe an audio file to text using Whisper Large v3.

Request

POST https://api.chris.hellotopia.io/v1/audio/transcriptions
Authorization: Bearer <api_key>
Content-Type: multipart/form-data

Form field	Type	Notes
`file`	binary	Audio file. `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`, `aiff`, `flac`, `ogg`.
`model`	string	`whisper-large-v3`
`language`	string	(optional) ISO-639-1 code (e.g. `en`). Auto-detected if omitted.
`prompt`	string	(optional) Priming text to bias transcription (names, jargon).
`temperature`	number	(optional) 0–1. Default 0.
`response_format`	string	(optional) `json` (default), `text`, `srt`, `verbose_json`, `vtt`.
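
The optional fields can be checked client-side before a large upload is attempted. The following is a minimal sketch, not part of any SDK: `build_form_fields` is a hypothetical helper, and the allowed values are copied from the table above.

```python
from typing import Optional

# Allowed values copied from the form-field table above.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(
    model: str = "whisper-large-v3",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: float = 0.0,
    response_format: str = "json",
) -> dict:
    """Hypothetical helper: return the non-file form fields as strings."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    fields = {
        "model": model,
        "temperature": str(temperature),
        "response_format": response_format,
    }
    # Optional fields are left out entirely so the server applies its defaults.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

The resulting dict could be sent alongside the `file` part of the multipart body, for example as the `data=` argument of `requests.post` with the audio passed in `files=`.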

Example

curl

curl https://api.chris.hellotopia.io/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "file=@meeting.m4a" \
  -F "model=whisper-large-v3" \
  -F "language=en"

Python

from openai import OpenAI
client = OpenAI(base_url="https://api.chris.hellotopia.io/v1", api_key="sk-...")

with open("meeting.m4a", "rb") as f:
    resp = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3",
        language="en",
    )
print(resp.text)

Response (JSON)

{"text": "Okay so today we're going to..."}

Response (verbose_json)

Returns per-segment timestamps and confidence signals (`avg_logprob`, `no_speech_prob`):

{
  "task": "transcribe",
  "language": "en",
  "duration": 312.48,
  "text": "Okay so today...",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 4.72,
      "text": " Okay so today we're going to talk about...",
      "tokens": [50364, 1033, 370, ...],
      "temperature": 0.0,
      "avg_logprob": -0.31,
      "compression_ratio": 1.45,
      "no_speech_prob": 0.02
    }
  ]
}
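
The per-segment fields can be used to drop segments that are likely silence or low-confidence output. This is a minimal sketch: `filter_segments` is a hypothetical helper, the thresholds (0.6 and -1.0) are illustrative assumptions rather than values documented for this service, and the second sample segment is made up for the demonstration.

```python
def filter_segments(response: dict,
                    max_no_speech: float = 0.6,
                    min_avg_logprob: float = -1.0) -> list:
    """Keep segments that look like confident speech.

    Both thresholds are assumptions for illustration; tune them on real data.
    """
    kept = []
    for seg in response.get("segments", []):
        if seg["no_speech_prob"] > max_no_speech:
            continue  # probably silence or background noise
        if seg["avg_logprob"] < min_avg_logprob:
            continue  # the model was unsure about this text
        kept.append((seg["start"], seg["end"], seg["text"].strip()))
    return kept


# First segment mirrors the response above; the second is invented.
sample = {
    "segments": [
        {"start": 0.0, "end": 4.72,
         "text": " Okay so today we're going to talk about...",
         "avg_logprob": -0.31, "no_speech_prob": 0.02},
        {"start": 4.72, "end": 9.10, "text": " ...",
         "avg_logprob": -1.40, "no_speech_prob": 0.85},
    ]
}

for start, end, text in filter_segments(sample):
    print(f"[{start:.2f}-{end:.2f}] {text}")
```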

Notes

GPU-accelerated (CUDA): one hour of audio typically transcribes in under a minute.
Single-stream service: concurrent requests are queued and processed one at a time, so plan for this in batch workloads.
Maximum file size: ~1 GB. Split larger recordings before uploading.
No live or streaming transcription yet; only complete, pre-recorded files are supported.
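
Since requests are processed one at a time and uploads are capped at roughly 1 GB, a batch job gains little from parallel uploads. The sketch below sends files sequentially and sets oversized ones aside for splitting. `transcribe_batch` is a hypothetical helper, the `transcribe` callable is supplied by the caller, and reading "~1 GB" as 10^9 bytes is an assumption.

```python
import os
from typing import Callable, Iterable

# Assumption: "~1 GB" read as 10^9 bytes; the exact limit is not documented.
MAX_BYTES = 10**9


def transcribe_batch(paths: Iterable[str],
                     transcribe: Callable[[str], str],
                     max_bytes: int = MAX_BYTES):
    """Send files one at a time; return (results, skipped).

    `results` maps each path to its transcript. `skipped` lists files over
    the size limit, which need to be split before they can be uploaded.
    """
    results, skipped = {}, []
    for path in paths:
        if os.path.getsize(path) > max_bytes:
            skipped.append(path)
            continue
        results[path] = transcribe(path)
    return results, skipped
```

Here `transcribe` could be a small wrapper around the `client.audio.transcriptions.create` call from the Python example that opens the path and returns `resp.text`.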