POST /v1/audio/transcriptions

Transcribe an audio file to text using Whisper Large v3.

Request

POST https://api.chris.hellotopia.io/v1/audio/transcriptions
Authorization: Bearer <api_key>
Content-Type: multipart/form-data
Form field        Type     Notes
file              binary   Audio file. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, aiff, flac, ogg.
model             string   Must be whisper-large-v3.
language          string   Optional. ISO-639-1 code (e.g. en). Auto-detected if omitted.
prompt            string   Optional. Priming text to bias the transcription toward names and jargon.
temperature       number   Optional. Sampling temperature, 0–1. Default 0.
response_format   string   Optional. One of json (default), text, srt, verbose_json, vtt.
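The optional fields can be checked client-side before the upload ever starts. A minimal sketch, assuming the ranges in the table above; the helper name and the validation itself are illustrative, not part of the API:

```python
# Hypothetical client-side validation mirroring the parameter table above.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

def validate_params(response_format="json", temperature=0.0, language=None):
    """Raise ValueError if a parameter falls outside the documented ranges."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if language is not None and len(language) != 2:
        raise ValueError("language must be an ISO-639-1 two-letter code")
    return True
```

Running this before constructing the multipart request turns a round-trip 4xx into an immediate local error.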

Example

curl https://api.chris.hellotopia.io/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "file=@meeting.m4a" \
  -F "model=whisper-large-v3" \
  -F "language=en"

from openai import OpenAI
client = OpenAI(base_url="https://api.chris.hellotopia.io/v1", api_key="sk-...")

with open("meeting.m4a", "rb") as f:
    resp = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3",
        language="en",
    )
print(resp.text)

Response (JSON)

{"text": "Okay so today we're going to..."}

Response (verbose_json)

Returns per-segment timestamps plus confidence-related fields (avg_logprob, no_speech_prob):

{
  "task": "transcribe",
  "language": "en",
  "duration": 312.48,
  "text": "Okay so today...",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 4.72,
      "text": " Okay so today we're going to talk about...",
      "tokens": [50364, 1033, 370, ...],
      "temperature": 0.0,
      "avg_logprob": -0.31,
      "compression_ratio": 1.45,
      "no_speech_prob": 0.02
    }
  ]
}
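The verbose_json segments carry enough information to build subtitles directly. A minimal sketch that converts the segment list above into SRT cues (field names as in the response; the helper names are ours):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn verbose_json segments into an SRT subtitle string."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

For simple cases, requesting response_format=srt achieves the same thing server-side; the manual route is useful when you also want the token-level fields.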

Notes

  • GPU-accelerated (CUDA) — 1-hour audio typically transcribes in under a minute.
  • Single-stream service. Concurrent requests queue serially — plan for it in batch workloads.
  • Maximum file size: ~1 GB. Split longer recordings.
  • No live/streaming transcription yet — post-hoc only.
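Given the ~1 GB cap and the serial queue, long recordings have to be split and submitted one chunk at a time. A sketch of computing chunk boundaries with a small overlap so words at a cut appear in both chunks; the chunk length and overlap values are assumptions, not API limits:

```python
def chunk_boundaries(duration_s: float, chunk_s: float = 600.0, overlap_s: float = 2.0):
    """Yield (start, end) windows covering the recording.

    Each window overlaps the previous one by overlap_s seconds, so a word
    sliced by a cut is fully audible in at least one chunk. chunk_s and
    overlap_s are illustrative defaults, not values mandated by the API.
    """
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        yield (start, end)
        if end >= duration_s:
            break
        start = end - overlap_s
```

Feed each (start, end) window to a tool like ffmpeg to cut the file, then POST the pieces sequentially; since concurrent requests queue serially anyway, a plain loop loses nothing over parallel submission.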