
POST /v1/chat/completions

Generate a chat response from a model.

Request

POST https://api.chris.hellotopia.io/v1/chat/completions
Authorization: Bearer <api_key>
Content-Type: application/json

Required

Field Type Notes
model string Model ID from Models.
messages array Array of {"role": "system"|"user"|"assistant", "content": "..."} objects.

Common options

Field Type Default Notes
stream bool false SSE streaming.
temperature number 0.7 0–2.
max_tokens int model default Cap on output tokens.
top_p number 1.0 Nucleus sampling.
stop string|array none Stop sequences.
n int 1 Number of choices.
seed int none Reproducible sampling (best-effort).
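The options combine freely in one request body. A minimal sketch (the model ID and values here are illustrative):

```python
# Request body combining several common options (values are illustrative).
body = {
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "temperature": 0.2,   # low randomness
    "max_tokens": 50,     # cap on output tokens
    "stop": ["\n\n"],     # cut generation at the first blank line
    "seed": 42,           # best-effort reproducibility
}
# Sent as the JSON body of POST /v1/chat/completions, or unpacked into
# client.chat.completions.create(**body) with the OpenAI SDK.
```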

Example — non-streaming

curl https://api.chris.hellotopia.io/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "You are terse."},
      {"role": "user",   "content": "What is 2+2?"}
    ],
    "max_tokens": 20
  }'

Example — streaming

from openai import OpenAI

client = OpenAI(
    base_url="https://api.chris.hellotopia.io/v1",
    api_key="sk-...",
)

stream = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a haiku about compilers."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
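With stream=True the response arrives as Server-Sent Events: each event is a "data: {json}" line carrying one chunk, terminated by "data: [DONE]". A sketch of parsing those frames without the SDK, assuming the OpenAI-compatible chunk shape shown (the sample frames are illustrative):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from raw SSE lines of a streaming response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Example frames as they might arrive on the wire:
frames = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"4"},"index":0}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(frames)))  # → 4
```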

Example — vision (image input)

from openai import OpenAI
import base64, pathlib

img = base64.b64encode(pathlib.Path("photo.jpg").read_bytes()).decode()

client = OpenAI(base_url="https://api.chris.hellotopia.io/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="5080/llama3.2-vision:11b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{img}"}},
        ],
    }],
)
print(resp.choices[0].message.content)

Response

Standard OpenAI shape:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1713456789,
  "model": "llama3.1:8b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "4"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 23, "completion_tokens": 1, "total_tokens": 24}
}
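The usage block is enough for simple token accounting. A sketch of pulling the reply and counts out of the response above:

```python
# Response dict as returned by the endpoint (same shape as the example above).
resp = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1713456789,
    "model": "llama3.1:8b",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "4"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 23, "completion_tokens": 1, "total_tokens": 24},
}

answer = resp["choices"][0]["message"]["content"]
usage = resp["usage"]
# total_tokens is prompt_tokens + completion_tokens
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(answer)  # → 4
```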

Tool / function calling

Supported only for models whose underlying Ollama build exposes tool-calling (llama3.1+, qwen3+, llama3.3). Payload format matches OpenAI's tools / tool_choice fields. No guarantee that a given model honors tool_choice: "required" — test first.
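A sketch of the OpenAI-style tools payload; the get_weather function here is hypothetical, not something the gateway provides:

```python
# Hypothetical tool definition in OpenAI's "tools" format (JSON Schema params).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name."},
            },
            "required": ["city"],
        },
    },
}]

# With the OpenAI SDK this would be passed as:
#   client.chat.completions.create(
#       model="llama3.1:8b",
#       messages=[{"role": "user", "content": "Weather in Oslo?"}],
#       tools=tools,
#       tool_choice="auto",
#   )
# A tool call comes back on choices[0].message.tool_calls, each entry with
# .function.name and a JSON-encoded .function.arguments string.
```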

Errors

HTTP Meaning
401 Missing/invalid API key.
404 Unknown model ID.
408/504 Upstream Ollama didn't respond within 120s — usually a cold load on a busy model. Retry.
429 Rate-limited (budget/limit on your key, if configured).
5xx Gateway or backend error. Check status with Chris.
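408/504 (and 429, once your budget window resets) are worth retrying with backoff; 401/404 are not. A minimal retry sketch, generic over however you send the request:

```python
import time

RETRYABLE = {408, 429, 504}  # timeout / rate-limit statuses worth retrying

def with_retries(send, attempts=4, base_delay=1.0):
    """Call send() -> (status, body); retry retryable statuses with
    exponential backoff. Returns the last (status, body) seen."""
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body

# Fake transport: two cold-load timeouts, then success.
responses = iter([(504, None), (504, None), (200, '{"id": "chatcmpl-..."}')])
status, body = with_retries(lambda: next(responses), base_delay=0.0)
print(status)  # → 200
```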