
POST /v1/chat/completions

Generate a chat response from a model.

Request

POST https://api.chris.hellotopia.io/v1/chat/completions
Authorization: Bearer <api_key>
Content-Type: application/json

Required

Field Type Notes
model string Model ID from Models.
messages array Array of {"role": "system"|"user"|"assistant", "content": "..."} objects.

Common options

Field Type Default Notes
stream bool false SSE streaming.
temperature number 0.7 0–2.
max_tokens int model default Cap on output tokens.
top_p number 1.0 Nucleus sampling.
stop string|array none Stop sequences.
n int 1 Number of choices.
seed int none Reproducible sampling (best-effort).
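The options combine freely in one request body. A minimal sketch (the model ID and values here are illustrative):

```python
# Request body combining several common options (values are illustrative).
body = {
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "temperature": 0.2,   # low randomness
    "max_tokens": 50,     # cap on output tokens
    "stop": ["\n\n"],     # cut generation at the first blank line
    "seed": 42,           # best-effort reproducibility
}
# Sent as the JSON body of POST /v1/chat/completions, or unpacked into
# client.chat.completions.create(**body) with the OpenAI SDK.
```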

Example — non-streaming

curl https://api.chris.hellotopia.io/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "You are terse."},
      {"role": "user",   "content": "What is 2+2?"}
    ],
    "max_tokens": 20
  }'

Example — streaming

from openai import OpenAI

client = OpenAI(
    base_url="https://api.chris.hellotopia.io/v1",
    api_key="sk-...",
)

stream = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a haiku about compilers."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
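With stream=True the response arrives as Server-Sent Events: each event is a "data: {json}" line carrying one chunk, terminated by "data: [DONE]". A sketch of parsing those frames without the SDK, assuming the OpenAI-compatible chunk shape shown (the sample frames are illustrative):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from raw SSE lines of a streaming response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Example frames as they might arrive on the wire:
frames = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"4"},"index":0}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(frames)))  # → 4
```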

Example — vision (image input)

from openai import OpenAI
import base64, pathlib

img = base64.b64encode(pathlib.Path("photo.jpg").read_bytes()).decode()

client = OpenAI(base_url="https://api.chris.hellotopia.io/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="5080/llama3.2-vision:11b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{img}"}},
        ],
    }],
)
print(resp.choices[0].message.content)

Response

Standard OpenAI shape:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1713456789,
  "model": "llama3.1:8b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "4"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 23, "completion_tokens": 1, "total_tokens": 24}
}
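The usage block is enough for simple token accounting. A sketch of pulling the reply and counts out of the response above:

```python
# Response dict as returned by the endpoint (same shape as the example above).
resp = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1713456789,
    "model": "llama3.1:8b",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "4"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 23, "completion_tokens": 1, "total_tokens": 24},
}

answer = resp["choices"][0]["message"]["content"]
usage = resp["usage"]
# total_tokens is prompt_tokens + completion_tokens
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(answer)  # → 4
```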

Tool / function calling

Supported only for models whose underlying Ollama build exposes tool-calling (llama3.1+, qwen3+, llama3.3). Payload format matches OpenAI's tools / tool_choice fields. No guarantee that a given model honors tool_choice: "required" — test first.
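A sketch of the OpenAI-style tools payload; the get_weather function here is hypothetical, not something the gateway provides:

```python
# Hypothetical tool definition in OpenAI's "tools" format (JSON Schema params).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name."},
            },
            "required": ["city"],
        },
    },
}]

# With the OpenAI SDK this would be passed as:
#   client.chat.completions.create(
#       model="llama3.1:8b",
#       messages=[{"role": "user", "content": "Weather in Oslo?"}],
#       tools=tools,
#       tool_choice="auto",
#   )
# A tool call comes back on choices[0].message.tool_calls, each entry with
# .function.name and a JSON-encoded .function.arguments string.
```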

Errors

HTTP Meaning
401 Missing/invalid API key.
404 Unknown model ID.
408/504 Upstream Ollama didn't respond within 120s — usually a cold load on a busy model. Retry.
429 Rate-limited (budget/limit on your key, if configured).
5xx Gateway or backend error. Check status with Chris.
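408/504 (and 429, once your budget window resets) are worth retrying with backoff; 401/404 are not. A minimal retry sketch, generic over however you send the request:

```python
import time

RETRYABLE = {408, 429, 504}  # timeout / rate-limit statuses worth retrying

def with_retries(send, attempts=4, base_delay=1.0):
    """Call send() -> (status, body); retry retryable statuses with
    exponential backoff. Returns the last (status, body) seen."""
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body

# Fake transport: two cold-load timeouts, then success.
responses = iter([(504, None), (504, None), (200, '{"id": "chatcmpl-..."}')])
status, body = with_retries(lambda: next(responses), base_delay=0.0)
print(status)  # → 200
```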