
Getting Started

1. Get an API key

Ask Chris for a key. Keys look like sk-.... One key per person; do not share yours.

Store it as an environment variable:

export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_BASE_URL="https://api.chris.hellotopia.io/v1"

Most OpenAI SDKs pick these up automatically.
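A script can read the same variables itself when you are not using an SDK. A minimal sketch (the helper name and the fallback behavior are illustrative; the variable names and default base URL match the exports above):

```python
import os

def load_gateway_config():
    """Read the gateway credentials the same way the SDKs do."""
    api_key = os.environ.get("OPENAI_API_KEY")
    base_url = os.environ.get("OPENAI_BASE_URL", "https://api.chris.hellotopia.io/v1")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return api_key, base_url
```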

2. Your first request

curl https://api.chris.hellotopia.io/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
Or the same request with the Python SDK (with the environment variables above set, you can call OpenAI() with no arguments):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.chris.hellotopia.io/v1",
    api_key="sk-your-key-here",
)

resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(resp.choices[0].message.content)
Or in Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.chris.hellotopia.io/v1",
  apiKey: "sk-your-key-here",
});

const resp = await client.chat.completions.create({
  model: "llama3.1:8b",
  messages: [{ role: "user", content: "Say hello in one word." }],
});
console.log(resp.choices[0].message.content);

3. List available models

curl https://api.chris.hellotopia.io/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'

See Models for the full catalog with routing notes.
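The same call works from Python with only the standard library. A sketch (the helper names are illustrative; the response shape, `data[].id`, matches the jq filter above):

```python
import json
import urllib.request

def build_models_request(base_url, api_key):
    """Build an authorized GET request for the /models endpoint."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def model_ids(response_body):
    """Extract model ids from a /models response, like `jq '.data[].id'`."""
    return [m["id"] for m in json.loads(response_body)["data"]]

# To actually run it:
# with urllib.request.urlopen(build_models_request(base_url, api_key)) as r:
#     print(model_ids(r.read()))
```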

4. Try something bigger

curl https://api.chris.hellotopia.io/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3:70b",
    "messages": [{"role": "user", "content": "Explain FLOPS to a 12-year-old."}]
  }'

llama3.3:70b runs on the DGX Spark. First response may take 30–90 seconds if the model isn't loaded; subsequent requests within 30 minutes are fast.
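Because a cold load can exceed a default client timeout, a generous per-attempt timeout plus a simple retry is a reasonable pattern. A sketch under those assumptions (the helper names and backoff schedule are illustrative, not part of the gateway):

```python
import time

def cold_load_delays(retries=3, base=5.0):
    """Backoff delays (seconds) between retries while a model loads from disk."""
    return [base * (2 ** i) for i in range(retries)]

def call_with_retries(do_request, retries=3):
    """Retry a request a few times, sleeping between attempts."""
    for delay in cold_load_delays(retries):
        try:
            return do_request(timeout=120)  # match the gateway's 120s limit
        except TimeoutError:
            time.sleep(delay)
    return do_request(timeout=120)  # final attempt; let the error propagate
```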

Notes on behavior

  • Cold loads are slow. Models not recently used pay a disk-load penalty (20–90s). Warm models respond in normal token-stream time.
  • Streaming is supported. Pass stream: true and consume the text/event-stream.
  • Context length defaults to 16384 tokens on 5080 models. max_tokens caps the response length per request; the context window itself is set through provider-specific options.
  • Timeout: the gateway enforces a 120s timeout. Set your client timeout to at least that.
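Consuming the stream amounts to reading data: lines from the text/event-stream and pulling each chunk's delta. A minimal parser (the function name is illustrative; the chunk fields follow the standard chat-completions streaming format):

```python
import json

def delta_from_sse_line(line):
    """Return the text delta from one SSE line, or None for non-data lines."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # terminator sentinel at the end of the stream
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

With the Python SDK you can skip the parsing and iterate directly: for chunk in client.chat.completions.create(..., stream=True).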