# Getting Started
## 1. Get an API key

Ask Chris. Keys look like `sk-...`. One key per person; do not share keys.

Store it as an environment variable:

```shell
export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_BASE_URL="https://api.chris.hellotopia.io/v1"
```

Most OpenAI SDKs pick these up automatically.
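If you want a quick sanity check before making a request, a small script can verify the variables are set. This is an illustrative sketch: the `checkEnv` helper is hypothetical, and the `sk-` prefix check mirrors the key format noted above rather than anything the gateway enforces.

```typescript
// Sketch: verify the two environment variables from step 1 are present
// and that the key at least looks like an sk- key.
function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  const key = env.OPENAI_API_KEY;
  if (!key) {
    problems.push("OPENAI_API_KEY is not set");
  } else if (!key.startsWith("sk-")) {
    problems.push("OPENAI_API_KEY does not look like an sk- key");
  }
  if (!env.OPENAI_BASE_URL) {
    problems.push("OPENAI_BASE_URL is not set (requests would go to api.openai.com)");
  }
  return problems;
}

const problems = checkEnv(process.env);
if (problems.length) {
  console.error(problems.join("\n"));
} else {
  console.log("environment looks good");
}
```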
## 2. Your first request

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.chris.hellotopia.io/v1",
  apiKey: "sk-your-key-here",
});

const resp = await client.chat.completions.create({
  model: "llama3.1:8b",
  messages: [{ role: "user", content: "Say hello in one word." }],
});

console.log(resp.choices[0].message.content);
```
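Under the hood this SDK call is a single HTTP POST. As a sketch of the wire format (the `buildChatRequest` helper is hypothetical, shown only to make the request explicit), you could assemble the same call by hand:

```typescript
// Hypothetical helper: assemble the POST that chat.completions.create() issues.
// It only builds the request; actually sending it requires fetch and the gateway.
function buildChatRequest(baseURL: string, apiKey: string, model: string, prompt: string) {
  return {
    url: `${baseURL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage: const { url, init } = buildChatRequest(...); const resp = await fetch(url, init);
```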
## 3. List available models

```shell
curl https://api.chris.hellotopia.io/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'
```
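The same extraction can be done in code. A sketch, assuming the standard OpenAI `/v1/models` response shape (`{ "data": [{ "id": ... }] }`), equivalent to the `jq` filter above:

```typescript
// Sketch: pull model ids out of a /v1/models response body.
interface ModelsResponse {
  data: { id: string }[];
}

function modelIds(body: ModelsResponse): string[] {
  return body.data.map((m) => m.id);
}

// Usage: const ids = modelIds(await client.models.list() as unknown as ModelsResponse);
```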
See Models for the full catalog with routing notes.
## 4. Try something bigger

```shell
curl https://api.chris.hellotopia.io/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3:70b",
    "messages": [{"role": "user", "content": "Explain FLOPS to a 12-year-old."}]
  }'
```
`llama3.3:70b` runs on the DGX Spark. The first response may take 30–90 seconds if the model isn't loaded; subsequent requests within 30 minutes are fast.
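One way to cope with a slow first response is a small retry wrapper around the request. A sketch; `withRetries` is a hypothetical helper, and the attempt count and delay are arbitrary choices, not gateway-recommended values:

```typescript
// Hypothetical retry helper for cold loads: retry a failed request a few
// times, pausing between attempts while the model loads from disk.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 5_000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage:
// const resp = await withRetries(() =>
//   client.chat.completions.create({ model: "llama3.3:70b", messages }),
// );
```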
## Notes on behavior

- Cold loads are slow. Models not recently used pay a disk-load penalty (20–90 s); warm models respond in normal token-stream time.
- Streaming is supported. Pass `stream: true` and consume the `text/event-stream` response.
- Context length defaults to 16384 tokens on 5080 models. Override per request with `max_tokens` and provider-specific options.
- Timeout: the gateway enforces a 120 s timeout. Set your client timeout at least that high.
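If you consume the stream without an SDK, the raw wire format is server-sent events: each `data:` line carries one JSON chunk, and the stream ends with `data: [DONE]`. A minimal parser sketch, assuming the standard OpenAI streaming chunk shape (`choices[0].delta.content`):

```typescript
// Sketch: collect the streamed text out of a raw text/event-stream body.
function collectStreamedText(sse: string): string {
  let text = "";
  for (const line of sse.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank lines and comments
    const payload = line.slice(6).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```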