OpenAI Compatibility
Moondream Cloud exposes an OpenAI-compatible chat endpoint at /v1/chat/completions. Point any OpenAI client or SDK at Moondream and you get multi-turn conversations, image inputs, streaming, and reasoning — no Moondream-specific client required.
Setup
Use the OpenAI base URL https://api.moondream.ai/v1, your Moondream API key as the bearer token, and moondream/moondream3-preview as the model (the moondream/ prefix is optional for this first-party model, so moondream3-preview also works). Grab a key from the Moondream Cloud Console.
Python:
from openai import OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.moondream.ai/v1",
)
response = client.chat.completions.create(
    model="moondream/moondream3-preview",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
print(response.choices[0].message.content)
Node.js:
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.moondream.ai/v1",
});
const response = await client.chat.completions.create({
  model: "moondream/moondream3-preview",
  messages: [{ role: "user", content: "What is 2 + 2?" }],
});
console.log(response.choices[0].message.content);
curl:
curl https://api.moondream.ai/v1/chat/completions \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "moondream/moondream3-preview",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}]
  }'
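If installing an SDK is not an option, the same request can be built with Python's standard library. This is a minimal sketch that relies only on what the curl example shows (the endpoint, the bearer token, and a JSON body); the helper names build_chat_request and send are hypothetical and not part of any Moondream or OpenAI library.

```python
import json
import urllib.request

BASE_URL = "https://api.moondream.ai/v1"


def build_chat_request(messages, api_key, model="moondream/moondream3-preview", **params):
    """Build (but do not send) a POST to /v1/chat/completions.

    Hypothetical helper. Extra keyword arguments such as temperature or
    top_p are copied into the top level of the JSON body.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To make the call, pass the built request to send, for example send(build_chat_request(messages, "YOUR_API_KEY")). The reply text sits at ["choices"][0]["message"]["content"] in the decoded JSON, mirroring response.choices[0].message.content in the SDK.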
Image input
Pass images as image_url content parts. Images must be base64-encoded data URLs — remote http(s) URLs are rejected with a 400. You can include more than one image in a turn.
Python (uses the client created in Setup):
import base64
# Remote http(s) URLs are rejected, so embed the file as a base64 data URL.
with open("image.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()
response = client.chat.completions.create(
    model="moondream/moondream3-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
)
print(response.choices[0].message.content)
curl:
curl https://api.moondream.ai/v1/chat/completions \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "moondream/moondream3-preview",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
        {"type": "text", "text": "What is in this image?"}
      ]
    }]
  }'
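Because remote URLs are rejected, it can help to wrap the encoding step, especially when a turn carries several images. A minimal sketch, assuming the images are local files; image_part and user_turn are hypothetical helpers, and guessing the MIME type from the file extension with mimetypes is an illustrative choice. Images are placed before the text, as in the examples above.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path):
    """Return an image_url content part holding a base64 data URL.

    The MIME type is guessed from the file extension; files that do not
    look like images raise ValueError instead of being sent mislabeled.
    """
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def user_turn(text, *image_paths):
    """Build one user message: zero or more images followed by the text prompt."""
    content = [image_part(p) for p in image_paths]
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}
```

For example, messages=[user_turn("What differs between these images?", "a.jpg", "b.png")] sends two images in a single turn.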
Multi-turn conversations
Include the full conversation history in messages on every request; the model uses the earlier turns as context.
response = client.chat.completions.create(
model="moondream/moondream3-preview",
messages=[
{"role": "user", "content": "My name is Alice."},
{"role": "assistant", "content": "Nice to meet you, Alice!"},
{"role": "user", "content": "What is my name?"},
],
)
print(response.choices[0].message.content) # -> "Alice"
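Since every request has to carry the whole history, a small wrapper can do the bookkeeping. This is a minimal sketch; the Conversation class is hypothetical, and send is any callable you supply that takes a list of messages and returns the assistant's reply text.

```python
class Conversation:
    """Accumulate user/assistant turns and resend the full history each time.

    `send` is a caller-supplied function: list of messages -> reply text.
    """

    def __init__(self, send):
        self.send = send
        self.messages = []

    def ask(self, content):
        pending = self.messages + [{"role": "user", "content": content}]
        reply = self.send(pending)
        # Commit the turn only after the request succeeded, so a failed call
        # does not leave a dangling user message in the history.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply
```

With the client from Setup, send could be lambda msgs: client.chat.completions.create(model="moondream/moondream3-preview", messages=msgs).choices[0].message.content.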
Streaming
Set stream=True to receive the response as Server-Sent Events.
stream = client.chat.completions.create(
model="moondream/moondream3-preview",
messages=[{"role": "user", "content": "Write a short poem about the moon."}],
stream=True,
)
for chunk in stream:
    # Each chunk carries the next piece of the reply in delta.content.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
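Without the SDK, the stream arrives as raw Server-Sent Events. The sketch below assumes Moondream follows the standard OpenAI wire format: each event is a data: line holding one JSON chunk, and the stream ends with data: [DONE]. iter_sse_chunks and stream_text are hypothetical helpers; lines can be any iterable of byte or text lines, such as the response object returned by urllib.request.urlopen for a request whose body sets "stream": true.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from an OpenAI-style SSE stream.

    Blank lines, comments (lines starting with ':') and non-data fields are
    skipped; iteration stops at the data: [DONE] sentinel. Simplification:
    assumes each event's JSON fits on a single data: line.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def stream_text(lines):
    """Concatenate the content deltas of a stream into the full reply."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                parts.append(delta)
    return "".join(parts)
```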
By default a stream does not include token usage. To receive it, set stream_options={"include_usage": True}; a final chunk with an empty choices list then carries the usage totals (standard OpenAI behavior).
stream = client.chat.completions.create(
model="moondream/moondream3-preview",
messages=[{"role": "user", "content": "Hello!"}],
stream=True,
stream_options={"include_usage": True},
)
for chunk in stream:
    # Only the final chunk (the one with empty choices) has usage set.
    if chunk.usage:
        print(chunk.usage)  # prompt_tokens / completion_tokens / total_tokens
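Printing the text and reading the usage in a single pass needs one guard that the two loops above do not show: with include_usage enabled, indexing chunk.choices[0] on the final usage-only chunk would raise IndexError because its choices list is empty. A minimal sketch; collect_stream is a hypothetical helper that expects chunk objects shaped like those the OpenAI Python SDK yields.

```python
def collect_stream(stream):
    """Drain a chat-completions stream and return (text, usage).

    Content chunks carry choices[0].delta.content; the usage-only chunk has
    an empty choices list. usage is None if the server never sent it.
    """
    parts = []
    usage = None
    for chunk in stream:
        if chunk.choices:  # skip the usage-only chunk, which has no choices
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage
```

Calling text, usage = collect_stream(stream) on the stream created above with stream_options={"include_usage": True} returns the whole reply together with its token counts.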
Reasoning
Moondream 3 can produce an explicit reasoning trace before its answer. Enable it with the reasoning parameter; the trace is returned on message.reasoning, separate from message.content.
response = client.chat.completions.create(
model="moondream/moondream3-preview",
messages=[{"role": "user", "content": "If I have 5 apples and give away 2, how many are left?"}],
extra_body={"reasoning": True},
)
message = response.choices[0].message
print(message.reasoning) # the step-by-step trace
print(message.content) # -> "3"
See Reasoning for more on how Moondream 3 reasons.
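Outside the Python SDK (for example with curl or a plain HTTP client) there is no extra_body: the SDK merges extra_body's keys into the top level of the JSON request, so a raw request sets "reasoning": true next to model and messages. The sketch below assumes the raw JSON response carries the trace under choices[0].message.reasoning, mirroring the SDK attribute shown above; reasoning_body and split_reasoning are hypothetical helpers.

```python
import json


def reasoning_body(messages, model="moondream/moondream3-preview"):
    """Serialize a chat request with the reasoning flag at the top level."""
    return json.dumps({"model": model, "messages": messages, "reasoning": True})


def split_reasoning(response):
    """Return (reasoning, content) from a decoded chat-completions response.

    reasoning is None when the trace is absent, e.g. if it was not requested.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning"), message.get("content")
```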
Parameters
| Parameter | Type | Notes |
|---|---|---|
| model | string | moondream/moondream3-preview (moondream3-preview also works). See Models for the available list. |
| messages | array | OpenAI chat messages. content may be a string or an array of text / image_url parts. |
| temperature | number | Sampling temperature. |
| top_p | number | Nucleus sampling. |
| max_completion_tokens | integer | Maximum number of tokens to generate, including reasoning tokens (up to 4096). |
| reasoning | boolean | Enable the reasoning trace (returned on message.reasoning). With the OpenAI Python SDK, pass it via extra_body. |
| stream | boolean | Stream the response as Server-Sent Events. |
| stream_options.include_usage | boolean | Emit a final usage chunk in a stream. |
Image URLs must be base64 data URLs; remote URLs are not accepted. The stop parameter (stop sequences) and tool/function calling are not currently supported.
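These limits can be checked before a request leaves your code. A minimal pre-flight sketch: check_request is a hypothetical helper, the unsupported field names (stop, tools, tool_choice, functions, function_call) are the standard OpenAI parameter names, and the lower bound of 1 on max_completion_tokens is an assumption. Whether the server rejects these fields or silently ignores them is not stated here.

```python
MAX_COMPLETION_TOKENS = 4096  # upper limit from the Parameters table
# Standard OpenAI field names for stop sequences and tool/function calling.
UNSUPPORTED = ("stop", "tools", "tool_choice", "functions", "function_call")


def check_request(body):
    """Return a list of problems found in a chat-completions request body.

    An empty list means none of the documented limits is violated.
    """
    problems = [f"'{key}' is not supported" for key in UNSUPPORTED if key in body]
    limit = body.get("max_completion_tokens")
    if limit is not None and not 0 < limit <= MAX_COMPLETION_TOKENS:
        problems.append(f"max_completion_tokens must be between 1 and {MAX_COMPLETION_TOKENS}")
    for message in body.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content has no image parts
        for part in content:
            if part.get("type") != "image_url":
                continue
            url = part.get("image_url", {}).get("url", "")
            if not (url.startswith("data:") and ";base64," in url):
                problems.append(f"image URL must be a base64 data URL, got {url[:40]!r}")
    return problems
```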
Models
List the available models with the standard OpenAI models endpoint:
curl https://api.moondream.ai/v1/models \
-H 'Authorization: Bearer YOUR_API_KEY'
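The same list can be fetched from Python without the SDK. This sketch assumes the response follows the standard OpenAI list shape, {"object": "list", "data": [{"id": ...}, ...]}; list_model_ids and model_ids are hypothetical helpers.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract the model ids from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(api_key, base_url="https://api.moondream.ai/v1", timeout=30):
    """GET /v1/models and return the model ids (performs network I/O)."""
    req = urllib.request.Request(
        base_url + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return model_ids(json.loads(resp.read().decode("utf-8")))
```

With the OpenAI Python SDK configured as in Setup, iterating client.models.list() and reading each item's id should give the same result, assuming the endpoint matches the OpenAI schema.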