Getting Started
Moondream is a fast, efficient vision AI model. It can answer questions about images, detect objects, count and point to them, generate captions, perform OCR, and more. Get an API key from the Moondream Cloud Console to try it out.
- Query
- Detect
- Point
- Caption
Visual Question Answering - Ask natural language questions about images.
curl -X POST https://api.moondream.ai/v1/query \
  -H 'Content-Type: application/json' \
  -H 'X-Moondream-Auth: YOUR_API_KEY' \
  -d '{
    "image_url": "",
    "question": "What is in this image?"
  }'
Response:
{
  "request_id": "2025-03-25_query_2025-03-25-21:00:39-715d03",
  "answer": "The image is a grayscale depiction of a crescent moon against a black background. The moon is rendered in varying shades of gray, appearing as a smooth, curved shape with no visible craters or details."
}
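The same request can be issued from Python without the SDK. The following is a minimal sketch using only the standard library, assuming the endpoint, headers, and JSON fields shown in the curl example above. `build_query_request` and `send` are illustrative helper names, not part of any Moondream package, and the image URL is left for you to supply.

```python
import json
import urllib.request

API_URL = "https://api.moondream.ai/v1/query"  # endpoint taken from the curl example


def build_query_request(api_key: str, image_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = json.dumps({"image_url": image_url, "question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-Moondream-Auth": api_key,
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body; raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.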
Object Detection - Identify and locate objects with bounding boxes.
curl -X POST https://api.moondream.ai/v1/detect \
  -H 'Content-Type: application/json' \
  -H 'X-Moondream-Auth: YOUR_API_KEY' \
  -d '{
    "image_url": "",
    "object": "moon"
  }'
Response:
{
  "request_id": "2025-03-25_detect_2025-03-25-21:00:39-715d03",
  "objects": [
    {
      "x_min": 0.2,
      "y_min": 0.3,
      "x_max": 0.6,
      "y_max": 0.8
    }
  ]
}
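The box values in the sample response all fall between 0 and 1, which suggests they are fractions of the image width and height rather than pixels. Under that assumption, a small helper can scale a box to pixel coordinates. `bbox_to_pixels` is a hypothetical name used only for illustration, and the image size below is made up.

```python
def bbox_to_pixels(box: dict, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to integer pixel corners (left, top, right, bottom).

    Assumes x/y values are fractions of the image width/height.
    """
    return (
        round(box["x_min"] * width),
        round(box["y_min"] * height),
        round(box["x_max"] * width),
        round(box["y_max"] * height),
    )


# Box from the sample response, applied to a hypothetical 1000x500 image
box = {"x_min": 0.2, "y_min": 0.3, "x_max": 0.6, "y_max": 0.8}
print(bbox_to_pixels(box, width=1000, height=500))  # (200, 150, 600, 400)
```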
Object Pointing - Get precise center coordinates for objects.
curl -X POST https://api.moondream.ai/v1/point \
  -H 'Content-Type: application/json' \
  -H 'X-Moondream-Auth: YOUR_API_KEY' \
  -d '{
    "image_url": "",
    "object": "moon"
  }'
Response:
{
  "request_id": "2025-03-25_point_2025-03-25-21:00:39-715d03",
  "points": [
    {
      "x": 0.65,
      "y": 0.42
    }
  ]
}
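Because the response carries one entry per located object, the length of `points` can serve as a rough object count. The sketch below assumes that interpretation, and also assumes the coordinates are fractions of the image size, as the sample values suggest. `points_to_pixels` is an illustrative helper, not an SDK function.

```python
def points_to_pixels(points: list, width: int, height: int) -> list:
    """Convert normalized center points to integer (x, y) pixel positions."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]


# Sample response body from above, applied to a hypothetical 1000x500 image
response = {"points": [{"x": 0.65, "y": 0.42}]}
pixels = points_to_pixels(response["points"], width=1000, height=500)

print(len(pixels))  # 1 object located
print(pixels)       # [(650, 210)]
```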
Image Captioning - Generate natural language descriptions of images.
curl -X POST https://api.moondream.ai/v1/caption \
  -H 'Content-Type: application/json' \
  -H 'X-Moondream-Auth: YOUR_API_KEY' \
  -d '{
    "image_url": "",
    "length": "normal",
    "stream": false
  }'
Response:
{
  "caption": "A crescent moon shape is centered against a solid black background. The crescent is oriented with its convex side facing right and its concave side facing left. The image is monochromatic, with shades of gray and white. No other objects, patterns, or text are visible.",
  "metrics": {
    "input_tokens": 735,
    "output_tokens": 45,
    "prefill_time_ms": 43.51004003547132,
    "decode_time_ms": 415.3184471651912,
    "ttft_ms": 81.97528193704784
  },
  "finish_reason": "stop"
}
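The `metrics` object can be turned into a rough throughput figure. This sketch assumes `decode_time_ms` is the time spent generating the `output_tokens`, which the field names suggest but the response itself does not confirm. `decode_tokens_per_second` is a hypothetical helper.

```python
def decode_tokens_per_second(metrics: dict) -> float:
    """Estimate generation speed from the caption response metrics.

    Assumes decode_time_ms covers the generation of all output tokens.
    """
    decode_ms = metrics["decode_time_ms"]
    if decode_ms <= 0:
        raise ValueError("decode_time_ms must be positive")
    return metrics["output_tokens"] / (decode_ms / 1000.0)


# Metrics copied from the sample response above
metrics = {"output_tokens": 45, "decode_time_ms": 415.3184471651912}
print(f"{decode_tokens_per_second(metrics):.0f} tokens/s")  # 108 tokens/s
```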
Moondream SDK
- Python
- Node.js
Installation (Python):
pip install moondream
View Python SDK Documentation →
Visual Question Answering - Ask natural language questions about images.
import moondream as md
from PIL import Image
# Initialize with your API key
model = md.vl(api_key="YOUR_API_KEY")
# Load an image
image = Image.open("path/to/image.jpg")
# Ask a question
result = model.query(image, "What is in this image?")
print(result["answer"])
Object Detection - Identify and locate objects with bounding boxes.
import moondream as md
from PIL import Image
# Initialize with your API key
model = md.vl(api_key="YOUR_API_KEY")
# Load an image
image = Image.open("path/to/image.jpg")
# Detect objects
result = model.detect(image, "moon")
for obj in result["objects"]:
    print(f"Bounds: ({obj['x_min']}, {obj['y_min']}) to ({obj['x_max']}, {obj['y_max']})")
Object Pointing - Get precise center coordinates for objects.
import moondream as md
from PIL import Image
# Initialize with your API key
model = md.vl(api_key="YOUR_API_KEY")
# Load an image
image = Image.open("path/to/image.jpg")
# Locate objects
result = model.point(image, "moon")
for point in result["points"]:
    print(f"Center: ({point['x']}, {point['y']})")
Image Captioning - Generate natural language descriptions of images.
import moondream as md
from PIL import Image
# Initialize with your API key
model = md.vl(api_key="YOUR_API_KEY")
# Load an image
image = Image.open("path/to/image.jpg")
# Generate a caption
result = model.caption(image, length="normal")
print(result["caption"])
Installation (Node.js):
npm install moondream
View Node.js SDK Documentation →
Visual Question Answering - Ask natural language questions about images.
import { vl } from 'moondream';
import fs from 'fs';
// Initialize with your API key
const model = new vl({ apiKey: 'YOUR_API_KEY' });
// Load an image
const image = fs.readFileSync('path/to/image.jpg');
// Ask a question
const result = await model.query({
  image: image,
  question: 'What is in this image?'
});
console.log(result.answer);
Object Detection - Identify and locate objects with bounding boxes.
import { vl } from 'moondream';
import fs from 'fs';
// Initialize with your API key
const model = new vl({ apiKey: 'YOUR_API_KEY' });
// Load an image
const image = fs.readFileSync('path/to/image.jpg');
// Detect objects
const result = await model.detect({
  image: image,
  object: 'moon'
});
result.objects.forEach(obj => {
  console.log(`Bounds: (${obj.x_min}, ${obj.y_min}) to (${obj.x_max}, ${obj.y_max})`);
});
Object Pointing - Get precise center coordinates for objects.
import { vl } from 'moondream';
import fs from 'fs';
// Initialize with your API key
const model = new vl({ apiKey: 'YOUR_API_KEY' });
// Load an image
const image = fs.readFileSync('path/to/image.jpg');
// Locate objects
const result = await model.point({
  image: image,
  object: 'moon'
});
result.points.forEach(point => {
  console.log(`Center: (${point.x}, ${point.y})`);
});
Image Captioning - Generate natural language descriptions of images.
import { vl } from 'moondream';
import fs from 'fs';
// Initialize with your API key
const model = new vl({ apiKey: 'YOUR_API_KEY' });
// Load an image
const image = fs.readFileSync('path/to/image.jpg');
// Generate a caption
const result = await model.caption({
  image: image,
  length: 'normal'
});
console.log(result.caption);
More Examples: Check out our Moondream Examples repo for complete projects and use cases.
Running Locally
Want to run Moondream on your own hardware instead of using the Cloud API?
- Mac/Linux: Use Moondream Station - the easiest way to run locally
- Advanced: Use Hugging Face Transformers for custom integration
Next Steps
- Try it live: Use our interactive playground to test without coding
- Deep dive: Explore detailed documentation for each skill - Query, Detect, Point, Caption
- More capabilities: Check out all Moondream Skills