Quick Start

The latest model release (moondream2-2025-01-09) is currently only available via Hugging Face and can be used with the Hugging Face Transformers library. We are actively working on integrating it into our client libraries.

Installing pyvips (required for running the model locally with Hugging Face Transformers)

For Linux/Mac users, simply run:

pip install pyvips-binary pyvips

For Windows users:

  1. Download ‘vips-dev-w64-all-8.16.0.zip’ (64-bit) or ‘vips-dev-w32-all-8.16.0.zip’ (32-bit) from the libvips Windows releases page
  2. Extract the archive and copy the DLLs from vips-dev-8.16\bin either to your project root directory (easier) or to System32 (requires admin privileges)
  3. Alternatively, add the bin directory to your system PATH; if you do, restart your terminal session so the change takes effect
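Besides copying DLLs or editing PATH, Python 3.8+ on Windows can register a DLL folder for the current process with `os.add_dll_directory` before `pyvips` is imported; the pyvips project documents a similar approach. The sketch below assumes libvips was extracted to `C:\vips-dev-8.16\bin` (adjust to your actual location), and the helper name `import_pyvips` is hypothetical.

```python
import os
import sys
from types import ModuleType
from typing import Optional

# Assumed extraction location; change it to wherever you unpacked the zip.
VIPS_BIN = r"C:\vips-dev-8.16\bin"


def import_pyvips(vips_bin: Optional[str] = VIPS_BIN) -> Optional[ModuleType]:
    """Try to import pyvips, registering the libvips DLL folder on Windows first.

    Hypothetical helper for illustration. Returns the pyvips module, or None
    if pyvips is not installed or the libvips binaries cannot be loaded.
    """
    if sys.platform == "win32" and vips_bin and os.path.isdir(vips_bin):
        if hasattr(os, "add_dll_directory"):
            # Python 3.8+: extend the DLL search path for this process only.
            os.add_dll_directory(vips_bin)
        else:
            # Older Pythons: prepend the folder to PATH for this process.
            os.environ["PATH"] = vips_bin + os.pathsep + os.environ.get("PATH", "")
    try:
        import pyvips
    except (ImportError, OSError):
        # ImportError: pyvips not installed; OSError: libvips DLLs not found.
        return None
    return pyvips


if __name__ == "__main__":
    vips = import_pyvips()
    print("pyvips loaded" if vips else "pyvips or libvips not found")
```

Calling this once at startup avoids touching System32 or the global PATH; if it returns None, fall back to the manual steps above.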

Using the Latest Moondream2 via Hugging Face

The latest version of Moondream2 can be used directly with Hugging Face Transformers:

from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"}
)

# Load an image (replace with your own path)
image = Image.open("./path/to/image.jpg")

# Captioning
print("Short caption:")
print(model.caption(image, length="short")["caption"])
print("\nNormal caption:")
for t in model.caption(image, length="normal", stream=True)["caption"]:
    # Streaming generation example, supported for caption() and query()
    print(t, end="", flush=True)

# Visual Querying
print("\nVisual query: 'How many people are in the image?'")
print(model.query(image, "How many people are in the image?")["answer"])

# Object Detection
print("\nObject detection: 'face'")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")

# Pointing
print("\nPointing: 'person'")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")
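The `detect()` and `point()` results are easiest to use (for drawing or cropping) once converted to pixel coordinates. A minimal sketch, assuming each detected object is a dict with `x_min`, `y_min`, `x_max`, `y_max` and each point a dict with `x`, `y`, all normalized to the range 0 to 1 relative to the image size; verify this against the output you actually receive. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed result shapes (verify against real model output):
#   detect(...)["objects"] -> [{"x_min": .., "y_min": .., "x_max": .., "y_max": ..}, ...]
#   point(...)["points"]   -> [{"x": .., "y": ..}, ...]
# with every coordinate normalized to [0, 1].


def boxes_to_pixels(
    objects: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int, int, int]]:
    """Convert normalized boxes to integer (left, top, right, bottom) pixel boxes."""
    return [
        (
            round(obj["x_min"] * width),
            round(obj["y_min"] * height),
            round(obj["x_max"] * width),
            round(obj["y_max"] * height),
        )
        for obj in objects
    ]


def points_to_pixels(
    points: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Convert normalized points to integer (x, y) pixel coordinates."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]


if __name__ == "__main__":
    # Made-up example values, not real model output.
    fake_objects = [{"x_min": 0.25, "y_min": 0.1, "x_max": 0.75, "y_max": 0.5}]
    fake_points = [{"x": 0.5, "y": 0.5}]
    print(boxes_to_pixels(fake_objects, 640, 480))  # [(160, 48, 480, 240)]
    print(points_to_pixels(fake_points, 640, 480))  # [(320, 240)]
```

With real results you could call `boxes_to_pixels(objects, *image.size)`, since PIL's `Image.size` is `(width, height)`, and pass each tuple to `ImageDraw.Draw(image).rectangle(...)`.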


Using Moondream via the Cloud API

Get Your API Key

Visit console.moondream.ai to create an account and get your API key.
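To avoid hard-coding the key in scripts you commit or share, you can read it from an environment variable. A minimal sketch; the variable name `MOONDREAM_API_KEY` and the helper `get_api_key` are hypothetical choices for this example, not something the client is known to read on its own.

```python
import os


def get_api_key(var_name: str = "MOONDREAM_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical convention for this example.
    Raises RuntimeError with a helpful message if the variable is unset or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Moondream API key "
            "(available from console.moondream.ai)."
        )
    return key
```

After setting the variable in your shell, the key can be passed as `md.vl(api_key=get_api_key())`.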

Example Script
# Install dependencies in your project directory
# pip install moondream

import moondream as md
from PIL import Image

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("./path/to/image.jpg")
encoded_image = model.encode_image(image)  # Encode image (recommended for multiple operations)

# Generate a caption (length can be "short" or "normal"; the default is "normal")
caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)

# Stream the caption
for chunk in model.caption(encoded_image, stream=True)["caption"]:
    print(chunk, end="", flush=True)
print()  # end the streamed line

# Ask a question
answer = model.query(encoded_image, "What's in this image?")["answer"]
print("Answer:", answer)

# Stream the answer
for chunk in model.query(encoded_image, "What's in this image?", stream=True)["answer"]:
    print(chunk, end="", flush=True)
print()  # end the streamed line

# Detect objects
detect_result = model.detect(encoded_image, "subject")  # replace "subject" with the object you want to detect
print("Detected objects:", detect_result["objects"])

# Point at an object
point_result = model.point(encoded_image, "subject")  # replace "subject" with the object you want to point at
print("Points:", point_result["points"])
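When streaming, you often want to display chunks as they arrive and also keep the complete text afterwards. A minimal sketch of a hypothetical helper that works with any iterable of string chunks, such as the `stream=True` results in the script above:

```python
import sys
from typing import Iterable, TextIO


def print_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Echo each streamed chunk immediately and return the concatenated text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output without waiting for a newline
        parts.append(chunk)
    out.write("\n")  # finish the line once the stream ends
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in for model.caption(encoded_image, stream=True)["caption"].
    fake_stream = iter(["A ", "dog ", "on ", "a ", "beach."])
    full_text = print_and_collect(fake_stream)
    print(f"Collected {len(full_text)} characters")
```

For example, `caption = print_and_collect(model.caption(encoded_image, stream=True)["caption"])` prints the caption as it streams and leaves the full string in `caption`.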


Troubleshooting & FAQ

Common Questions