🚀 The latest model release (moondream2-2025-01-09) is currently only available via Hugging Face and can be used with the Hugging Face Transformers library. We are actively working on integrating it into our client libraries.
Installing pyvips (required for local Hugging Face installation)
For Linux/macOS users, run:

```bash
pip install pyvips-binary pyvips
```
For Windows users:
- Download `vips-dev-w64-all-8.16.0.zip` (64-bit) or `vips-dev-w32-all-8.16.0.zip` (32-bit) from the libvips Windows releases
- Extract the archive and copy the DLLs from `vips-dev-8.16\bin` to your project root directory (easier) or to `System32` (requires admin privileges)
- If you instead add the `bin` directory to your system `PATH`, restart your terminal session for the change to take effect
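To confirm pyvips can find the libvips libraries, a minimal sanity check (if the import fails on Windows, the DLLs are not being located):

```python
# Quick sanity check that pyvips loads and can see libvips.
import pyvips

print("pyvips binding version:", pyvips.__version__)
# version(0/1/2) returns the major/minor/micro version of the libvips library.
print("libvips version:", pyvips.version(0), pyvips.version(1), pyvips.version(2))
```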
Using the Latest Moondream2 via Hugging Face
The latest version of Moondream2 can be used directly with Hugging Face Transformers:
```python
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"}
)

# Load the image to analyze
image = Image.open("./path/to/image.jpg")

# Captioning
print("Short caption:")
print(model.caption(image, length="short")["caption"])

print("\nNormal caption:")
for t in model.caption(image, length="normal", stream=True)["caption"]:
    # Streaming generation example, supported for caption() and query()
    print(t, end="", flush=True)

# Visual Querying
print("\nVisual query: 'How many people are in the image?'")
print(model.query(image, "How many people are in the image?")["answer"])

# Object Detection
print("\nObject detection: 'face'")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")

# Pointing
print("\nPointing: 'person'")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")
```
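The detection and pointing results are plain dictionaries, so they are straightforward to overlay on the image. Below is a minimal sketch, assuming (as in recent Moondream releases) that `detect()` returns boxes as normalized `x_min`/`y_min`/`x_max`/`y_max` values and `point()` returns normalized `x`/`y` centers; check the model card for the exact schema of the revision you load:

```python
from PIL import Image, ImageDraw

# Assumes `model` is the Moondream model loaded above, and that all
# coordinates are normalized to [0, 1] (an assumption to verify).
image = Image.open("./path/to/image.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
w, h = image.size

# Draw a box around each detected object.
for obj in model.detect(image, "face")["objects"]:
    draw.rectangle(
        (obj["x_min"] * w, obj["y_min"] * h, obj["x_max"] * w, obj["y_max"] * h),
        outline="red",
        width=3,
    )

# Mark each point result with a small dot.
for pt in model.point(image, "person")["points"]:
    x, y = pt["x"] * w, pt["y"] * h
    draw.ellipse((x - 5, y - 5, x + 5, y + 5), fill="blue")

image.save("annotated.jpg")
```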
Quick Start
Moondream runs either through the hosted Cloud API or as a local deployment (see the Hugging Face instructions above). The steps below walk through the Cloud API.
Get Your API Key
Visit console.moondream.ai to create an account and get your API key.
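Once the client library is installed (next step), the key can also be read from an environment variable instead of being hardcoded; in this sketch `MOONDREAM_API_KEY` is an illustrative name of our choosing, since the client simply takes whatever string you pass to `api_key`:

```python
import os

import moondream as md

# MOONDREAM_API_KEY is an arbitrary variable name chosen for this example.
model = md.vl(api_key=os.environ["MOONDREAM_API_KEY"])
```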
Installation and Usage
```python
# Install dependencies in your project directory:
#   pip install moondream

import moondream as md
from PIL import Image

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("./path/to/image.jpg")

# Encode image (recommended for multiple operations)
encoded_image = model.encode_image(image)

# Generate a caption (length options: "short" or "normal" (default))
caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)

# Stream the caption
for chunk in model.caption(encoded_image, stream=True)["caption"]:
    print(chunk, end="", flush=True)

# Ask a question
answer = model.query(encoded_image, "What's in this image?")["answer"]
print("Answer:", answer)

# Stream the answer
for chunk in model.query(encoded_image, "What's in this image?", stream=True)["answer"]:
    print(chunk, end="", flush=True)

# Detect objects
detect_result = model.detect(image, "subject")  # change "subject" to what you want to detect
print("Detected objects:", detect_result["objects"])

# Point at an object
point_result = model.point(image, "subject")  # change "subject" to what you want to point at
print("Points:", point_result["points"])
```
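Since `encode_image()` runs once and its result can be reused, batching several questions against one image avoids repeated encoding work. A small usage sketch building on the variables above:

```python
# Reuse the encoded image across multiple queries instead of re-encoding.
questions = [
    "What's in this image?",
    "What colors stand out the most?",
]
for question in questions:
    answer = model.query(encoded_image, question)["answer"]
    print(f"{question} -> {answer}")
```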
Explore Our Cloud Endpoints
| Endpoint | Description |
|---|---|
| 💬 `/query` | Ask natural language questions about images and receive detailed answers |
| 📝 `/caption` | Generate accurate and natural image captions |
| 🔍 `/detect` | Detect and locate objects in images |
| 📍 `/point` | Get precise coordinate locations for objects in images |
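The client library wraps these endpoints for you, but they can also be called directly over HTTP. The sketch below shows the general shape of a raw `/query` call; the endpoint URL, auth header, and payload field names are assumptions based on Moondream's public API docs, so confirm them against the current API reference before relying on this:

```python
import base64

import requests

# Encode the image as a base64 data URL for the request body.
with open("./path/to/image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.moondream.ai/v1/query",           # assumed endpoint URL
    headers={"X-Moondream-Auth": "your-api-key"},  # assumed auth header name
    json={
        "image_url": f"data:image/jpeg;base64,{image_b64}",  # assumed field name
        "question": "What's in this image?",
    },
)
print(response.json())
```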