Overview

Moondream is an open-source family of Vision Language Models (VLMs) built for powerful, efficient visual reasoning. Our newest release, Moondream 3 Preview, is a mixture-of-experts model with grounded visual reasoning, a 32k context window, and native support for multiple vision skills—like pointing, counting, and object detection—all designed with a deployment-friendly ethos.

Moondream 3 Preview is now the default model for our cloud API and local processing with Moondream Station – get started here.
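For a quick taste of the cloud API, here is a minimal sketch assuming the `moondream` Python client (`pip install moondream`); the API key and image path are placeholders, and exact method and response-key names should be verified against the client documentation.

```python
import moondream as md  # moondream Python client (pip install moondream)
from PIL import Image

# Placeholder API key; create one in the Moondream cloud console.
model = md.vl(api_key="your-api-key")

# Placeholder image path.
image = Image.open("photo.jpg")

# Ask a free-form question about the image (visual question answering).
answer = model.query(image, "What is happening in this image?")["answer"]
print(answer)
```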

Key stats

  • 9B total params, 2B active params (maintaining inference speeds similar to our previous models)
  • 32k context window (up from 2k)

Model Skills

Moondream has built-in vision-specific skills that make it easy to generate specific types of vision output, such as bounding boxes or 2D points (see the sketch after this list). The skills are:

  • Object Detection
  • Pointing and Counting
  • Visual Question Answering
  • Captioning
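
As a sketch of how these skills map to code, the example below again assumes the `moondream` Python client; the object names, prompts, and image path are placeholders, and the response keys shown should be checked against the client docs for your installed version.

```python
import moondream as md
from PIL import Image

# Cloud client; a local model file can be used instead, e.g. md.vl(model="path/to/model.mf").
model = md.vl(api_key="your-api-key")
image = Image.open("street.jpg")  # placeholder image

# Object detection: bounding boxes for every instance of a named object.
objects = model.detect(image, "car")["objects"]

# Pointing and counting: one 2D point per instance; counting is len(points).
points = model.point(image, "person")["points"]
print(f"Detected {len(objects)} cars and counted {len(points)} people")

# Visual question answering: free-form questions about the image.
answer = model.query(image, "What color is the nearest car?")["answer"]
print(answer)

# Captioning: a natural-language description of the whole image.
caption = model.caption(image)["caption"]
print(caption)
```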

Performance Benchmarks

Here are some early benchmark results. We show Moondream 3 Preview alongside several top frontier models for comparison. Moondream also produces answers in a fraction of the time of these larger models. We'll publish more complete results later, including inference times, to make this clearer.

| Task | Moondream 3 Preview | GPT-5 | Gemini 2.5 Flash | Claude 4 Sonnet |
| --- | --- | --- | --- | --- |
| **Object Detection** | | | | |
| refcocog | 88.6 | 49.8 | 75.1 | 26.2 |
| refcoco+ | 81.8 | 46.3 | 70.2 | 23.4 |
| refcoco | 91.1 | 57.2 | 75.8 | 30.1 |
| **Counting** | | | | |
| CountBenchQA | 93.2 | 89.3 | 81.2 | 90.1 |
| **Document Understanding** | | | | |
| ChartQA | 86.6 | 85* | 79.5 | 74.3* |
| DocVQA | 88.3 | 89* | 94.2 | 89.5* |
| **Hallucination (higher is better)** | | | | |
| POPE | 89.0 | 88.4 | 88.1 | 84.6 |

License

Copyright (c) 2025 M87 Labs, Inc. This distribution includes Model Weights licensed under the Business Source License 1.1 with an Additional Use Grant (No Third-Party Service). Commercial hosting or rehosting requires an agreement with contact@m87.ai.

