Technical Specifications

Moondream offers a range of models optimized for different use cases, from edge devices to high-performance servers. All models support the same core capabilities with different performance characteristics.

Model Variants

INT8 Model

Recommended for most use cases
Balanced performance and quality
Requires 2,624 MiB runtime memory
Model size: 1,900 MiB
Compressed: 1,733 MiB

Best For

  • Production APIs
  • Cloud deployment
  • High-throughput services

Download from the onnx branch:
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz

Downloads: moondream-2b-int8.mf.gz
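
Where wget is not available, the same download can be sketched in Python using only the standard library. The URL is copied from the command above; the helper names `download` and `gunzip` are hypothetical, and whether a given runtime needs the uncompressed `.mf` file or accepts the `.mf.gz` directly is not stated on this page, so treat the decompression step as optional.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
INT8_URL = (
    "https://huggingface.co/vikhyatk/moondream2/resolve/"
    "9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz"
)


def download(url: str, dest_dir: str = ".") -> Path:
    """Fetch url into dest_dir, keeping the remote file name (as wget does)."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():  # skip the transfer if the file is already present
        urllib.request.urlretrieve(url, dest)
    return dest


def gunzip(src) -> Path:
    """Write an uncompressed copy of src next to it, dropping the .gz suffix."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dst = src.with_suffix("")  # moondream-2b-int8.mf.gz -> moondream-2b-int8.mf
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so memory use stays small
    return dst


# Example usage (performs a large network download, so it is left commented out):
# archive = download(INT8_URL)
# model_file = gunzip(archive)
```

Swapping `int8` for `int4` in the URL gives the INT4 file listed in the next section.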

INT4 Model

Maximum compression
Requires 2,002 MiB runtime memory
Model size: 1,290 MiB
Compressed: 1,167 MiB

Best For

  • Memory constraints
  • Local development
  • Testing environments

Download from the onnx branch:
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int4.mf.gz

Downloads: moondream-2b-int4.mf.gz
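
The runtime memory figures for the two variants suggest a simple selection rule: prefer INT8 when it fits, otherwise fall back to INT4. The sketch below encodes that rule. The function name `pick_variant` and the `headroom_mib` parameter are hypothetical, introduced only for illustration; the two memory figures are the ones listed on this page.

```python
from typing import Optional

# Runtime memory requirements in MiB, as listed for each variant on this page.
RUNTIME_MIB = {"int8": 2624, "int4": 2002}


def pick_variant(available_mib: int, headroom_mib: int = 0) -> Optional[str]:
    """Return the first variant that fits in memory, or None if neither does.

    headroom_mib is memory to reserve for everything else in the process.
    """
    budget = available_mib - headroom_mib
    # INT8 is checked first because the page recommends it for most use cases.
    for name in ("int8", "int4"):
        if RUNTIME_MIB[name] <= budget:
            return name
    return None


print(pick_variant(4096))                    # int8
print(pick_variant(2500))                    # int4
print(pick_variant(3000, headroom_mib=512))  # int4 (2488 MiB left after headroom)
print(pick_variant(2000))                    # None
```

The listed figures cover the model's runtime memory only, so a nonzero `headroom_mib` is a sensible default on shared or memory-constrained machines.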

Benchmarks

Release              ChartQA  TextVQA  DocVQA  RealWorldQA  CountBench  TallyQA      POPE                SeedBench2+
2025-01-09 (latest)  72.2     73.4     75.9    0.6          0.8         0.8          89.8                55.7
2024-08-26           -        64.3     65.2    70.5         -           82.6 / 77.6  89.6 / 88.8 / 87.2  -
2024-07-23           -        64.9     60.2    61.9         -           82.0 / 76.8  91.3 / 89.7 / 86.9  -
2024-05-20           -        63.1     57.2    30.5         -           82.1 / 76.6  91.5 / 89.6 / 86.2  -
2024-05-08           -        62.7     53.1    30.5         -           81.6 / 76.1  90.6 / 88.3 / 85.0  -
2024-04-02           -        61.7     49.7    24.3         -           80.1 / 74.2  -                   -
2024-03-13           -        60.6     46.4    22.2         -           79.6 / 73.3  -                   -
2024-03-06           -        59.8     43.1    20.9         -           79.5 / 73.2  -                   -
2024-03-04           -        58.5     36.4    -            -           -            -                   -

Benchmark Details:

  • ChartQA: Chart understanding and question answering

  • TextVQA: Reading and understanding text in natural images

  • DocVQA: Document understanding and question answering

  • RealWorldQA: Question answering on real-world images

  • CountBench: Specialized counting questions benchmark

  • TallyQA: Object counting (simple vs. complex questions)

  • POPE: Object presence verification (random/popular/adversarial)

  • SeedBench2+: Extended vision-language understanding benchmark
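
Several TallyQA and POPE cells in the benchmark table hold multiple scores separated by slashes. A minimal sketch of how such a cell could be split into named scores follows. It assumes the values appear in the order given in the list above (simple, then complex, for TallyQA; random, popular, adversarial for POPE), which the page implies but does not state outright. The name `parse_cell` is hypothetical.

```python
from typing import Dict, Optional, Sequence

# Label order is an assumption taken from the benchmark descriptions above.
TALLYQA = ("simple", "complex")
POPE = ("random", "popular", "adversarial")


def parse_cell(cell: str, labels: Sequence[str]) -> Optional[Dict[str, float]]:
    """Split a cell such as "82.6 / 77.6" into one score per label."""
    cell = cell.strip()
    if cell == "-":  # the table uses "-" where no score is reported
        return None
    scores = [float(part) for part in cell.split("/")]
    if len(scores) != len(labels):
        raise ValueError(f"expected {len(labels)} scores, got {len(scores)}: {cell!r}")
    return dict(zip(labels, scores))


print(parse_cell("82.6 / 77.6", TALLYQA))
# {'simple': 82.6, 'complex': 77.6}
print(parse_cell("89.6 / 88.8 / 87.2", POPE))
# {'random': 89.6, 'popular': 88.8, 'adversarial': 87.2}
print(parse_cell("-", POPE))
# None
```

A cell with a single value, such as the TallyQA entry for the latest release, does not match the two-label layout and raises `ValueError`, which flags the inconsistency instead of silently guessing.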

Feature Support

Model                        Visual Q&A  Captioning  Detection  Pointing
2B Models (FP16/INT8/INT4)
0.5B Models (INT8/INT4)