Technical Specifications
Moondream offers a range of models optimized for different use cases, from edge devices to high-performance servers. The models share the same core capabilities, with the one exception listed under Feature Support, and differ mainly in memory footprint and output quality.
Model Variants
INT8 Model
Recommended for most use cases
Balanced performance and quality
Requires 2,624 MiB runtime memory
Best For
- Production APIs
- Cloud deployment
- High-throughput services
Download from the onnx branch:
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz
Downloads: moondream-2b-int8.mf.gz
INT4 Model
Maximum compression
Requires 2,002 MiB runtime memory
Best For
- Memory constraints
- Local development
- Testing environments
Download from the onnx branch:
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int4.mf.gz
Downloads: moondream-2b-int4.mf.gz
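The choice between the two variants can be sketched as a small shell helper. This is a minimal sketch, not an official tool: `pick_variant` and `model_url` are hypothetical names, and treating the listed runtime memory figures as hard thresholds is an assumption (real deployments will likely want extra headroom). The revision hash and file name pattern are copied from the download commands above.

```shell
# Runtime memory requirements in MiB, as listed for each variant above.
INT8_MIB=2624
INT4_MIB=2002
# Revision hash copied from the download URLs above.
REVISION="9dddae84d54db4ac56fe37817aeaeb502ed083e2"

# pick_variant <available_mib>
# Hypothetical helper: prints "int8" if the INT8 model fits, "int4" if only
# the INT4 model fits, and fails if neither fits.
pick_variant() {
  local available_mib="$1"
  if [ "$available_mib" -ge "$INT8_MIB" ]; then
    echo "int8"
  elif [ "$available_mib" -ge "$INT4_MIB" ]; then
    echo "int4"
  else
    echo "insufficient memory: ${available_mib} MiB available" >&2
    return 1
  fi
}

# model_url <variant>
# Hypothetical helper: builds the download URL using the pattern shown above.
model_url() {
  echo "https://huggingface.co/vikhyatk/moondream2/resolve/${REVISION}/moondream-2b-$1.mf.gz"
}

# Example usage (not executed here):
#   variant="$(pick_variant 3000)" && wget "$(model_url "$variant")"
```

With 3,000 MiB available, the usage line would fetch the INT8 file; with 2,100 MiB it would fall back to INT4.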
Benchmarks
Release | ChartQA | TextVQA | DocVQA | RealWorldQA | CountBench | TallyQA | POPE | SeedBench2+ |
---|---|---|---|---|---|---|---|---|
2025-01-09 (latest) | 72.2 | 73.4 | 75.9 | 0.6 | 0.8 | 0.8 | 89.8 | 55.7 |
2024-08-26 | - | 64.3 | 65.2 | 70.5 | - | 82.6 / 77.6 | 89.6 / 88.8 / 87.2 | - |
2024-07-23 | - | 64.9 | 60.2 | 61.9 | - | 82.0 / 76.8 | 91.3 / 89.7 / 86.9 | - |
2024-05-20 | - | 63.1 | 57.2 | 30.5 | - | 82.1 / 76.6 | 91.5 / 89.6 / 86.2 | - |
2024-05-08 | - | 62.7 | 53.1 | 30.5 | - | 81.6 / 76.1 | 90.6 / 88.3 / 85.0 | - |
2024-04-02 | - | 61.7 | 49.7 | 24.3 | - | 80.1 / 74.2 | - | - |
2024-03-13 | - | 60.6 | 46.4 | 22.2 | - | 79.6 / 73.3 | - | - |
2024-03-06 | - | 59.8 | 43.1 | 20.9 | - | 79.5 / 73.2 | - | - |
2024-03-04 | - | 58.5 | 36.4 | - | - | - | - | - |
Benchmark Details:
- ChartQA: Chart understanding and question answering
- TextVQA: Reading and understanding text in natural images
- DocVQA: Document understanding and question answering
- RealWorldQA: Question answering on real-world images
- CountBench: Specialized counting questions benchmark
- TallyQA: Object counting (simple vs. complex questions)
- POPE: Object presence verification (random/popular/adversarial)
- SeedBench2+: Extended vision-language understanding benchmark
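Some TallyQA and POPE cells in the table pack several sub-scores into one value separated by slashes. The sketch below pairs each sub-score with a label. It assumes the sub-scores appear in the same order as the descriptions above (simple then complex for TallyQA; random, popular, adversarial for POPE); `label_scores` is a hypothetical helper, not part of any Moondream tooling.

```shell
# label_scores "<cell>" <label>...
# Hypothetical helper: splits a table cell such as "82.6 / 77.6" on slashes
# and prints each sub-score next to the label in the same position.
label_scores() {
  local cell="$1"
  shift
  local labels="$*"
  echo "$cell" | awk -F' */ *' -v labels="$labels" '{
    n = split(labels, name, " ")
    out = ""
    # Pair fields with labels in order; extra fields or labels are ignored.
    for (i = 1; i <= NF && i <= n; i++)
      out = out (i > 1 ? " " : "") name[i] "=" $i
    print out
  }'
}

# Example usage:
#   label_scores "82.6 / 77.6" simple complex
#   label_scores "89.6 / 88.8 / 87.2" random popular adversarial
```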
Feature Support
Model | Visual Q&A | Captioning | Detection | Pointing |
---|---|---|---|---|
2B Models (FP16/INT8/INT4) | ✅ | ✅ | ✅ | ✅ |
0.5B Models (INT8/INT4) | ✅ | ✅ | ✅ | ❌ |
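A script that targets both model sizes may want to check a capability before calling it. The following sketch hard-codes the table above as a lookup; `supports` and the short feature keys (`vqa`, `captioning`, `detection`, `pointing`) are hypothetical names chosen for illustration.

```shell
# supports <size> <feature>
# Hypothetical helper mirroring the Feature Support table.
# Exit status 0 means supported; 1 means unsupported or unknown.
supports() {
  case "$1:$2" in
    2B:vqa|2B:captioning|2B:detection|2B:pointing) return 0 ;;
    0.5B:vqa|0.5B:captioning|0.5B:detection) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   if supports 0.5B pointing; then echo "ok"; else echo "use a 2B model"; fi
```

Returning an exit status rather than printing text lets the helper be used directly in `if` conditions.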