Technical Specifications
Moondream offers a range of models optimized for different use cases, from edge devices to high-performance servers. The models share the same core capabilities and differ mainly in memory footprint and performance; the Feature Support table at the end of this section lists the one exception.
Model Variants
INT8 Model (recommended for most use cases)
Balanced performance and quality
Requires 2,624 MiB runtime memory
Best For
- Production APIs
- Cloud deployment
- High-throughput services
onnx branch
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz
Downloads: moondream-2b-int8.mf.gz
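The downloaded file carries a `.gz` extension, so it is presumably a plain gzip archive. If your tooling expects the uncompressed `.mf` file (an assumption here, since this page does not say either way), a minimal Python sketch for unpacking it could look like this. The helper name `decompress_model` is hypothetical and not part of any Moondream package.

```python
import gzip
import shutil
from pathlib import Path


def decompress_model(archive: Path) -> Path:
    """Decompress a .gz archive next to itself and return the new path.

    Hypothetical helper; assumes the archive is ordinary gzip data.
    """
    if archive.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {archive.name}")
    # Dropping the last suffix turns moondream-2b-int8.mf.gz into moondream-2b-int8.mf
    target = archive.with_suffix("")
    # Stream in chunks so the whole model never has to sit in memory at once.
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    archive = Path("moondream-2b-int8.mf.gz")
    if archive.exists():
        print(decompress_model(archive))
```

The same function works unchanged for `moondream-2b-int4.mf.gz`.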
INT4 Model
Maximum compression
Requires 2,002 MiB runtime memory
Best For
- Memory constraints
- Local development
- Testing environments
onnx branch
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int4.mf.gz
Downloads: moondream-2b-int4.mf.gz
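The two memory figures above suggest a simple selection rule: use INT8 when it fits, fall back to INT4 otherwise. The sketch below encodes that rule in Python. The dictionary keys, the function name `pick_variant`, and the `headroom_mib` parameter are hypothetical, and preferring INT8 whenever it fits is an assumption based on the "recommended for most use cases" label rather than a documented policy.

```python
from typing import Optional

# Runtime memory requirements in MiB, copied from the figures listed above.
RUNTIME_MEMORY_MIB = {
    "moondream-2b-int8": 2624,
    "moondream-2b-int4": 2002,
}


def pick_variant(available_mib: int, headroom_mib: int = 0) -> Optional[str]:
    """Return the variant to use for a memory budget, or None if neither fits.

    `headroom_mib` is a hypothetical safety margin reserved for the rest of
    the process; it is not part of the published figures.
    """
    budget = available_mib - headroom_mib
    fitting = [(mib, name) for name, mib in RUNTIME_MEMORY_MIB.items() if mib <= budget]
    if not fitting:
        return None
    # INT8 needs more memory than INT4, so max() picks INT8 whenever both fit.
    return max(fitting)[1]


if __name__ == "__main__":
    for mib in (4096, 2500, 1024):
        print(mib, "->", pick_variant(mib))
```

With the figures above, a 4,096 MiB budget selects INT8, 2,500 MiB selects INT4, and 1,024 MiB fits neither.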
Benchmarks
Release | VQAv2 | GQA | TextVQA | DocVQA | TallyQA (simple/full) | POPE (rand/pop/adv) |
---|---|---|---|---|---|---|
2024-08-26 (latest) | 80.3 | 64.3 | 65.2 | 70.5 | 82.6 / 77.6 | 89.6 / 88.8 / 87.2 |
2024-07-23 | 79.4 | 64.9 | 60.2 | 61.9 | 82.0 / 76.8 | 91.3 / 89.7 / 86.9 |
2024-05-20 | 79.4 | 63.1 | 57.2 | 30.5 | 82.1 / 76.6 | 91.5 / 89.6 / 86.2 |
2024-05-08 | 79.0 | 62.7 | 53.1 | 30.5 | 81.6 / 76.1 | 90.6 / 88.3 / 85.0 |
2024-04-02 | 77.7 | 61.7 | 49.7 | 24.3 | 80.1 / 74.2 | - |
2024-03-13 | 76.8 | 60.6 | 46.4 | 22.2 | 79.6 / 73.3 | - |
2024-03-06 | 75.4 | 59.8 | 43.1 | 20.9 | 79.5 / 73.2 | - |
2024-03-04 | 74.2 | 58.5 | 36.4 | - | - | - |
Benchmark Details:
VQAv2: Visual Question Answering v2 dataset - General visual reasoning
GQA: Grounded Question Answering - Compositional visual reasoning
TextVQA: Text Visual Question Answering - Reading text in images
DocVQA: Document Visual Question Answering - Understanding documents
TallyQA: Counting questions (simple vs. full complexity) - Object counting
POPE: Polling-based Object Probing Evaluation - Object hallucination check using yes/no object-presence questions (random/popular/adversarial)
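To see how a release moved relative to its predecessor, subtract the scores column by column. The Python sketch below does this for the single-score benchmarks of the two most recent releases, with values copied from the table above. The `SCORES` layout and the function name `score_deltas` are illustrative choices, not an official data format.

```python
from typing import Dict

# Single-score columns for the two most recent releases, copied from the table.
SCORES = {
    "2024-08-26": {"VQAv2": 80.3, "GQA": 64.3, "TextVQA": 65.2, "DocVQA": 70.5},
    "2024-07-23": {"VQAv2": 79.4, "GQA": 64.9, "TextVQA": 60.2, "DocVQA": 61.9},
}


def score_deltas(newer: str, older: str) -> Dict[str, float]:
    """Return newer-minus-older for each benchmark, rounded to one decimal."""
    return {
        name: round(SCORES[newer][name] - SCORES[older][name], 1)
        for name in SCORES[newer]
    }


if __name__ == "__main__":
    for name, delta in score_deltas("2024-08-26", "2024-07-23").items():
        print(f"{name}: {delta:+.1f}")
```

For these two rows the largest gain is on DocVQA (+8.6), while GQA dips slightly (-0.6).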
Feature Support
Model | Visual Q&A | Captioning | Detection | Pointing |
---|---|---|---|---|
2B Models (FP16/INT8/INT4) | ✅ | ✅ | ✅ | ✅ |
0.5B Models (INT8/INT4) | ✅ | ✅ | ✅ | ❌ |
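Application code that has to run on either model size can guard optional features with a lookup that mirrors the table above. This is a minimal Python sketch; the size labels, feature names, and the `supports` function are hypothetical identifiers chosen for illustration and are not part of a Moondream API.

```python
# Feature matrix transcribed from the table above.
# Keys and feature names are illustrative labels only.
FEATURES = {
    "2B": {"visual_qa", "captioning", "detection", "pointing"},
    "0.5B": {"visual_qa", "captioning", "detection"},
}


def supports(model_size: str, feature: str) -> bool:
    """Return True if the given model size lists the feature as supported."""
    return feature in FEATURES[model_size]


if __name__ == "__main__":
    for size in FEATURES:
        print(size, "pointing:", supports(size, "pointing"))
```

Checking `supports("0.5B", "pointing")` before issuing a pointing request lets a caller fall back to detection on the smaller models.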