Skills
Vision applications often need the specific coordinates of things in an image (e.g., bounding boxes, 2D points). Rather than requiring awkward prompting, Moondream has built-in support for these tasks through "Skills".
Available Skills
💬 Query
The most general-purpose skill. Ask natural-language questions about an image and get a text answer. Great for:
- Visual Q&A systems
- Image analysis
- Content verification
📝 Caption
Generate natural language descriptions of images. Perfect for:
- Creating alt text for accessibility
- Cataloging visual content
- Understanding image context
- Retail item descriptions
🎯 Point
Identify and locate specific elements within images by coordinates. Useful for:
- UI automation
- Interactive image annotations
- Precise element selection
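For UI automation or annotation, a point usually has to be mapped onto the pixels of the original image. The sketch below assumes each point arrives as a dict with `x` and `y` normalized to the range 0 to 1; that shape is an assumption here, so check it against the Moondream API reference. `point_to_pixels` is a hypothetical helper, not part of any Moondream client.

```python
def point_to_pixels(point, width, height):
    """Map a normalized point (assumed keys: "x", "y") to pixel coordinates."""
    # Clamp to [0, 1] in case a value falls slightly outside the range.
    x = min(max(point["x"], 0.0), 1.0)
    y = min(max(point["y"], 0.0), 1.0)
    # Scale to the image size and keep the result inside the image bounds.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


# Example: the center-left area of a 200x100 image.
click_x, click_y = point_to_pixels({"x": 0.5, "y": 0.25}, width=200, height=100)
```

The returned pair can be passed to whatever click or drawing routine the application already uses.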
🔍 Detect
Detect objects, people, and other elements in images and locate each one with a bounding box. Ideal for:
- Object recognition
- Scene understanding
- Content moderation
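Downstream code such as cropping or drawing typically needs bounding boxes in pixels. The sketch below assumes each detected object is a dict with `x_min`, `y_min`, `x_max`, and `y_max` normalized to the range 0 to 1; these key names and the normalization are assumptions, so confirm them against the Moondream API reference. `box_to_pixels` is a hypothetical helper.

```python
def box_to_pixels(box, width, height):
    """Convert a normalized box to (left, top, right, bottom) in pixels.

    Assumed keys: "x_min", "y_min", "x_max", "y_max", each in [0, 1].
    """
    left = round(box["x_min"] * width)
    top = round(box["y_min"] * height)
    right = round(box["x_max"] * width)
    bottom = round(box["y_max"] * height)
    return left, top, right, bottom


# Example: one box on a 400x200 image.
detected = {"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0}
pixel_box = box_to_pixels(detected, width=400, height=200)
```

The `(left, top, right, bottom)` order matches what most imaging libraries expect for a crop rectangle, which is why the helper returns it in that form.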