
Skills

Vision applications often need precise coordinates for things in an image (e.g., bounding boxes or 2D points). Rather than requiring awkward prompting, Moondream has built-in support for these tasks through "Skills".

Available Skills

💬 Query

The most general-purpose skill. Ask a natural language question about an image and get an answer back. Great for:

  • Visual Q&A systems
  • Image analysis
  • Content verification

📝 Caption

Generate natural language descriptions of images. Perfect for:

  • Creating alt text for accessibility
  • Cataloging visual content
  • Understanding image context
  • Retail item descriptions

🎯 Point

Identify and locate specific elements within images by coordinates. Useful for:

  • UI automation
  • Interactive image annotations
  • Precise element selection
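
For UI automation, located points usually need to become pixel positions before anything can be clicked. The following is a minimal sketch of that conversion, not Moondream's own code. It assumes the Point skill returns a list of points as dictionaries with `x` and `y` values normalized to the 0-1 range; that response shape, the helper name `points_to_pixels`, and the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def points_to_pixels(
    points: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Convert normalized (0-1) points to integer pixel coordinates.

    Assumes each point is a dict with "x" and "y" keys (hypothetical shape).
    """
    pixels = []
    for p in points:
        # Clamp so values of exactly 0.0 or 1.0 still land inside the image.
        x = max(0, min(int(p["x"] * width), width - 1))
        y = max(0, min(int(p["y"] * height), height - 1))
        pixels.append((x, y))
    return pixels


# Hypothetical response data for an 800x600 screenshot.
sample_points = [{"x": 0.5, "y": 0.25}, {"x": 1.0, "y": 1.0}]
click_targets = points_to_pixels(sample_points, width=800, height=600)
```

Clamping to `width - 1` and `height - 1` keeps a point on the image edge from producing an out-of-range pixel index.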

🔍 Detect

Detect and identify objects, people, and elements in images. Ideal for:

  • Object recognition
  • Scene understanding
  • Content moderation
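
Detections are typically consumed as pixel rectangles, for example to crop or draw boxes. The sketch below shows one way to do that conversion. It assumes the Detect skill returns a list of objects with `x_min`, `y_min`, `x_max`, and `y_max` values normalized to the 0-1 range; that response shape, the helper name `boxes_to_pixels`, and the sample data are assumptions for illustration rather than confirmed API details.

```python
from typing import Dict, List, Tuple


def boxes_to_pixels(
    objects: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int, int, int]]:
    """Convert normalized (0-1) boxes to (left, top, right, bottom) pixels.

    Assumes each object has x_min, y_min, x_max, y_max keys (hypothetical shape).
    """
    boxes = []
    for obj in objects:
        left = int(obj["x_min"] * width)
        top = int(obj["y_min"] * height)
        right = int(obj["x_max"] * width)
        bottom = int(obj["y_max"] * height)
        boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical response data for a 640x480 image.
sample_objects = [{"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0}]
pixel_boxes = boxes_to_pixels(sample_objects, width=640, height=480)
```

The `(left, top, right, bottom)` order matches what common imaging libraries expect for crop rectangles, so the tuples can usually be passed along without reshuffling.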