Segment

The /segment endpoint generates precise SVG path segmentation masks for specific objects in images. It returns an SVG path string that outlines the object, along with a bounding box.

Example Request

import moondream as md
from PIL import Image

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("path/to/image.jpg")

# Segment an object
result = model.segment(image, "cat")
svg_path = result["path"]
bbox = result["bbox"]

print(f"SVG Path: {svg_path[:100]}...")
print(f"Bounding box: {bbox}")

# With spatial hint (point) to guide segmentation
result = model.segment(image, "cat", spatial_refs=[[0.5, 0.3]])

# With spatial hint (bounding box)
result = model.segment(image, "cat", spatial_refs=[[0.2, 0.1, 0.8, 0.9]])

Example Response

{
  "path": "M 0 0.76 L 0 0.32 L 0.03 0.31 C 0.04 0.30 0.09 0.28 0.12 0.27...",
  "bbox": {
    "x_min": 0.0002,
    "y_min": 0.0039,
    "x_max": 0.7838,
    "y_max": 0.9971
  }
}
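
The bbox values are normalized, so cropping or drawing needs them in pixels. A minimal sketch of that conversion, assuming the bbox is normalized to the full image width and height; `bbox_to_pixels` is a hypothetical helper, not part of the moondream client:

```python
from math import ceil, floor

def bbox_to_pixels(bbox, width, height):
    """Convert a normalized bbox dict to an integer (left, top, right, bottom) box.

    Assumes bbox values are fractions of the full image size.
    """
    # Round outward so the pixel box never cuts into the object.
    left = floor(bbox["x_min"] * width)
    top = floor(bbox["y_min"] * height)
    right = ceil(bbox["x_max"] * width)
    bottom = ceil(bbox["y_max"] * height)
    # Clamp to the image bounds.
    return (max(0, left), max(0, top), min(width, right), min(height, bottom))

bbox = {"x_min": 0.0002, "y_min": 0.0039, "x_max": 0.7838, "y_max": 0.9971}
print(bbox_to_pixels(bbox, 800, 600))  # (0, 2, 628, 599)
```

The returned tuple uses the same (left, upper, right, lower) order that PIL's `Image.crop` expects, so it can be passed to `image.crop(...)` directly.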

Spatial References

You can guide the segmentation by providing spatial references: points or bounding boxes, given in normalized coordinates (0-1):

  • Point: [x, y] - A single point that indicates the object's location
  • Bounding box: [x1, y1, x2, y2] - A region that contains the object

# Point hint - segment object near center
result = model.segment(image, "person", spatial_refs=[[0.5, 0.5]])

# Bounding box hint - segment object within region
result = model.segment(image, "person", spatial_refs=[[0.1, 0.2, 0.6, 0.8]])

# Multiple hints
result = model.segment(image, "person", spatial_refs=[[0.3, 0.4], [0.7, 0.6]])
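
Hints usually start out as pixel coordinates, for example from a mouse click or a detector output. A small sketch of turning those into normalized references, assuming x is measured from the left edge and y from the top edge; `point_ref` and `box_ref` are hypothetical helpers, not part of the moondream client:

```python
def _clamp01(value):
    """Keep a normalized coordinate inside [0, 1]."""
    return min(max(value, 0.0), 1.0)

def point_ref(x_px, y_px, width, height):
    """Pixel point -> normalized [x, y] spatial reference."""
    return [_clamp01(x_px / width), _clamp01(y_px / height)]

def box_ref(left, top, right, bottom, width, height):
    """Pixel box -> normalized [x1, y1, x2, y2] spatial reference."""
    return [
        _clamp01(left / width),
        _clamp01(top / height),
        _clamp01(right / width),
        _clamp01(bottom / height),
    ]

# For an 800x600 image:
print(point_ref(400, 180, 800, 600))         # [0.5, 0.3]
print(box_ref(160, 60, 640, 540, 800, 600))  # [0.2, 0.1, 0.8, 0.9]
```

The results can then be passed as entries of `spatial_refs`, e.g. `spatial_refs=[point_ref(400, 180, image.width, image.height)]`.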

Streaming

Segment supports streaming: the bounding box arrives in the first message, coarse path chunks follow as they are generated, and a final message carries the refined path:

import moondream as md
from PIL import Image

model = md.vl(api_key="your-api-key")
image = Image.open("path/to/image.jpg")

# Stream segmentation updates
for update in model.segment(image, "cat", stream=True):
    if "bbox" in update and not update.get("completed"):
        # Bounding box available in first message
        print(f"Bbox: {update['bbox']}")
    if "chunk" in update:
        # Coarse path chunks
        print(update["chunk"], end="")
    if update.get("completed"):
        # Final refined path
        print(f"\nFinal path: {update['path'][:100]}...")

Streaming Response Format

When streaming, you'll receive Server-Sent Events with different message types:

  1. Bounding box (first message):
{"type": "bbox", "bbox": {"x_min": 0.0, "y_min": 0.0, "x_max": 0.78, "y_max": 0.99}}
  2. Path chunks (coarse path updates):
{"type": "path_delta", "chunk": "M 0 0.76", "completed": false}
  3. Final message (refined path):
{"type": "final", "path": "M 0 0.76 L 0 0.32...", "bbox": {...}, "completed": true}
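
The three message types can be folded into a single result as they arrive. The sketch below is only an illustration: `collect_stream` is a hypothetical helper, the sample messages are made up to mirror the shapes above, and joining chunks by plain concatenation is an assumption.

```python
def collect_stream(updates):
    """Fold streamed updates into one dict with bbox, coarse path, and final path."""
    bbox = None
    coarse_chunks = []
    final_path = None
    for update in updates:
        if "bbox" in update:
            # Present in the first message and repeated in the final one.
            bbox = update["bbox"]
        if "chunk" in update:
            # Assumption: coarse chunks concatenate into a preview path.
            coarse_chunks.append(update["chunk"])
        if update.get("completed"):
            # The final message carries the refined path.
            final_path = update.get("path")
    return {"bbox": bbox, "coarse_path": "".join(coarse_chunks), "path": final_path}

# Made-up messages shaped like the examples above.
sample_bbox = {"x_min": 0.0, "y_min": 0.0, "x_max": 0.78, "y_max": 0.99}
messages = [
    {"type": "bbox", "bbox": sample_bbox},
    {"type": "path_delta", "chunk": "M 0 0.76", "completed": False},
    {"type": "path_delta", "chunk": " L 0 0.32", "completed": False},
    {"type": "final", "path": "M 0 0.76 L 0 0.32 Z", "bbox": sample_bbox, "completed": True},
]

result = collect_stream(messages)
print(result["coarse_path"])  # M 0 0.76 L 0 0.32
print(result["path"])         # M 0 0.76 L 0 0.32 Z
```

Keeping the coarse path separate from the final one lets a UI draw a preview while streaming and swap in the refined path once `completed` is true.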

Using the SVG Path

The path coordinates are normalized to [0, 1] relative to the bounding box, not the full image. To render the mask correctly, you need to position and scale the path within the bounding box region:

<div style="position: relative; width: 800px; height: 600px;">
  <img src="your-image.jpg" style="width: 100%; height: 100%;" />
  <!-- Position SVG within the bounding box region -->
  <svg
    viewBox="0 0 1 1"
    preserveAspectRatio="none"
    style="
      position: absolute;
      left: calc(x_min * 100%);
      top: calc(y_min * 100%);
      width: calc((x_max - x_min) * 100%);
      height: calc((y_max - y_min) * 100%);
    "
  >
    <path d="M 0 0.76 L 0 0.32..." fill="rgba(255,0,0,0.3)" stroke="red" stroke-width="0.01"/>
  </svg>
</div>

In practice, replace the calc() expressions with actual values from the bbox:

const { bbox, path } = result;
const container = document.querySelector('.image-container');
const svg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
svg.setAttribute('viewBox', '0 0 1 1');
svg.setAttribute('preserveAspectRatio', 'none');
svg.style.position = 'absolute';
svg.style.left = `${bbox.x_min * 100}%`;
svg.style.top = `${bbox.y_min * 100}%`;
svg.style.width = `${(bbox.x_max - bbox.x_min) * 100}%`;
svg.style.height = `${(bbox.y_max - bbox.y_min) * 100}%`;

const pathEl = document.createElementNS('http://www.w3.org/2000/svg', 'path');
pathEl.setAttribute('d', path);
pathEl.setAttribute('fill', 'rgba(255,0,0,0.3)');
svg.appendChild(pathEl);
container.appendChild(svg);
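
The same positioning can be done outside the browser by emitting a standalone SVG sized to the image. This is a sketch under the assumptions stated above (path normalized to the bbox, bbox normalized to the image); `mask_overlay_svg` is a hypothetical helper, and it uses a translate/scale transform in place of CSS positioning:

```python
from html import escape

def mask_overlay_svg(path, bbox, width, height, fill="rgba(255,0,0,0.3)"):
    """Return an SVG document, in image pixel units, that draws the mask."""
    left = bbox["x_min"] * width
    top = bbox["y_min"] * height
    box_w = (bbox["x_max"] - bbox["x_min"]) * width
    box_h = (bbox["y_max"] - bbox["y_min"]) * height
    # Map the unit square the path lives in onto the bbox rectangle.
    transform = f"translate({left:.2f} {top:.2f}) scale({box_w:.2f} {box_h:.2f})"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
        f'<g transform="{transform}">'
        f'<path d="{escape(path)}" fill="{escape(fill)}"/>'
        "</g></svg>"
    )

# Made-up square path and bbox, for an 800x600 image.
svg = mask_overlay_svg(
    "M 0 0 L 1 0 L 1 1 L 0 1 Z",
    {"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0},
    800,
    600,
)
print(svg)
```

The sketch draws a fill only, because a stroke inside the scaled group would be stretched by the same transform.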

Segment vs. Detect

Feature     Segment                             Detect
Output      SVG path (pixel-level mask)         Bounding box
Precision   Exact object boundaries             Rectangular approximation
Use case    Cutouts, masks, precise selection   Object counting, localization
Streaming   Yes (bbox + path chunks)            No
