Using the interface

This page covers the core API operations for RL finetuning. You send requests and score outputs; Moondream Cloud handles the model updates.

All endpoints use the base URL https://api.moondream.ai/v1/tuning/.

At a high level, there are three operations:

  1. Create a finetune — set up a new finetune to train.
  2. Generate rollouts — ask the model for multiple attempts.
  3. Apply a train step — send those attempts back with rewards so the model can learn.

Everything else (sampling data, rewards, evaluation, stopping criteria) stays in your control.

Create a finetune

Before training, create a finetune to hold your finetuned weights:

POST /finetunes

You provide:

  • name — a unique name for this finetune (alphanumeric, hyphens, and underscores only).
  • rank — the LoRA rank: 8, 16, 24, or 32 (must be a multiple of 8; higher = more capacity, but slower).

The response includes:

  • finetune_id — a unique identifier (ULID) for the finetune. Use this ID for all subsequent operations (rollouts, training, checkpoints).

If you call this endpoint again with the same name and rank, you'll get back the existing finetune_id (idempotent). If the name exists with a different rank, you'll get a 409 Conflict error.
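
If you're calling this from code, the request might look like the sketch below. This is a minimal illustration assuming Python with the requests library; authentication headers and error handling are omitted.

# Minimal sketch: create (or fetch) a finetune. Assumes Python + the
# `requests` library; authentication headers are omitted here.
import requests

BASE_URL = "https://api.moondream.ai/v1/tuning"

resp = requests.post(f"{BASE_URL}/finetunes", json={
    "name": "vehicle-detector",  # alphanumeric, hyphens, underscores only
    "rank": 32,                  # LoRA rank: 8, 16, 24, or 32
})
resp.raise_for_status()
finetune_id = resp.json()["finetune_id"]  # ULID; reuse for all later calls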

Generate rollouts

To start a training iteration, you call:

POST /rollouts

You provide:

  • finetune_id — the finetune ID (from the create finetune response).
  • A skill request — one of query, point, or detect.
  • num_rollouts — how many attempts to generate for this request (currently 1–16).
  • ground_truth (optional) — for point and detect, you may include ground truth so rewards can be computed automatically.

The response includes:

  • the echoed request
  • a list of rollouts (one per attempt)
  • rewards if you provided ground truth; otherwise null

Important: rollouts may include extra metadata needed for training. Treat each rollout object as opaque training data and pass it back unchanged during the train step.
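
As a rough sketch, the same call from Python might look like this (again assuming the requests library and omitting auth; finetune_id comes from the create-finetune sketch above, and the detect fields mirror the example later on this page):

# Sketch: request several rollouts for one detect prompt, with ground truth
# so the server can score them. `finetune_id` comes from the sketch above.
import requests

BASE_URL = "https://api.moondream.ai/v1/tuning"

resp = requests.post(f"{BASE_URL}/rollouts", json={
    "finetune_id": finetune_id,
    "num_rollouts": 4,  # 1-16
    "request": {
        "skill": "detect",
        "object": "vehicles",
        "image_url": "data:image/jpeg;base64,/9j/4AAQ...",  # truncated data URL
    },
    "ground_truth": {
        "boxes": [{"x_min": 0.10, "y_min": 0.20, "x_max": 0.40, "y_max": 0.60}],
    },
})
resp.raise_for_status()
body = resp.json()
rollouts = body["rollouts"]  # opaque objects: round-trip them unchanged
rewards = body["rewards"]    # None here if you didn't send ground_truth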

Apply a train step

Once you have rollouts and a reward for each one (computed by the server from ground truth, or by your own scoring), you call:

POST /train_step

You send a list of groups. Each group contains:

  • the original rollout request
  • the rollouts you received
  • a reward per rollout, in the same order

You can mix skills across groups within the same train step.

The response confirms the step was applied.
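
Continuing the sketches above, a single-group train step might look like this; body, rollouts, and rewards are assumed to come straight from the /rollouts response (or from your own scoring), and the lr value mirrors the example later on this page:

# Sketch: apply one training update from a single group of scored rollouts.
import requests

BASE_URL = "https://api.moondream.ai/v1/tuning"

resp = requests.post(f"{BASE_URL}/train_step", json={
    "finetune_id": finetune_id,
    "groups": [
        {
            "request": body["request"],   # the echoed rollout request
            "rollouts": rollouts,         # passed back exactly as received
            "rewards": rewards,           # one float per rollout, same order
        },
    ],
    "lr": 0.002,
})
resp.raise_for_status()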

Ground truth and rewards

You can either provide ground truth and let the server compute rewards, or compute rewards yourself.

Detect — ground truth is an array of bounding boxes (normalized 0–1):

{
  "boxes": [
    { "x_min": 0.10, "y_min": 0.20, "x_max": 0.40, "y_max": 0.60 }
  ]
}

Point — ground truth can be coordinates or bounding boxes (normalized 0–1):

{
  "points": [{ "x": 0.52, "y": 0.31 }]
}

Or with bounding boxes (rewards based on whether the predicted point falls inside):

{
  "boxes": [{ "x_min": 0.10, "y_min": 0.20, "x_max": 0.40, "y_max": 0.60 }]
}

Query — no ground truth support. You must score text outputs yourself using your own reward function.

Use ground truth when you have labeled data and the default reward calculation fits your needs. Use custom rewards when you want to encode specific preferences (e.g., penalizing false positives more than false negatives) or when scoring text outputs.
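
For instance, a custom detect reward might score each attempt by how well its boxes overlap the labels while docking points for extra detections. The iou and detect_reward helpers below are hypothetical, not the server's reward function; they only illustrate the shape of client-side scoring.

# Illustrative only: one possible client-side reward for detect rollouts.
def iou(a, b):
    """Intersection-over-union of two boxes in normalized 0-1 coordinates."""
    ix = max(0.0, min(a["x_max"], b["x_max"]) - max(a["x_min"], b["x_min"]))
    iy = max(0.0, min(a["y_max"], b["y_max"]) - max(a["y_min"], b["y_min"]))
    inter = ix * iy
    union = ((a["x_max"] - a["x_min"]) * (a["y_max"] - a["y_min"])
             + (b["x_max"] - b["x_min"]) * (b["y_max"] - b["y_min"]) - inter)
    return inter / union if union > 0 else 0.0

def detect_reward(predicted, labels, fp_penalty=0.5):
    """Mean best-IoU over the labels, minus a penalty per surplus prediction."""
    if not labels:
        return 1.0 if not predicted else 0.0
    matched = sum(max((iou(p, gt) for p in predicted), default=0.0) for gt in labels)
    extra = max(0, len(predicted) - len(labels))
    return max(0.0, matched / len(labels) - fp_penalty * extra)

# One reward per rollout, in the same order you send them back
# (assumes `rollouts` from a /rollouts response and your own labeled boxes):
labeled_boxes = [{"x_min": 0.10, "y_min": 0.20, "x_max": 0.40, "y_max": 0.60}]
rewards = [detect_reward(r["output"]["objects"], labeled_boxes) for r in rollouts]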

Settings

All skill requests accept a settings object:

"settings": {
"temperature": 1.0,
"top_p": 1.0,
"max_tokens": 128
}
  • temperature — controls randomness. Use higher values (e.g., 1.0) during training for diverse rollouts, and zero for deterministic evaluation.
  • top_p — nucleus sampling threshold. Usually leave at 1.0.
  • max_tokens — maximum output length.

For detect, you can also set max_objects to limit the number of detected objects.

Image requirements

Images must be base64 data URLs:

data:image/jpeg;base64,/9j/4AAQ...
data:image/png;base64,iVBORw0K...
data:image/webp;base64,UklGR...

HTTP/HTTPS URLs are not supported. Images larger than ~8MB (decoded) or with very large dimensions will be rejected.
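
One way to produce such a data URL from a local file, as a small sketch (the to_data_url helper and file name are hypothetical, not part of the API):

# Sketch: encode a local image file as the base64 data URL the API expects.
import base64
import mimetypes

def to_data_url(path):
    mime = mimetypes.guess_type(path)[0]  # expects image/jpeg, image/png, or image/webp
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

image_data_url = to_data_url("street.jpg")  # hypothetical local file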

Example: detect skill

Here's a complete request/response cycle for the detect skill. Other skills follow the same pattern—see the HTTP API reference for full schemas.

1. Create a finetune

// POST /finetunes
{
  "name": "vehicle-detector",
  "rank": 32
}

Response:

{
  "finetune_id": "01HXYZ..."
}

Store the finetune_id—you'll use it for all subsequent operations.

2. Generate rollouts

// POST /rollouts
{
  "finetune_id": "01HXYZ...",
  "num_rollouts": 4,
  "request": {
    "skill": "detect",
    "object": "vehicles",
    "image_url": "data:image/jpeg;base64,/9j/4AAQ..."
  },
  "ground_truth": {
    "boxes": [
      { "x_min": 0.10, "y_min": 0.20, "x_max": 0.40, "y_max": 0.60 },
      { "x_min": 0.55, "y_min": 0.30, "x_max": 0.85, "y_max": 0.70 }
    ]
  }
}

Response:

{
  "request": { "...echoed request..." },
  "rollouts": [
    {
      "skill": "detect",
      "output": {
        "objects": [
          { "x_min": 0.12, "y_min": 0.22, "x_max": 0.39, "y_max": 0.58 }
        ]
      },
      // ... opaque training metadata
    },
    // ... 3 more rollouts
  ],
  "rewards": [0.8, 0.3, 0.6, 0.5]
}

The output field contains the model's prediction. Other fields in the rollout object are opaque training metadata—pass them back unchanged.

If you provided ground_truth, rewards are computed automatically. Otherwise, rewards is null and you compute them yourself.

3. Apply a train step

// POST /train_step
{
  "finetune_id": "01HXYZ...",
  "groups": [
    {
      "request": { "...echoed request from rollouts response..." },
      "rollouts": [ "...rollout objects from rollouts response..." ],
      "rewards": [0.8, 0.3, 0.6, 0.5]
    }
  ],
  "lr": 0.002
}

Response:

{}

You can batch multiple groups (from different images/requests) into a single train step.

Each train step creates a checkpoint automatically. Use Save to keep the current checkpoint and make it visible in the dashboard.
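
Putting the pieces together, one training iteration over a small batch might look like the sketch below. The dataset entries, file paths, and the to_data_url helper from the image section are hypothetical; only the endpoints, field names, and the lr value come from this page.

# Sketch: one RL iteration -- rollouts for several labeled images,
# then a single train step over all resulting groups. Auth omitted.
import requests

BASE_URL = "https://api.moondream.ai/v1/tuning"

dataset = [  # hypothetical labeled examples
    {"path": "street_001.jpg",
     "boxes": [{"x_min": 0.10, "y_min": 0.20, "x_max": 0.40, "y_max": 0.60}]},
    {"path": "street_002.jpg",
     "boxes": [{"x_min": 0.55, "y_min": 0.30, "x_max": 0.85, "y_max": 0.70}]},
]

groups = []
for example in dataset:
    resp = requests.post(f"{BASE_URL}/rollouts", json={
        "finetune_id": finetune_id,
        "num_rollouts": 4,
        "request": {
            "skill": "detect",
            "object": "vehicles",
            "image_url": to_data_url(example["path"]),
        },
        "ground_truth": {"boxes": example["boxes"]},  # server computes rewards
    })
    resp.raise_for_status()
    out = resp.json()
    groups.append({
        "request": out["request"],
        "rollouts": out["rollouts"],  # round-tripped unchanged
        "rewards": out["rewards"],
    })

# One train step over every group from this batch.
requests.post(f"{BASE_URL}/train_step", json={
    "finetune_id": finetune_id,
    "groups": groups,
    "lr": 0.002,
}).raise_for_status()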

Evaluation

For evaluation, request a single rollout with temperature set to zero for deterministic output:

// POST /rollouts
{
  "finetune_id": "01HXYZ...",
  "num_rollouts": 1,
  "request": {
    "skill": "detect",
    "object": "vehicles",
    "image_url": "data:image/jpeg;base64,/9j/4AAQ...",
    "settings": {
      "temperature": 0
    }
  }
}

During training, higher temperature (e.g., 1.0) encourages diverse rollouts. For evaluation, zero temperature gives the model's best guess so you can measure true accuracy.

Do not send evaluation rollouts to /train_step.
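
An evaluation pass might then look like the sketch below, reusing the hypothetical to_data_url and detect_reward helpers from earlier; the held-out set and the mean-reward metric are illustrative choices, not part of the API.

# Sketch: deterministic evaluation -- one rollout per image at temperature 0,
# scored locally and never sent to /train_step.
import requests

BASE_URL = "https://api.moondream.ai/v1/tuning"

eval_dataset = dataset  # placeholder; in practice use a held-out set

scores = []
for example in eval_dataset:
    resp = requests.post(f"{BASE_URL}/rollouts", json={
        "finetune_id": finetune_id,
        "num_rollouts": 1,
        "request": {
            "skill": "detect",
            "object": "vehicles",
            "image_url": to_data_url(example["path"]),
            "settings": {"temperature": 0},
        },
    })
    resp.raise_for_status()
    rollout = resp.json()["rollouts"][0]
    scores.append(detect_reward(rollout["output"]["objects"], example["boxes"]))

print("mean eval reward:", sum(scores) / len(scores))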

4. Save for production

When you're satisfied with your model's performance, save the current checkpoint:

// POST /finetunes/01HXYZ.../checkpoints/save

Saved checkpoints are persistent and visible in the dashboard.

You can list saved checkpoints with GET /finetunes/:finetuneId/checkpoints.

Managing checkpoints

Training produces checkpoints at each step. Use Save to keep the current checkpoint and make it visible.

Checkpoint lifecycle:

  • Saved — checkpoint is kept and visible in listings
  • Deleted — removed from listings and cleaned up asynchronously

Saving checkpoints

To keep the current checkpoint, call:

// POST /finetunes/:finetuneId/checkpoints/save

Deleting checkpoints

To delete a saved checkpoint, use:

// DELETE /finetunes/:finetuneId/checkpoints/:step

The checkpoint will be cleaned up shortly.
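
Taken together, the checkpoint calls might look like this from Python (auth omitted; the step value is a hypothetical number you'd take from the listing response, whose exact shape isn't shown on this page):

# Sketch: save the current checkpoint, list saved checkpoints, delete one.
import requests

BASE_URL = "https://api.moondream.ai/v1/tuning"

# Keep the current checkpoint and make it visible in the dashboard.
requests.post(f"{BASE_URL}/finetunes/{finetune_id}/checkpoints/save").raise_for_status()

# List saved checkpoints for this finetune.
checkpoints = requests.get(f"{BASE_URL}/finetunes/{finetune_id}/checkpoints").json()

# Delete a saved checkpoint by step (cleaned up asynchronously).
step = 42  # hypothetical step number taken from the listing above
requests.delete(f"{BASE_URL}/finetunes/{finetune_id}/checkpoints/{step}").raise_for_status()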

Interface contracts and gotchas

Keep these rules in mind:

  • Rewards must align with rollouts. The rewards array must match the length and order of the rollouts list you're sending.
  • Round‑trip rollout objects. Don't edit, drop, or re‑serialize rollout metadata fields; pass them back as you received them.
  • Use fresh rollouts. Be cautious about reusing old rollouts after training updates—they were generated by a different version of the model.
  • Store your finetune_id. After creating a finetune, save the returned finetune_id. You'll need it for all subsequent operations.

Next steps