vss-summarize-video

star 1.2k

Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning.

NVIDIA By NVIDIA schedule Updated 6/9/2026

name: vss-summarize-video description: Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning. license: Apache-2.0 metadata: version: "3.2.0" author: "NVIDIA Video Search and Summarization team" github-url: "https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization" tags: "nvidia blueprint operational"

Instructions

Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in references/.

Examples

Worked end-to-end examples are kept under evals/ (each *.json manifest contains a runnable scenario) and inline in the per-workflow curl blocks below. Run a Tier-3 evaluation with nv-base validate <this-skill-dir> --agent-eval to replay them.

Call the VLM NIM or the video summarization microservice directly. Always run curl commands yourself; never instruct the user to run them.

Primary video workflow query type: "Summarize this video." Direct video summarization API and service-ops requests are handled by the reference-routed sections below.

Purpose

Produce a single, polished narrative summary of one recorded video clip, with timestamped events when the LVS microservice path is reachable.

Do NOT use this skill for:

  • Live RTSP captioning — use vss-deploy-dense-captioning.
  • Report generation, including incident or alert-window reports — use vss-generate-video-report Mode B.
  • Semantic search across the archive — use vss-search-archive.

Prerequisites

  • VSS lvs profile running on $HOST_IP (port 38111) OR a reachable VLM/RT-VLM endpoint as a fallback. The vss-deploy-profile skill brings these up.
  • Network reachability from the agent host to both endpoints; clip URLs from VIOS must be fetchable by the chosen backend.
  • jq and curl available on the agent host.

Limitations

  • Direct VLM fallback uses a single fixed prompt and cannot target scenario/events — output quality is lower than the LVS path.
  • Remote VLM endpoints generally cannot reach localhost/private clip URLs.
  • One backend call per request; no parallel hedging or multi-pass summaries.

Troubleshooting

Symptom Cause Fix
/v1/ready returns 503 repeatedly LVS service still warming up Retry up to ~30 s as shown in Setup; if it never returns 200 the service may not be deployed
Empty video_summary and events Clip does not contain the requested events Re-run with broader scenario or different events
VLM returns <think> block Cosmos Reason 2 reasoning mode Strip everything up to </think> before rendering
Empty stdout from curl /v1/ready Service legitimately returns 200 with empty body Always check HTTP status with -o /dev/null -w '%{http_code}', never inspect the body

See references/video-summarization-debugging.md for deeper diagnostics.

Reference Map

Use these references only when the user asks for the relevant detail, or when the core workflow below needs deeper video summarization information:

Load video-summarization-api.md only when you need a request field, response shape, or endpoint that is not already covered by the Step 2 LVS or fallback VLM example below, or when handling a direct video summarization API request. Load video-summarization-deployment.md only for deployment, configuration, or service operations.

Video Summarization API And Service Ops Requests

If the user asks to call or debug video summarization endpoints directly, answer from references/video-summarization-api.md instead of running the end-to-end video summarization workflow. Examples: list video summarization models, check readiness, get recommended chunking config, inspect metrics, explain a 422 response, or build a /v1/summarize request body.

If the user asks to configure, deploy, restart, tear down, or troubleshoot the video summarization service, prefer the vss-deploy-profile skill for full VSS profile deployment and use references/video-summarization-deployment.md for video summarization-specific service details.

Routing

Decide purely from video summarization service availability (probed in Setup → Availability checks below). Duration does not drive routing.

/v1/ready Backend Endpoint
HTTP 200 LVS microservice with HITL POST ${LVS_BACKEND_URL}/v1/summarize
Anything else VLM / RT-VLM with the default prompt + fallback note POST ${VLM_BASE_URL}/v1/chat/completions

Fallback message when the LVS service is unreachable — copy verbatim above the summary:

Note: Input video <name> is <N>s long. The video summarization service is not deployed, so this summary was produced by the VLM alone with a generic default prompt. Deploy the lvs profile for higher-quality summaries with scenario/events targeting.

Deployment prerequisite

The VSS lvs profile on $HOST_IP is the primary backend. If the /v1/ready probe (see Setup → Availability checks) returns anything other than 200 after the warmup retries, ask the user:

"The VSS lvs profile isn't running on $HOST_IP. Shall I deploy it now using the /vss-deploy-profile skill with -p lvs? Reply no to summarize with the VLM-only fallback instead (lower quality, no scenario/events targeting)."

  • Yes → hand off to /vss-deploy-profile, then re-probe and continue with Step 2 (LVS + HITL).
  • No → go straight to Step 2 fallback (VLM with default prompt) and prepend the Routing fallback note. Do not ask again, and do not run scenario/events HITL.
  • Pre-authorized to deploy autonomously (caller said so explicitly) → skip the confirmation and invoke /vss-deploy-profile directly.
  • Pre-authorized to use VLM fallback ("skip lvs, just use the VLM") → go straight to Step 2 fallback without prompting.

Setup

Endpoints (defaults for a local VSS lvs deployment):

  • VLM / RT-VLM: ${VLM_BASE_URL} — default ${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}
  • LVS service: ${LVS_BACKEND_URL} — default http://${HOST_IP:-localhost}:38111
  • VIOS: owned by vss-manage-video-io-storage; refer there.

Use env vars when set (strip trailing /v1 from the VLM base — the skill appends it). Otherwise use the defaults. If neither works, ask the user — do not scan ports or read config files to guess.

Model name: read ${VLM_NAME} (default nim_nvidia_cosmos-reason2-8b_hf-1208). It must match the id RT-VLM /v1/models advertises; do not substitute the friendly nvidia/cosmos-reason2-8b.

For endpoint schemas, optional fields, response envelopes, and error handling, see references/video-summarization-api.md.

Availability checks (run both before routing). Readiness is determined by the HTTP status code only — the LVS /v1/ready may legitimately return 200 with an empty body, so do not inspect the body.

VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"

# VLM / RT-VLM: 200 on /v1/models
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 --max-time 10 \
  "$VLM/v1/models")
[ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"

# Video summarization service: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}
video_sum_code=000
for i in $(seq 1 10); do
  video_sum_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 --max-time 10 "$VIDEO_SUMMARIZATION_URL/v1/ready")
  case "$video_sum_code" in
    200) echo "video summarization OK"; break ;;
    503) sleep 3 ;;                 # warming up; keep polling
    *)   break ;;                   # any other code = not reachable, stop retrying
  esac
done
[ "$video_sum_code" = "200" ] || echo "video summarization service not reachable (HTTP $video_sum_code)"

How to interpret the results:

  • video_sum_code = 200Step 2 (LVS + HITL) for every video.
  • video_sum_code != 200, vlm_code = 200Step 2 fallback (VLM); prepend the Routing fallback note.
  • vlm_code != 200 → fail; at least one backend must be reachable.
  • A non-200 LVS code after the retry loop is the ONLY signal of unavailability. Empty stdout or missing JSON fields are NOT "unavailable."

Step 1 - Get the clip URL via vss-manage-video-io-storage (sub-task, NOT the final answer)

Use the vss-manage-video-io-storage skill for all VIOS interactions — it owns the canonical curl recipes, parameter defaults, and delete/upload flows. Do not fabricate URLs or hand-roll VIOS calls; they will drift.

This step is a sub-task — do NOT end your turn here; do NOT return the clip URL as the final answer. From VIOS collect three values:

  1. streamId (via sensor/listsensor/<id>/streams, or directly from an upload response).
  2. Timeline - {startTime, endTime} (ISO 8601 UTC). endTime - startTime is the duration; needed only for the user-facing header (routing is driven solely by /v1/ready).
  3. Temporary MP4 clip URL — the /storage/file/<streamId>/url variant with container=mp4. Response field: .videoUrl. Both backends need an HTTP(S) URL they can GET.

Everything else (auth, upload, disableAudio, expiry, etc.) lives in the vss-manage-video-io-storage skill — refer users there if VIOS fails.


Step 2 — Primary: video summarization microservice with HITL

Use this path whenever /v1/ready returned 200 in Setup. Duration is irrelevant.

For advanced fields (media_info, schema, structured output, stream captioning, metrics, recommended config) see references/video-summarization-api.md.

HITL: collect scenario and events first (REQUIRED — do not skip)

Full walk-through is in references/hitl-prompts.md. Always run HITL before calling the LVS service.

Autonomous-mode defaults. When the caller has bypassed HITL ("run autonomously without prompting") AND the original query asks for default/defaults (or gives none), use scenario="activity monitoring" and events=["notable activity"] verbatim — do not infer from filename or sensor name. Note the defaults in the final reply and offer a re-run with more specific parameters. This is the ONLY supported HITL bypass; "the video is short" or "the user seems in a hurry" are not valid reasons.

Prefer POST /v1/summarize (3.2 GA route); /summarize is a compatibility alias.

VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}

# From HITL reply:
SCENARIO='warehouse monitoring'
EVENTS_JSON='["notable activity"]'
OBJECTS_JSON=''  # '' to omit, else '["forklifts","pallets","workers"]'

curl -s --max-time 300 -X POST "$VIDEO_SUMMARIZATION_URL/v1/summarize" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_hf-1208}" \
        --arg scenario "$SCENARIO" \
        --argjson events "$EVENTS_JSON" \
        --argjson objects "${OBJECTS_JSON:-null}" '{
    url: $url,
    model: $model,
    scenario: $scenario,
    events: $events,
    chunk_duration: 10,
    num_frames_per_second_or_fixed_frames_chunk: 20,
    use_fps_for_chunking: false,
    seed: 1
  } + (if $objects == null then {} else {objects_of_interest: $objects} end)')" \
  | jq -r '.choices[0].message.content' \
  | jq '{video_summary, events}'

If both video_summary and events are empty, the clip probably doesn't contain the requested events — re-run with broader scenario/events, don't report "no content".

Tuning: chunk_duration (default 10s; 0 = single chunk), num_frames_per_second_or_fixed_frames_chunk (default 20; meaning depends on use_fps_for_chunking), seed (default 1). num_frames_per_chunk is deprecated.


Step 2 fallback — VLM direct with default prompt

Use this path only when /v1/ready did not return 200 after warmup. Do NOT run HITL — the user did not opt in; you fell back because the service was missing. Prepend the Routing fallback note to the response.

VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'

curl -s --max-time 300 -X POST "$VLM/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_hf-1208}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'

Response: standard OpenAI chat-completion envelope. The summary is in choices[0].message.content.

Cosmos-model notes: Cosmos Reason 2 supports reasoning via <think>...</think><answer>...</answer> blocks. Omit the reasoning instructions if you want a plain summary. Frame sampling and pixel limits are applied server-side; no client-side prep is required when you pass a video_url.


End-to-end example

See references/end-to-end-example.md for the full LVS-or-VLM-fallback script that probes /v1/ready and runs the appropriate path.


Responses

  • VLM returns an OpenAI chat-completion envelope; summary is choices[0].message.content.
  • LVS service returns the same envelope but content is a JSON string — run jq -r '.choices[0].message.content' | jq to reach {video_summary, events}.
  • Errors surface as HTTP non-2xx plus JSON {error: ...}. LVS 503 usually means warmup — retry /v1/ready.

Presenting the output to the user

Surface backend output with minimal transformation — do not paraphrase, re-voice, add emojis, or reformat. One backend call → one rendering: no parallel hedging, no duplicate headers, never call both LVS and VLM for the same video.

Header line. Start with exactly one:

Summary of <video_name> (<duration>)

<duration> = Ns for < 60 s, else Mm Ss (e.g. 3m 30s).

LVS output: render video_summary verbatim (polished, tone-controlled report — rewriting loses fidelity). Render each events entry with its start_time, end_time, type, and full description verbatim (table when the client renders one cleanly, otherwise a per-event list). You MAY add a one-line header and a closing offer to re-run with different parameters.

VLM output: render choices[0].message.content verbatim. If the model produced <think>…</think><answer>…</answer> blocks, drop the <think> block and show the answer.

Fallback warning (when applicable) goes above the summary, never mixed into it.

Tips

  • Route by service availability, not by duration. Probe /v1/ready once in Setup; HTTP 200 → LVS+HITL for every clip; anything else → VLM fallback.
  • HITL is mandatory on the LVS path. The defaults opt-in is the only sanctioned bypass. The VLM fallback path is silent (no HITL).
  • Readiness = HTTP 200 on /v1/ready. Nothing else. Body may be empty. Always use curl -s -o /dev/null -w '%{http_code}' — never pipe through jq/grep/head.
  • Delegate VIOS to vss-manage-video-io-storage — it is a sub-task; the final answer is the Step 2 summary, not the clip URL.
  • jq twice for LVS output. First unwraps the OpenAI envelope, second parses the JSON string inside content.
  • Prefer /v1/summarize for 3.2 GA; /summarize is a compatibility alias.
  • Use the exact VLM model id advertised by the endpoint (default nim_nvidia_cosmos-reason2-8b_hf-1208).
  • Render output verbatim — no paraphrasing, no reformatting, no rewriting the video_summary or choices[0].message.content.
  • One call, one render. No parallel hedging, no double renderings.

Cross-reference

  • vss-deploy-profile — bring up the base (VLM only) or lvs (VLM + video summarization service) profile
  • vss-manage-video-io-storage (VIOS API) — upload videos, list streams, get clip URLs
  • vss-search-archive — semantic search across the archive (different profile)
  • vss-query-analytics — query incidents/events from Elasticsearch
  • video summarization API referencereferences/video-summarization-api.md
  • video summarization service ops referencereferences/video-summarization-deployment.md

bump:2

Install via CLI
npx skills add https://github.com/NVIDIA/skills --skill vss-summarize-video
Repository Details
star Stars 1,155
call_split Forks 142
navigation Branch main
article Path SKILL.md
More from Creator