name: vss-deploy-dense-captioning description: Use this skill when deploying standalone RT-VLM dense captioning or calling its REST API (uploads, captions, streams, chat-completions, Kafka). Not for VSS profile deploy or video-search ingestion. license: Apache-2.0 metadata: version: "3.2.0" github-url: "https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization" tags: "nvidia blueprint operational deployment"
Purpose
Stand up the RT-VLM dense-captioning microservice on its own and exercise every endpoint it exposes (file upload, generate_captions, stream add/delete, chat-completions, Kafka topics).
Prerequisites
For standalone RT-VLM deployment:
- Docker, Docker Compose, NVIDIA Container Toolkit, and a visible GPU.
- NGC registry credentials in
$NGC_CLI_API_KEYfordocker login nvcr.io, image pulls, and local NGC model/artifact downloads. curl,jq, and any writable working directory for the standalone compose copy.
For API calls against an existing service:
- Running RT-VLM service reachable at
$BASE_URL. - Bearer token in
$RTVI_VLM_API_KEYor$NGC_CLI_API_KEY, depending on how the service was configured.
For full VSS profile deployment:
- Use
../vss-deploy-profile/SKILL.md; this skill does not deploy full VSS profiles.
Instructions
Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in references/; execute the documented workflows directly unless a future revision names a concrete helper.
Examples
Worked end-to-end examples are kept under evals/ (each *.json manifest contains a runnable scenario) and inline in the per-workflow curl blocks below. Run a Tier-3 evaluation with nv-base validate <this-skill-dir> --agent-eval to replay them.
Limitations
- Requires either a standalone RT-VLM service deployed via this skill or an existing RT-VLM service reachable from the caller.
- NGC-hosted models and NIMs may be subject to rate-limits, GPU memory requirements, and license restrictions.
- Concurrency, GPU memory, and storage limits depend on the host hardware and the profile's compose file.
- Keep
NGC_CLI_API_KEY,RTVI_VLM_API_KEY, and.envfiles out of git and out of logs; do not echo credential values or include them in final responses. - Docker group access and
sudoare effectively root-level privileges. Use the non-interactivesudo -nguard in the deploy reference and stop for host-owner action when passwordless sudo is unavailable.
Troubleshooting
- Error: REST call returns connection refused. Cause: target microservice not running. Solution: probe
/docsor/health; redeploy viavss-deploy-profileor the matchingvss-deploy-*skill. - Error: HTTP 401/403 from NGC pulls. Cause: missing/expired
NGC_CLI_API_KEY. Solution:docker login nvcr.ioand re-export the key before retrying. - Error: container OOM or model fails to load. Cause: insufficient GPU memory for the selected profile. Solution: switch to a smaller variant or free GPUs via
docker compose down.
Deploy and Use RT-VLM Dense Captioning (VSS 3.2)
RT-VLM is NVIDIA's real-time vision-language microservice: decode video (file or
RTSP), segment it into chunks, run a VLM (cosmos-reason1, cosmos-reason2, or any
OpenAI-compatible model), stream dense captions back over SSE/HTTP, and publish
captions, incident alerts, and errors to Kafka. Use this skill to deploy the
standalone RT-VLM service when a full VSS profile is not already running, then call
its /v1/... API for caption generation, file upload, live-stream management, health
checks, NIM-compatible chat completions, or Prometheus metrics. API reference:
https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.
Deployment Routing
If the user asks to deploy a full VSS profile, use
../vss-deploy-profile/SKILL.md. That skill
owns profile routing, generated.env, resolved.yml, multi-service sizing, and
full-stack deploy/teardown.
If the user asks for standalone RT-VLM dense captioning, or no VSS profile is
already running, use the standalone RT-VLM flow in
references/deploy-rt-vlm-service.md
before calling the API. This follows the same compose-centric pattern as
vss-deploy-profile: gather context, run preflights, work from a local copy,
dry-run with docker compose config, review, deploy, then wait for health.
Standalone Deployment Flow
Always follow this sequence. Never skip the dry-run.
# 1. Copy deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml
# into any writable standalone working directory.
# 2. Derive RTVI_VLM_IMAGE_TAG from that compose copy.
# 3. Strip the standalone-only dangling depends_on block from the copy.
# 4. Create a gitignored .env with the required RT-VLM values.
# 5. Prepare host bind paths such as $VSS_DATA_DIR/data_log/vst/clip_storage.
# Use `sudo -n` for ownership fixes; if passwordless sudo is unavailable,
# stop and ask the host owner to run the printed command manually.
# 6. docker compose --env-file .env -f rtvi-vlm-docker-compose.yml config --quiet
# 7. docker pull the exact RT-VLM image tag.
# 8. docker compose ... up -d rtvi-vlm, wait for ready, then smoke test.
Run preflights before any pull or up; stop and fix failures here before
debugging RT-VLM itself:
nvidia-smi --query-gpu=index,name --format=csv,noheader
nvidia-container-cli info
docker compose version
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
For standalone single-file deployments, do not run the raw
deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml directly: it
contains depends_on references to sibling VLM/NIM services that are only
defined in the full VSS/met-blueprints compose project. The standalone reference
shows how to copy the compose file, derive the current image tag from it, strip
the depends_on block, and validate the result before up.
For agent-driven validation, never let sudo prompt interactively. Before any
privileged ownership or Docker operation, use the non-interactive guard in
references/deploy-rt-vlm-service.md:
prefer plain docker; otherwise use sudo -n docker; if sudo -n fails, stop
with the exact manual command for the host owner instead of retrying with
interactive sudo or weakening permissions.
If docker pull fails with a containerd snapshotter/unpack error on Docker 28+,
apply the /etc/docker/daemon.json containerd-snapshotter=false fix in the
standalone reference before retrying.
Minimum standalone .env values:
| Host env var | Required when | Purpose |
|---|---|---|
NGC_CLI_API_KEY |
Standalone deploy path | NGC registry image pull and NGC model/artifact download |
RTVI_VLM_API_KEY or NGC_CLI_API_KEY |
Authenticated API calls | RT-VLM bearer auth after the service is running |
RTVI_VLM_PORT |
Always | Host API port mapped to container 8000 |
HOST_IP |
Always | Kafka bootstrap host (${HOST_IP}:9092) |
VSS_DATA_DIR |
Always | Required clip-storage bind mount |
RTVI_VLM_MODEL_TO_USE |
Always for standalone | Backend selector; use cosmos-reason2 for the default local model or openai-compat for a remote/sibling endpoint |
RTVI_VLM_MODEL_PATH |
Local self-hosted model | Source-backed Cosmos Reason 2 path: ngc:nim/nvidia/cosmos-reason2-8b:hf-1208 |
RTVI_VLM_ENDPOINT |
RTVI_VLM_MODEL_TO_USE=openai-compat |
Remote/sibling OpenAI-compatible VLM endpoint |
VLM_NAME |
RTVI_VLM_MODEL_TO_USE=openai-compat |
Model/deployment name exposed by that endpoint |
Setup
export BASE_URL="http://localhost:${RTVI_VLM_PORT:-8018}" # host-side RT-VLM port
export API_KEY="${NGC_CLI_API_KEY:-${RTVI_VLM_API_KEY:-}}" # bearer token used by host-side curl commands
: "${API_KEY:?Set NGC_CLI_API_KEY or RTVI_VLM_API_KEY before calling authenticated endpoints}"
Every request below uses Authorization: Bearer $API_KEY. Health endpoints
(/v1/health/*, /v1/ready, /v1/live, /v1/startup) typically work without auth.
Smoke test before use:
curl -fsS "$BASE_URL/v1/health/ready"
MODEL_ID="$(curl -fsS "$BASE_URL/v1/models" -H "Authorization: Bearer $API_KEY" | jq -r '.data[0].id // .id')"
curl -fsS "$BASE_URL/openapi.json" | jq -r '.paths | keys[]' | sort
RTSP Sample Stream Guard
When a task or eval names RTSP_SAMPLE_URL, treat that exact environment
variable as a required input. Verify it is set and non-empty before probing or
registering any stream; if it is missing, stop with a clear failure message. Do
not derive a substitute from NvStreamer, VIOS, sample-data bundles, or any other
fallback, because that validates a different stream than the caller requested.
: "${RTSP_SAMPLE_URL:?Set RTSP_SAMPLE_URL to a reachable RTSP sample stream before RTSP validation}"
case "$RTSP_SAMPLE_URL" in
rtsp://*) ;;
*) echo "RTSP_SAMPLE_URL must be an rtsp:// URL, got: $RTSP_SAMPLE_URL" >&2; exit 1 ;;
esac
if command -v ffprobe >/dev/null 2>&1; then
ffprobe -v error -rtsp_transport tcp \
-select_streams v:0 -show_entries stream=codec_type \
-of csv=p=0 "$RTSP_SAMPLE_URL" | grep -qx video
elif command -v gst-discoverer-1.0 >/dev/null 2>&1; then
gst-discoverer-1.0 "$RTSP_SAMPLE_URL" | grep -qi 'video'
else
echo "Install ffprobe or gst-discoverer-1.0 before RTSP validation." >&2
exit 1
fi
Quick Start — dense captions from a local video
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
-H "Authorization: Bearer $API_KEY" \
-F "file=@/path/to/warehouse.mp4" \
-F "purpose=vision" \
-F "media_type=video" | jq -r '.id')
# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"id\": \"$FILE_ID\",
\"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\",
\"model\": \"$MODEL_ID\",
\"chunk_duration\": 10,
\"stream\": true
}"
API Surface
Use the live OpenAPI as the source of truth before calling optional endpoints:
curl -fsS "$BASE_URL/openapi.json" | jq -r '.paths | keys[]' | sort
Core paths for VSS 3.2 are:
POST /v1/filesfor multipart media upload; pass the returned fileidinto caption generation and delete the file when finished.POST /v1/generate_captionsfor file or stream captioning. Use the exact model id returned byGET /v1/models; aliases such ascosmos-reason2are backend selectors, not request model ids.POST /v1/streams/add,GET /v1/streams/get-stream-info, andDELETE /v1/streams/delete/{stream_id}for RTSP lifecycle. Parse stream ids fromresults[0].id.POST /v1/chat/completionsfor OpenAI-compatible text and multimodal calls. Current 26.05 builds return HTTP 400 for text-only/v1/completions; treat that as expected when validating legacy behavior.GET /v1/health/ready,/v1/models,/v1/assets/stats, and/v1/metricsfor service probes. Do not assume/v1/licenseexists unless OpenAPI lists it.
Detailed endpoint schemas, response shapes, CV-style singular stream endpoints,
and 26.05 compatibility notes live in
references/api-surface-26.05.md.
Common Workflows
- Stored file captioning: upload with
POST /v1/files, call/v1/generate_captionswith the returned file id, usestream=truefor SSE, then delete the file to release storage. - RTSP live captioning: when the caller provides
RTSP_SAMPLE_URL, use that exact URL and run the RTSP Sample Stream Guard before registration. Do not derive a replacement stream from NvStreamer or VIOS whenRTSP_SAMPLE_URLis empty; fail fast instead. Require an actual video stream/caps entry before registration; add the stream, caption it, then unregister it. - Alert prompts: include a deterministic
Anomaly Detected: Yes/Noline. Kafka publication is server-side config, additive to HTTP responses, and documented inreferences/kafka-workflows.md. - Kafka validation: trust the live
vss-rtvi-vlmenvironment for topic names. In a full VSS alerts real-time profile, use the existing VSS Kafka containermdx-kafkafor CLI checks and final incident-consumer commands. For standalone validation, use a broker that advertises${HOST_IP}:9092; never stop or replace a pre-existing broker without user confirmation.
Error Reference
Common causes: 400 for invalid request shape or model id, 401/403 for missing
or wrong bearer token, 404 for deleted files/streams or unsupported endpoints,
413 for oversized uploads, 422 for schema validation, 429 for too much
concurrency, 500 for inference/runtime failures, and 503 while startup is still
in progress. Inspect docker logs vss-rtvi-vlm for service-side failures.