dlstreamer-coding-agent - SKILL.md Agent Skill

name: dlstreamer-coding-agent
description: "Build new DL Streamer video-analytics applications (Python, C, C++ or GStreamer command line). Use when: user describes a vision AI pipeline, wants to create a new sample app, combine elements from existing samples, add detection/classification/VLM/tracking/alerts/recording to a video pipeline, or create custom GStreamer elements in Python or C++. Translates natural-language pipeline descriptions into working DL Streamer code using established design patterns."
permissions:
  - write
  - command

DL Streamer Coding Agent

Build new DL Streamer video-analytics applications (Python, C, C++ or GStreamer command line) by composing design patterns extracted from existing sample apps.

File Resolution

This skill uses repo-root-relative paths to reference files outside the skill folder (e.g. docs/user-guide/elements/, samples/gstreamer/python/hello_dlstreamer/). When the full repository is cloned, the repo root is three directories above this skill file; if only the skill files were copied, these paths refer to https://github.com/open-edge-platform/dlstreamer.

When to Use

User describes a vision AI pipeline in natural language
User wants to create a new Python sample application built on DL Streamer
User wants to create a new C or C++ sample application built on DL Streamer
User wants to create a new GStreamer command line using DL Streamer elements
User wants to combine elements from multiple existing samples (e.g. detection + VLM + recording)
User needs to add custom analytics logic or custom GStreamer elements in Python or C++

See example prompts for inspiration.

Directory Layout for a New Sample App

<new_sample_app_name>
├── <app_name>.py or .sh        # Main application (Python or shell script)
├── export_models.py or .sh     # Model download and export script
├── requirements.txt            # Python dependencies for the application
├── export_requirements.txt     # Python dependencies for model export scripts
├── README.md                   # Setup and usage instructions
├── plugins/                    # Only if custom GStreamer elements are needed
│   ├── python/
│   │   └── <element>.py
│   └── c/
│       └── <element>.c
├── config/                     # Only if config files are needed
│   └── *.txt / *.json
├── models/                     # Created at runtime (cached model exports)
├── videos/                     # Created at runtime (cached video downloads)
└── results/                    # Created at runtime (output files)

Procedure

Execution Overview

After Step 0 (requirements gathering), kick off all independent long-running tasks in parallel via async terminals, then continue with reasoning-heavy work while they complete. When in doubt about ordering, always wait for a step's listed prerequisites to finish before starting it — the dependency graph below is the single source of truth.

Step 0 (gather requirements — interactive)
  │
  ├──► Step 1  (Docker pull — async) ───────────────────────────────────────┐
  ├──► Step 2a (export scripts + pip install — async) ──► Step 2c (export)──┤
  ├──► Step 2b (video download — async) ────────────────────────────────────┤───► Step 5 (run & validate)
  └──► Step 3  (design pipeline — reasoning) ──► Step 4 (generate app) ─────┘

Parallelization rules:

Steps 1, 2a, 2b, and 3 are fully independent — start them all immediately after Step 0
Step 2c (model export) depends on Step 2a (pip install) completing
Step 4 (generate app) depends on Step 3 (pipeline design) completing
Step 5 (run and validate) depends on Steps 1, 2b (when a video download is needed), 2c, and 4 all completing
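Because the graph is the single source of truth, the parallel "waves" it implies can be derived mechanically. The following is a minimal Python sketch, assuming the step labels are used as node names (DEPENDENCIES and execution_waves are illustrative names, not part of DL Streamer); it uses the standard-library graphlib module to group steps whose members can run concurrently:

```python
from graphlib import TopologicalSorter

# Dependency graph from the Execution Overview: step -> set of prerequisites.
DEPENDENCIES = {
    "0": set(),
    "1": {"0"},
    "2a": {"0"},
    "2b": {"0"},
    "3": {"0"},
    "2c": {"2a"},
    "4": {"3"},
    "5": {"1", "2b", "2c", "4"},
}


def execution_waves(deps):
    """Group steps into waves; every step in one wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites already done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(execution_waves(DEPENDENCIES)):
        print(f"wave {index}: {', '.join(wave)}")
```

Step 5 lands in the last wave on its own, because it waits on Steps 1, 2b, 2c, and 4.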

Safety rules for autonomous execution:

Before running any command that installs packages, downloads external content, or modifies/deletes files, show the exact command and request explicit user confirmation in chat.
Never interpolate raw user input into shell commands. Use validated allowlists and fixed argument templates.
Restrict file operations to the sample application directory unless the user explicitly approves a wider scope.
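A minimal Python sketch of the allowlist and fixed-template rules, assuming a hypothetical image allowlist and a conservative pattern for application names (ALLOWED_IMAGES, SAFE_NAME, docker_pull_cmd, and venv_dir are illustrative names). Commands are built as argument lists, so user text never passes through a shell, and shlex.join renders the exact command to show the user before asking for confirmation:

```python
import re
import shlex

# Hypothetical allowlist; extend only with images the user has approved.
ALLOWED_IMAGES = {"intel/dlstreamer:latest"}
# Conservative pattern for names that end up in paths or commands.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def docker_pull_cmd(image: str) -> list[str]:
    """Fixed argument template for Step 1; rejects images not on the allowlist."""
    if image not in ALLOWED_IMAGES:
        raise ValueError(f"image not allowlisted: {image!r}")
    return ["docker", "pull", image]


def venv_dir(app_name: str) -> str:
    """Validate an application name before embedding it in a directory name."""
    if not SAFE_NAME.fullmatch(app_name):
        raise ValueError(f"unsafe app name: {app_name!r}")
    return f".{app_name}-export-venv"


if __name__ == "__main__":
    cmd = docker_pull_cmd("intel/dlstreamer:latest")
    # Show the exact command and wait for explicit approval before running it,
    # then pass the argv list to subprocess.run(cmd, check=True), never shell=True.
    print("About to run:", shlex.join(cmd))
```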

Reference Lookup

Each reference document is used in one primary step to avoid redundant reads:

| Reference | Primary step | Purpose |
|---|---|---|
| Requirements Questionnaire | Step 0 | Detailed questions to ask when the user prompt is incomplete |
| Model Preparation | Step 2 | Prepare AI models in OpenVINO IR format |
| Pipeline Construction | Step 3 | Element selection, pipeline rules, common patterns |
| Sample Index | Step 3 | Existing samples to study before generating code |
| Design Patterns | Step 3 | Python application structure, patterns, and coding conventions |
| Debugging Hints | Step 5 | Docker testing, common gotchas, validation checklist |

Fast Path (Pattern Table Match)

Before proceeding with the full procedure, check if the user's prompt maps directly to a row in the Common Pipeline Patterns table. If a match is found:

Pre-fill Step 0 fields from the matched row
If any required field is missing or inferred from the matched row, present the pre-filled values to the user for confirmation (skip the full Requirements Questionnaire unless info is still missing)
If all required fields were explicitly provided by the user (not inferred), skip requirement-field confirmation, but still request explicit user approval before running any command in Steps 1–2
After the user confirms (or overrides), read only the design patterns, reference sections, and model-preparation sections needed for the confirmed selections
Proceed to Steps 1–5

Step 0 — Gather Requirements

Extract the following from the user's prompt:

| Required info | Look for | Default if missing |
|---|---|---|
| Video input | File path, HTTP URL (for download), or RTSP URI | None (must ask) |
| AI model(s) | Model name/URL and task (detection, classification, VLM, OCR, …) | None (must ask) |
| Target hardware | Intel platform, available accelerators (GPU/NPU/CPU) | `Not sure / detect at runtime` |
| Output format | Annotated video, JSON, JPEG snapshots, display window | `All of the above` |
| Application type | Python app, C/C++ app, or GStreamer command line | Python app. When the prompt references an existing application to convert, the type must match the source application's language, determined from its file extensions (C/C++ → C/C++, Python → Python, shell → GStreamer command line) |
| Docker image | DL Streamer Docker tag | `intel/dlstreamer:latest` (treated as the latest Ubuntu 24 image) |

Application type override: If the user's prompt contains explicit wording such as "bash script", "shell script", "gst-launch", or "command line", set Application type to GStreamer command line regardless of the default. Default to a Python application only when the prompt states no preference and there is no source application to convert.
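The override and conversion rules can be expressed as a small decision function. This sketch assumes the keyword override takes precedence over the source-language rule and that the first recognized source-file extension decides the type; resolve_app_type and the extension table are hypothetical helpers, not part of the skill's templates:

```python
import os

# Keywords from the override rule; matching is case-insensitive.
CLI_KEYWORDS = ("bash script", "shell script", "gst-launch", "command line")
# Source-language mapping used when converting an existing application.
EXTENSION_TO_APP_TYPE = {
    ".c": "C/C++", ".cc": "C/C++", ".cpp": "C/C++", ".cxx": "C/C++",
    ".h": "C/C++", ".hpp": "C/C++",
    ".py": "Python",
    ".sh": "GStreamer command line",
}


def resolve_app_type(prompt: str, source_files=()) -> str:
    """Return the application type for Step 0 (assumes the keyword override wins)."""
    text = prompt.lower()
    if any(keyword in text for keyword in CLI_KEYWORDS):
        return "GStreamer command line"
    for name in source_files:
        ext = os.path.splitext(name)[1].lower()
        if ext in EXTENSION_TO_APP_TYPE:
            return EXTENSION_TO_APP_TYPE[ext]
    # No stated preference and nothing to convert: default to Python.
    return "Python"
```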

If the user's prompt explicitly provides all required info (video input AND model names are explicitly stated, not inferred), proceed directly to Step 1.

If any required info is missing or was inferred via Fast Path (not explicitly stated by the user), you MUST present the pre-filled values and ask the user to confirm or override before proceeding. Use the interactive question tool if available (e.g. vscode_askQuestions in VS Code Copilot), otherwise list the values inline in chat. Do NOT silently assume defaults and skip confirmation.
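The confirmation rule reduces to one check on the two required fields. A minimal sketch, assuming the agent records each extracted field as a (value, explicit) pair where explicit is False for anything inferred via Fast Path (prefill, REQUIRED, and DEFAULTS are illustrative names):

```python
# Step 0 defaults; None marks a required field the agent must ask about.
DEFAULTS = {
    "video_input": None,
    "models": None,
    "target_hardware": "Not sure / detect at runtime",
    "output_format": "All of the above",
    "docker_image": "intel/dlstreamer:latest",
}
REQUIRED = ("video_input", "models")


def prefill(extracted: dict) -> tuple:
    """extracted maps field -> (value, explicit). Returns (values, needs_confirmation)."""
    values = {name: extracted.get(name, (default, False))[0]
              for name, default in DEFAULTS.items()}
    # Confirm unless every required field is present AND was stated by the user.
    needs_confirmation = any(
        values[name] is None or not extracted.get(name, (None, False))[1]
        for name in REQUIRED
    )
    return values, needs_confirmation
```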

If the user requests NPU but the selected model or elements do not support NPU inference, inform the user and suggest falling back to GPU or CPU.

Step 1 — Pull Docker Image (async)

Start the Docker image pull in an async terminal immediately after Step 0 completes.

Always pull the latest available image. Do NOT reuse a locally cached image without pulling first.

docker pull intel/dlstreamer:latest

If docker pull fails (for example, image not found or network error), inform the user and suggest checking Docker login and network connectivity before retrying.

Step 2 — Prepare Models and Video (async)

2a — Create export scripts and kick off venv + pip install

Check whether the requested models (or similar ones) appear in the model exporters bundled with DL Streamer.

| Model exporter | Typical models | Path |
|---|---|---|
| download_public_models.sh | Traditional computer vision models | `samples/download_public_models.sh` |
| download_hf_models.py | Hugging Face models, including VLMs and Transformer-based detection/classification models (RT-DETR, CLIP, ViT) | `scripts/download_models/download_hf_models.py` |
| download_ultralytics_models.py | Ultralytics YOLO models (specialized downloader) | `scripts/download_models/download_ultralytics_models.py` |

If a model is found, extract its download recipe and create a local export_models.py in the application directory. If a model is not listed, check the Model Preparation Reference for export instructions, then write a new script using the Export Models Template.

If the model export script needs additional Python packages (e.g. Hugging Face transformers, Ultralytics, or the packages behind optimum-cli), create export_requirements.txt using the Export Requirements Template. Above each package, add a comment naming the export script that requires it. Use the exact pinned versions from Model Preparation Reference → Requirements.

CRITICAL — CPU-only PyTorch: The first line of export_requirements.txt must be --extra-index-url https://download.pytorch.org/whl/cpu (before any torch-dependent package like ultralytics or nncf). Without this, pip pulls multi-GB GPU libraries not needed for model export. See Model Preparation Reference → Requirements for the full template.
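Before starting the async install, a quick lint of the file can catch a misplaced index URL or an unpinned package. This Python sketch assumes the CPU index line must be literally the first line (as stated above) and that every package line should use an exact == pin; check_export_requirements is a hypothetical helper:

```python
CPU_INDEX = "--extra-index-url https://download.pytorch.org/whl/cpu"


def check_export_requirements(text: str) -> list:
    """Return a list of problems; an empty list means the file passes."""
    lines = text.splitlines()
    problems = []
    # Rule 1: the CPU-only PyTorch index must be the very first line.
    if not lines or lines[0].strip() != CPU_INDEX:
        problems.append(f"line 1 must be: {CPU_INDEX}")
    # Rule 2: every package line needs an exact pin (name==version).
    for lineno, raw in enumerate(lines, start=1):
        entry = raw.split("#", 1)[0].strip()  # drop trailing comments
        if entry and not entry.startswith("-") and "==" not in entry:
            problems.append(f"line {lineno} is not pinned: {entry}")
    return problems
```

For example, check_export_requirements(open("export_requirements.txt").read()) returns an empty list when the file passes.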

Once both files are written, start venv creation and pip install in an async terminal:

# Run in async mode — do NOT wait for completion
python3 -m venv .<app_name>-export-venv && \
source .<app_name>-export-venv/bin/activate && \
pip install -r export_requirements.txt

2b — Download video to local directory

If the user provided an HTTP URL for video input, download it now:

mkdir -p videos && curl -L -o videos/<video_name>.mp4 \
    -H "Referer: https://www.pexels.com/" \
    -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
    "<DIRECT_VIDEO_URL>"

The application itself should not download videos — it accepts only --input pointing to a local file or RTSP URI. Document download steps in the README.

Pexels page URLs → direct file URLs: A Pexels page URL (https://www.pexels.com/video/<slug>-<ID>/) is not a direct download link. Scrape the page with curl -s and search the HTML for videos.pexels.com/video-files/ links to get the actual .mp4 URL. Do not guess resolution or FPS — they vary per video. If scraping fails, ask the user for the direct URL.
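The scraping step can be reduced to a regular expression over the page HTML fetched with curl -s. This sketch assumes direct links appear verbatim in the HTML under the videos.pexels.com/video-files/ prefix and end in .mp4; extract_video_file_urls is a hypothetical helper. It returns every candidate rather than picking one, because resolution and frame rate differ per video and should not be guessed:

```python
import re

# Direct MP4 links on the Pexels CDN; the URL shape is an assumption based on
# the videos.pexels.com/video-files/ prefix.
VIDEO_FILE_RE = re.compile(r"""https://videos\.pexels\.com/video-files/[^\s"'<>]+?\.mp4""")


def extract_video_file_urls(html: str) -> list:
    """Return unique direct .mp4 links in first-seen order."""
    urls = []
    for url in VIDEO_FILE_RE.findall(html):
        if url not in urls:
            urls.append(url)
    return urls
```

If the list is empty, fall back to asking the user for the direct URL, as described above.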

Git LFS warning: Downloading videos from edge-ai-resources may return an HTML page instead of video data. Verify the file type with: file videos/sample.mp4 | grep -q "ISO Media". Prefer Pexels direct URLs as default test videos.
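Where the file utility is unavailable (for example in a minimal container), a check of the first bytes can stand in for the "ISO Media" test. This Python sketch assumes a valid download starts with an ISO base media ftyp box, which is typical for MP4 but not guaranteed for every container; it also flags Git LFS pointer files and HTML pages. classify_download and check_video are hypothetical helpers:

```python
def classify_download(head: bytes) -> str:
    """Classify the first bytes of a downloaded file."""
    # ISO BMFF (MP4): bytes 4..8 of the first box are usually the 'ftyp' type.
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "mp4"
    # Git LFS pointer files are small text files starting with this line.
    if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
        return "git-lfs-pointer"
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def check_video(path: str) -> str:
    """Read only the first 64 bytes; enough for all three signatures."""
    with open(path, "rb") as f:
        return classify_download(f.read(64))
```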

Proceed to Step 3 while pip install and docker pull run in the background.

2c — Run model export (after pip install completes)

Before running the export, confirm the async terminal from Step 2a has completed successfully. If the install failed, diagnose and re-run before continuing.

Once confirmed, run the model export:

source .<app_name>-export-venv/bin/activate
python3 export_models.py  # or bash export_models.sh

If model export fails, check command output for common causes (unsupported architecture, insufficient RAM, missing model weights), report the error with a suggested fix, then retry.

Step 3 — Design Pipeline

Design a DL Streamer pipeline that fulfills the user's requirements. This step covers element selection and application structure.

3a — Select elements and assemble pipeline string

Use the Pipeline Construction Reference to identify elements for each pipeline stage (source, decode, inference, metadata, sink). Follow the Pipeline Design Rules in that reference.

For common use cases, go straight to file generation using the use-case → template/pattern mapping table.

For complex cases, consult the Sample Index for relevant reference implementations, then read the specific samples that match the user's use case.

Converting from DeepStream

When converting a DeepStream application, follow these additional rules:

Inventory the source pipeline. Identify all elements in the DeepStream pipeline first.
Map each element 1-to-1 using the Converting Guide at docs/user-guide/dev_guide/converting_deepstream_to_dlstreamer.md.
Connect DL Streamer elements using the Common Pipeline Patterns table or Sample Index.
Do not add elements absent from the source pipeline. Every element in the converted pipeline must trace back to the inventory.
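The tracing rule can be checked mechanically once the inventory and mapping are written down. In this sketch, EXAMPLE_MAPPING is an illustrative subset only (take the real mappings from the Converting Guide), and element_names assumes a simple gst-launch style string: it skips caps filters and branch references but does not handle quoted values containing "!" or every other gst-launch syntax. element_names and untraceable are hypothetical helpers:

```python
# Illustrative subset only; take the real mappings from the Converting Guide.
EXAMPLE_MAPPING = {
    "filesrc": {"filesrc"},
    "nvinfer": {"gvadetect", "gvaclassify", "gvainference"},
    "nvtracker": {"gvatrack"},
    "nvdsosd": {"gvawatermark"},
}


def element_names(pipeline: str) -> list:
    """Factory names from a gst-launch style string (first token of each segment)."""
    names = []
    for segment in pipeline.split("!"):
        tokens = segment.split()
        if not tokens:
            continue
        first = tokens[0]
        # Skip caps filters (video/x-raw,...) and branch references (t.).
        if "/" in first or first.endswith("."):
            continue
        names.append(first)
    return names


def untraceable(converted: str, inventory: list, mapping: dict) -> list:
    """Elements in the converted pipeline that no inventory element maps to."""
    allowed = set()
    for source_element in inventory:
        allowed |= mapping.get(source_element, set())
    return [name for name in element_names(converted) if name not in allowed]
```

A non-empty result names elements that must either be removed or traced back to an inventory entry.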

3b — Choose application structure

For a CLI application, the pipeline string from 3a is the deliverable — wrap it in a gst-launch-1.0 shell script.

For a Python application, map the user's description to one or more design patterns using the Pattern Selection Table:

Select the pipeline construction approach — see Pattern 1: Pipeline Core
Add callbacks/probes as needed
Add custom Python elements if the user needs inline analytics — check first whether existing GStreamer elements can handle the logic. If not, follow the Conventions under Pattern 7.
Wire up argument parsing
Add the pipeline event loop — see Pattern 2: Pipeline Event Loop
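A minimal sketch of the pipeline-string stage for a detection use case, assuming a local file or RTSP input, an OpenVINO IR model, and JSON Lines output. build_pipeline and cli_script are hypothetical helpers, and the element chain is one plausible arrangement rather than a validated pipeline from the Sample Index. The same string can serve both structures: the CLI variant wraps it in a gst-launch-1.0 script, and a Python application could pass it to Gst.parse_launch():

```python
import shlex


def build_pipeline(input_uri: str, model_xml: str, device: str = "CPU",
                   output_json: str = "results/detections.jsonl") -> str:
    """Assemble a detection pipeline string (a sketch, not a validated pipeline)."""
    if input_uri.startswith("rtsp://"):
        source = f"uridecodebin3 uri={input_uri}"  # network stream
    else:
        source = f"filesrc location={input_uri} ! decodebin3"  # local file
    stages = [
        source,
        f"gvadetect model={model_xml} device={device}",  # inference
        "queue",
        "gvametaconvert",  # inference metadata to JSON
        f"gvametapublish method=file file-format=json-lines file-path={output_json}",
        "fakesink sync=false",
    ]
    return " ! ".join(stages)


def cli_script(pipeline: str) -> str:
    """Wrap the pipeline in a gst-launch-1.0 script for the command-line variant."""
    # -e sends EOS on interrupt so outputs are finalized; quoting keeps the shell out.
    return "#!/bin/bash\nset -euo pipefail\ngst-launch-1.0 -e " + shlex.quote(pipeline) + "\n"
```

shlex.quote keeps shell metacharacters in the pipeline inert; file paths are assumed to be validated and free of spaces, since gst-launch-1.0 parses the string again after the shell does.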

Step 4 — Generate Application

Generate all application files following the directory layout defined at the beginning of this document.

Language-specific generation:

C/C++ applications: Use the Application Template as the starting skeleton. Read the Design Patterns Reference for coding conventions and application structure.

Python applications: Use the Application Template as the starting skeleton. Read the Design Patterns Reference for coding conventions and application structure.

For all languages:

Use the README Template to generate README.md by replacing all {{PLACEHOLDERS}} as described below:

| Placeholder | What to generate |
|---|---|
| `{{APP_TITLE}}` | Short title of the application |
| `{{APP_DESCRIPTION}}` | 2–3 sentences describing what the application does and its main use case |
| `{{DLSTREAMER_CODING_AGENT_PROMPT}}` | The verbatim initial user prompt wrapped in a Markdown blockquote (`>`). Do not paraphrase or summarize. |
| `{{APP_VISUALIZATION}}` | Optional screenshot line: `![APP_TITLE](results/screenshot.png)`. Omit this line entirely if no screenshot is available. |
| `{{DETAILED_DESCRIPTION}}` | Extended description: model names, hardware requirements, expected outputs. If the input video is from a publicly available source (e.g. Pexels), add: `This sample uses a video from <link> by <author>.` |
| `{{NUMBERED_STEPS}}` | Numbered list of pipeline stages, e.g. `1. Detects objects using gvadetect` |
| `{{PIPELINE_DIAGRAM}}` | Mermaid diagram. Use `graph LR` for linear pipelines; use subgraphs for tee/multi-branch pipelines (see `smart_nvr` and `vlm_self_checkout` for examples). |
| `{{PIPELINE_ELEMENTS_LIST}}` | Optional bulleted list of each GStreamer/DL Streamer element and its role. Omit if the pipeline is straightforward. |
| `{{VIDEO_DOWNLOAD_INSTRUCTIONS}}` | `curl` command to download the test video into `videos/`. If no public video is used, omit this placeholder and the enclosing `### Download Video` heading entirely. |
| `{{ADVANCED_USAGE}}` | Optional second usage block showing non-default CLI options. Omit if not needed. |
| `{{HOW_IT_WORKS_SECTIONS}}` | One `### STEP N` subsection per major pipeline stage or custom element, with relevant code snippets. |
| `{{CONFIGURATION_FILES_SECTION}}` | Optional `## Configuration Files` table (file name + purpose). Omit the section if unused. |
| `{{CLI_ARGUMENTS_TABLE}}` | One table row per CLI argument: flag name, default value, description. |
| `{{OUTPUT_FILES_LIST}}` | Bulleted list of output files produced under `results/`. |
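Placeholder substitution is easy to get subtly wrong: a leftover {{...}} marker, or an empty line where an optional field was omitted. A minimal sketch, assuming the template is plain text and that an optional placeholder to be omitted is passed as an empty string; render_readme is a hypothetical helper, and removing the ### Download Video heading is left to the caller:

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render_readme(template: str, values: dict) -> str:
    """Replace {{NAME}} placeholders and fail if any remain unfilled."""
    kept = []
    for line in template.splitlines():
        match = PLACEHOLDER_RE.fullmatch(line.strip())
        # An optional placeholder set to "" drops its whole line.
        if match and values.get(match.group(1)) == "":
            continue
        kept.append(line)
    rendered = PLACEHOLDER_RE.sub(
        lambda m: values.get(m.group(1), m.group(0)), "\n".join(kept))
    leftover = sorted(set(PLACEHOLDER_RE.findall(rendered)))
    if leftover:
        raise ValueError(f"unfilled placeholders: {', '.join(leftover)}")
    return rendered
```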

If the application requires Python packages, list them in requirements.txt. If the OpenVINO Python runtime is required, pin the same version as the OpenVINO runtime installed with DL Streamer.

Step 5 — Run, Debug, and Validate

Run in Docker

docker run --init --rm \
    -u "$(id -u):$(id -g)" \
    -e PYTHONUNBUFFERED=1 \
    -v "$(pwd)":/app -w /app \
    --device /dev/dri \
    --group-add "$(stat -c "%g" /dev/dri/render* | head -n 1)" \
    --device /dev/accel \
    --group-add "$(stat -c "%g" /dev/accel/accel* | head -n 1)" \
    intel/dlstreamer:latest \
    python3 <app_name>.py
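The command above fails on hosts without an NPU (docker run typically rejects a --device path that does not exist), and the stat substitution assumes all matching nodes share one group. A hedged Python sketch that adds --device and --group-add only for device nodes that are present, with deduplicated group IDs; device_args and docker_run_cmd are hypothetical helpers that mirror the flags above:

```python
import glob
import os


def device_args(dev_root: str = "/dev") -> list:
    """--device/--group-add flags only for GPU/NPU nodes that exist on this host."""
    args = []
    for subdir, pattern in (("dri", "render*"), ("accel", "accel*")):
        nodes = sorted(glob.glob(os.path.join(dev_root, subdir, pattern)))
        if not nodes:
            continue  # e.g. no NPU: skip rather than pass a missing --device
        args += ["--device", os.path.join(dev_root, subdir)]
        # One --group-add per distinct owning group of the device nodes.
        for gid in sorted({os.stat(node).st_gid for node in nodes}):
            args += ["--group-add", str(gid)]
    return args


def docker_run_cmd(app_name: str, image: str = "intel/dlstreamer:latest",
                   dev_root: str = "/dev") -> list:
    """Argument list equivalent to the docker run command above."""
    cmd = ["docker", "run", "--init", "--rm",
           "-u", f"{os.getuid()}:{os.getgid()}",
           "-e", "PYTHONUNBUFFERED=1",
           "-v", f"{os.getcwd()}:/app", "-w", "/app"]
    cmd += device_args(dev_root)
    cmd += [image, "python3", f"{app_name}.py"]
    return cmd
```

Pass the returned list to subprocess.run(cmd, check=True) once the user has approved it.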

Autonomous execution: once the user has approved a command (see the Safety rules above), do not wait for further confirmation while it runs. Launch it in async mode and poll get_terminal_output every 15–30 s until completion. Only ask the user when a new decision is needed (e.g. a device change after an out-of-memory error). This applies to all long-running commands: docker run, docker pull, pip install, and model export.
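Outside an agent terminal, the same launch-then-poll loop can be scripted. A minimal sketch using subprocess, assuming output is redirected to a log file so the process never blocks on a full pipe; run_and_poll is a hypothetical helper, and its 20 s default interval is simply a value inside the 15–30 s range above:

```python
import subprocess
import time


def run_and_poll(cmd: list, log_path: str, interval_s: float = 20.0,
                 timeout_s: float = 3600.0) -> tuple:
    """Run an already-approved command, polling until it exits; return (code, log tail)."""
    with open(log_path, "w") as log:
        # Writing to a file (not a pipe) means a chatty process can never block.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        deadline = time.monotonic() + timeout_s
        while proc.poll() is None:
            if time.monotonic() > deadline:
                proc.terminate()
                proc.wait()
                raise TimeoutError(f"{cmd[0]} did not finish within {timeout_s} s")
            time.sleep(interval_s)
    with open(log_path) as log:
        tail = log.read()[-2000:]  # the end of the log usually holds the error
    return proc.returncode, tail
```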

Validate: check that the output matches the user's expected results, using the Debugging Hints and Validation Checklist for common gotchas. For continuous inputs (e.g. RTSP streams) or long videos, send EOS to stop the pipeline cleanly so that output files are finalized.

Post-Task — Report Coding Agent Analytics

Immediately after the application is validated, in the same response as the validation results, report the following timing metrics as an analytics table:

AI reasoning time — understanding the prompt, designing the pipeline, writing code
Environment setup time — waiting for pip install, model export, Docker image pull
Debug and validation time — running the application, checking outputs, fixing issues
User wait time — waiting for user input or confirmation
Total activity time (phases may overlap, so total ≠ sum of individual phases)
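Because phases overlap, total activity time is the length of the union of all phase intervals, not their sum. A minimal sketch, assuming each phase is logged as (start, end) timestamps in seconds; union_seconds is a hypothetical helper:

```python
def union_seconds(intervals):
    """Total covered time of possibly overlapping (start, end) intervals, in seconds."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

For example, reasoning over 0–120 s, setup over 30–300 s, and debugging over 310–400 s sum to 480 s, but union_seconds returns 390 s.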

Examples

See example prompts for inspiration and practical demonstrations of the procedure.