minutes-video-review - SKILL.md Agent Skill

name: minutes-video-review
description: Analyze a product walkthrough, bug report video, Loom, or ScreenPal using Minutes transcription plus visual review. Use when the user wants a recorded demo or bug clip turned into a durable brief with transcript, key frames, issues, and next steps.
compatibility: opencode

Skill Path

Before running helper scripts or opening bundled references, set:

export MINUTES_SKILLS_ROOT="$(git rev-parse --show-toplevel)/.opencode/skills"
export MINUTES_SKILL_ROOT="$MINUTES_SKILLS_ROOT/minutes-video-review"

/minutes-video-review

Turn a product walkthrough, bug report video, Loom, ScreenPal, or local recording into a durable artifact bundle that agents can keep working from.

This skill is for meeting-adjacent product artifacts, not for generic "understand any video" requests. Use it when the user wants a recorded demo, bug repro, or walkthrough turned into something actionable for engineering, product, support, or follow-up agent work.

What this skill does

The bundled script handles the deterministic pipeline:

resolve a local file or hosted video URL
download hosted video when needed
extract audio with ffmpeg
transcribe with Minutes first (after any hosted captions the source exposes), using the user's existing Minutes transcription setup
sample key frames with adaptive caps so long videos do not blow up context
write a durable artifact bundle under ~/.minutes/video-reviews/
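
As an illustration of the audio-extraction step in the list above, here is a minimal sketch of how an ffmpeg command could be assembled. It is not the bundled script's actual invocation: the function name `build_audio_extract_cmd` and the choice of 16 kHz mono PCM WAV output are assumptions made for this example.

```python
from pathlib import Path


def build_audio_extract_cmd(video: Path, audio_out: Path) -> list[str]:
    """Build an ffmpeg argument list that writes a mono 16 kHz WAV.

    Hypothetical helper; the real script may use different settings.
    """
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(video),     # input video
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to one audio channel
        "-ar", "16000",       # resample to 16 kHz (assumed target rate)
        "-c:a", "pcm_s16le",  # 16-bit PCM audio
        str(audio_out),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.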

Then you review the resulting artifacts and return the actual user-facing brief.

Primary command

Local file:

python3 "$MINUTES_SKILL_ROOT/scripts/video_review.py" \
  "/absolute/path/to/video.mp4"

Hosted video:

python3 "$MINUTES_SKILL_ROOT/scripts/video_review.py" \
  "https://go.screenpal.com/watch/..."

Useful options:

python3 "$MINUTES_SKILL_ROOT/scripts/video_review.py" \
  "https://www.loom.com/share/..." \
  --focus "customer signup bug repro" \
  --cookies-from-browser chrome \
  --env-file /absolute/path/to/.env \
  --frame-step 15 \
  --max-frames 36 \
  --keep-temp

How to use it

Phase 1: Run the pipeline

Run the script on the provided local file or hosted video URL.

The script prints JSON with the output artifact paths. Important outputs include:

analysis_md
analysis_json
transcript_md
metadata_json
frames_dir
contact_sheet_artifact
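
A minimal sketch of collecting these paths from the script's stdout follows. It assumes the printed JSON is a single object with the keys above at the top level and string values; the exact shape is documented in the output schema reference, and `parse_pipeline_output` is a hypothetical helper name.

```python
import json

# Keys listed in this skill; their position in the JSON is an assumption.
EXPECTED_KEYS = (
    "analysis_md",
    "analysis_json",
    "transcript_md",
    "metadata_json",
    "frames_dir",
    "contact_sheet_artifact",
)


def parse_pipeline_output(stdout: str) -> dict[str, str]:
    """Return the artifact paths that are present and non-empty."""
    payload = json.loads(stdout)
    return {key: payload[key] for key in EXPECTED_KEYS if payload.get(key)}
```

Missing or null keys are skipped instead of raising, so the caller can decide which artifacts are required for the brief.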

Phase 2: Inspect the artifacts

Read the generated analysis.md and analysis.json first.

Then inspect:

transcript.md for the actual spoken content
selected images from frames/ when visual state matters
contact-sheet.jpg for a quick visual sweep across sampled frames
metadata.json for transcript method, duration, source kind, and frame sampling details

Phase 3: Produce the real brief

Return a concise, useful brief to the user that includes:

what the video is trying to show
likely bug / proposal / walkthrough intent
key moments or timestamps
likely impacted area or flow
the clearest next actions

Do not echo the generated markdown verbatim. Treat the artifacts as evidence and write a considered answer of your own.

Minutes-first transcription rules

This skill should prefer transcript backends in this order:

hosted captions / VTT when the source exposes them
minutes process with an isolated temporary config
local whisper CLI if available
OpenAI audio transcription only as a last resort when configured
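
The preference order above amounts to picking the first available backend. The sketch below illustrates that logic only. The function name, parameters, and returned strings are hypothetical, and the returned strings are not the transcript method labels the script reports.

```python
from typing import Optional


def pick_backend(
    has_captions: bool,
    minutes_available: bool,
    whisper_available: bool,
    openai_configured: bool,
) -> Optional[str]:
    """Return the first available backend in the documented order."""
    order = [
        ("hosted_captions", has_captions),
        ("minutes", minutes_available),
        ("local_whisper", whisper_available),
        ("openai_audio", openai_configured),
    ]
    for name, available in order:
        if available:
            return name
    return None  # nothing usable; the caller should report this clearly
```

For example, `pick_backend(False, True, True, False)` selects the Minutes path even though a local whisper CLI is also present.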

Important:

the Minutes path should use the user's current Minutes transcription setup
if Minutes is configured for Whisper, use Whisper
if Minutes is configured for Parakeet, use Parakeet
do not silently fork a separate transcription stack unless the Minutes path is unavailable

When reporting the artifacts back to the user, preserve the transcript method exactly. Prefer labels like:

vtt_captions
minutes-whisper
minutes-parakeet
minutes-whisper-fallback
local_whisper_cli
openai_audio_transcription

Context discipline

This skill must stay disciplined about context size.

Do not send the full video itself to the reasoning layer.
Do not dump a long transcript and dozens of frames into the final answer.
Treat the transcript as the backbone and frames as supporting evidence.
Prefer inspecting a curated subset of frames instead of every sampled image.

The bundled script already caps frames adaptively, but you should still exercise judgment when deciding what to read or mention.
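
One simple way to choose a curated subset is to take evenly spaced frames from the sampled set and then add any frames near timestamps the transcript flags as important. The sketch below covers only the even spacing; `pick_evenly` is a hypothetical helper, not part of the bundled script.

```python
def pick_evenly(items: list[str], limit: int) -> list[str]:
    """Return at most `limit` items spread evenly across the list."""
    if limit <= 0 or not items:
        return []
    if len(items) <= limit:
        return list(items)
    if limit == 1:
        return [items[0]]
    last = len(items) - 1
    # Always include the first and last item, with the rest spaced between.
    indices = [round(i * last / (limit - 1)) for i in range(limit)]
    return [items[i] for i in indices]
```

With ten sampled frames and a limit of four, this keeps the frames at indices 0, 3, 6, and 9.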

Output contract

The script writes a durable bundle under:

~/.minutes/video-reviews/<timestamp>-<slug>/

Expected files:

analysis.md
analysis.json
transcript.md
metadata.json
frames/
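
Before reading a bundle, it can help to confirm that the expected files exist. This is a minimal sketch using only the names listed above; `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# Names taken from the expected files list; "frames" is a directory.
EXPECTED = ("analysis.md", "analysis.json", "transcript.md", "metadata.json", "frames")


def missing_artifacts(bundle_dir: Path) -> list[str]:
    """Return the expected names that are absent from the bundle directory."""
    return [name for name in EXPECTED if not (bundle_dir / name).exists()]
```

An empty result means the bundle is complete. Anything else is worth mentioning in the brief so the user knows which evidence was unavailable.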

These artifacts are not part of the normal ~/meetings/ corpus by default.

Dependencies

See:

$MINUTES_SKILL_ROOT/references/dependencies.md
$MINUTES_SKILL_ROOT/references/output-schema.md

Gotchas

Hosted URLs need yt-dlp. Local file review still works without it.
Frame caps are intentional. The script samples enough evidence to review the video without turning this into a generic video-intelligence pipeline.
Minutes artifacts stay isolated. The script uses a temp config/output path for the Minutes transcription run so it does not pollute the user's normal archive.
Model-powered auto-analysis is optional. The generated analysis.md and analysis.json may be heuristic when no multimodal provider key is available. You still need to read the artifacts and produce the final answer.
Long videos need synthesis, not brute force. If the transcript is long, work from the generated artifacts and only open the most relevant frames and transcript sections.
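
To make the frame-cap gotcha concrete, here is a rough sketch of one way an adaptive cap could work: sample at a fixed step, and widen the step once the video is long enough to exceed the cap. The bundled script's real algorithm is not shown in this skill. Treating `--frame-step` as seconds, and the function name `sample_timestamps`, are assumptions for illustration.

```python
import math


def sample_timestamps(
    duration_s: float,
    frame_step_s: float = 15.0,
    max_frames: int = 36,
) -> list[float]:
    """Return sample times in seconds, never more than max_frames of them."""
    if duration_s <= 0 or frame_step_s <= 0 or max_frames <= 0:
        return []
    # Widen the step when the fixed step would produce too many frames.
    step = max(frame_step_s, duration_s / max_frames)
    # Keep every timestamp strictly before the end of the video.
    count = min(max_frames, math.ceil(duration_s / step))
    return [round(i * step, 3) for i in range(count)]
```

Under these assumptions a one-minute clip yields four frames at the 15-second step, while a one-hour recording is stretched to a 100-second step so it still yields only 36 frames.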