evaluate-video-quality

Evaluate generated video quality using available metrics (SSIM, loss trajectory, caption consistency)

By hao-ai-lab, updated 3/9/2026

name: evaluate-video-quality
description: Evaluate generated video quality using available metrics (SSIM, loss trajectory, caption consistency)

Evaluate Video Quality

Purpose

Assess the quality of videos generated by a training run by combining multiple signals into one overall assessment. This skill is evolving: new metrics will be added as they are developed.

Prerequisites

  • Generated videos available locally or via W&B artifacts.
  • For SSIM: reference videos from official implementations.
  • For caption consistency: LLM access (optional, stub for now).

Inputs

| Parameter | Required | Description |
|-----------|----------|-------------|
| video_paths | Yes | List of paths to generated videos |
| reference_paths | No | Paths to reference videos (for SSIM) |
| prompts | No | Prompts used to generate videos (for caption check) |
| loss_summary | No | Path to W&B summary JSON (for loss trajectory) |
| metrics | No | Which metrics to run (default: all available) |
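
One way to read "default: all available" is that a metric runs only when its optional input was supplied. The sketch below illustrates that reading. The function `select_metrics` and the metric names in `ALL_METRICS` are hypothetical and are not taken from the FastVideo codebase or the evaluation registry.

```python
from typing import Optional, Sequence

# Hypothetical metric names; the real catalog lives in the evaluation registry.
ALL_METRICS = ("ssim", "loss_trajectory", "caption_consistency")


def select_metrics(
    video_paths: Sequence[str],
    reference_paths: Optional[Sequence[str]] = None,
    prompts: Optional[Sequence[str]] = None,
    loss_summary: Optional[str] = None,
    metrics: Optional[Sequence[str]] = None,
) -> list:
    """Return the requested metrics whose inputs were supplied."""
    if not video_paths:
        raise ValueError("video_paths is required and must not be empty")

    # A metric is runnable only when its optional input is present.
    runnable = {
        "ssim": bool(reference_paths),
        "loss_trajectory": loss_summary is not None,
        "caption_consistency": bool(prompts),
    }

    # With no explicit selection, consider every known metric.
    requested = ALL_METRICS if metrics is None else tuple(metrics)

    unknown = [m for m in requested if m not in runnable]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")

    # Keep the caller's order and drop metrics that lack their input.
    return [m for m in requested if runnable[m]]
```

For example, passing only `video_paths` and `loss_summary` would select just the loss trajectory check under this reading.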

Available Metrics

Check .agents/memory/evaluation-registry/README.md for the current catalog.

SSIM (Active)

Uses the existing SSIM test infrastructure in fastvideo/tests/ssim/.

pytest fastvideo/tests/ssim/ -vs --video-path <generated> --reference-path <reference>

Or use the SSIM utility directly:

from fastvideo.tests.ssim.ssim_utils import compute_ssim
score = compute_ssim(generated_video, reference_video)
# score > 0.85 is typically "acceptable"

Interpretation:

| SSIM Range | Quality |
|------------|---------|
| > 0.90 | Excellent: very close to reference |
| 0.80–0.90 | Good: acceptable for most uses |
| 0.70–0.80 | Fair: noticeable differences |
| < 0.70 | Poor: significant quality issues |
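
The bands above can be applied mechanically. The following is a minimal sketch, assuming that a score sitting exactly on a shared boundary (0.80 or 0.70) falls into the higher band and that exactly 0.90 counts as Good, since Excellent is stated as strictly greater than 0.90. The helper `ssim_quality` is hypothetical and is not part of `ssim_utils`.

```python
def ssim_quality(score: float) -> str:
    """Map an SSIM score to the quality band from the interpretation table."""
    # SSIM is bounded to [-1, 1]; anything outside signals a bad input.
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"SSIM must be in [-1, 1], got {score}")

    if score > 0.90:
        return "Excellent"
    if score >= 0.80:  # assumption: boundary values go to the higher band
        return "Good"
    if score >= 0.70:
        return "Fair"
    return "Poor"
```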

Loss Trajectory (Active)

Analyze the loss trajectory, starting from the W&B summary JSON (which by default holds the last logged value of each metric):

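import json

# loss_summary_path is the path passed as the loss_summary input.
with open(loss_summary_path) as f:
    summary = json.load(f)

final_loss = summary["train_loss"]  # last logged training loss
runtime = summary["_runtime"]       # run duration in seconds
steps = summary["_step"]            # last logged step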

Early-stage heuristics (first 500 steps):

  • Loss should be decreasing (even slightly).
  • Grad norm should be stable (no wild oscillations).
  • If loss is flat or increasing, flag for review.
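
The heuristics above could be checked programmatically once per-step values are available. The sketch below assumes the loss and grad-norm history has already been exported as plain lists, since the summary JSON alone does not contain the curve. It treats "decreasing" as a negative least-squares slope and "stable" as a coefficient of variation below a cutoff. The function names and the `max_cv=0.5` cutoff are illustrative assumptions, not calibrated values from the project.

```python
from statistics import mean, pstdev


def loss_slope(steps, losses):
    """Least-squares slope of loss versus step."""
    if len(steps) != len(losses) or len(steps) < 2:
        raise ValueError("need at least two (step, loss) pairs")
    s_bar, l_bar = mean(steps), mean(losses)
    var = sum((s - s_bar) ** 2 for s in steps)
    if var == 0:
        raise ValueError("steps must not all be equal")
    cov = sum((s - s_bar) * (l - l_bar) for s, l in zip(steps, losses))
    return cov / var


def check_early_training(steps, losses, grad_norms, max_step=500, max_cv=0.5):
    """Apply the early-stage heuristics to steps up to `max_step`."""
    rows = [(s, l, g) for s, l, g in zip(steps, losses, grad_norms) if s <= max_step]
    if len(rows) < 2:
        raise ValueError("need at least two logged steps in the early window")

    early_steps = [r[0] for r in rows]
    early_losses = [r[1] for r in rows]
    early_norms = [r[2] for r in rows]

    slope = loss_slope(early_steps, early_losses)

    # Coefficient of variation (std / mean) as a rough stability measure.
    norm_mean = mean(early_norms)
    cv = pstdev(early_norms) / norm_mean if norm_mean else 0.0

    return {
        "loss_decreasing": slope < 0,
        "grad_norm_stable": cv <= max_cv,
        "flag_for_review": slope >= 0,  # flat or increasing loss
    }
```

A fitted slope is less sensitive to single noisy steps than comparing the first and last loss values, which is why it is used here.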

Caption Consistency (Draft — Not Yet Calibrated)

Use an LLM to evaluate whether the video content matches the input prompt.

Prompt: "A golden retriever playing in the snow"
Video: <path>

Score the video on:
1. Object presence (is there a golden retriever?)
2. Action accuracy (is it playing?)
3. Environment match (is there snow?)
4. Overall coherence (does it look natural?)

Each 1-5, total /20.

⚠️ This metric is in draft status. Results should not be treated as ground truth until calibrated against human judgments.
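
Whatever LLM produces the four sub-scores, the rubric arithmetic can be validated separately. The sketch below is one possible container for a single video's rubric result. The class `CaptionScore` and its field names are hypothetical, and the default pass threshold of 14 is borrowed from the sample report in the Outputs section rather than from any calibration.

```python
from dataclasses import dataclass

CRITERIA = (
    "object_presence",
    "action_accuracy",
    "environment_match",
    "overall_coherence",
)


@dataclass(frozen=True)
class CaptionScore:
    object_presence: int
    action_accuracy: int
    environment_match: int
    overall_coherence: int

    def __post_init__(self):
        # Each criterion is scored 1-5, so reject anything outside that range.
        for name in CRITERIA:
            value = getattr(self, name)
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")

    @property
    def total(self) -> int:
        """Sum of the four criteria, out of 20."""
        return sum(getattr(self, name) for name in CRITERIA)

    def passes(self, threshold: int = 14) -> bool:
        """True when the total is strictly greater than the threshold."""
        return self.total > threshold
```

Validating the range at construction time catches malformed LLM output before it reaches the aggregate report.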

Steps

  1. Identify available metrics — Check .agents/memory/evaluation-registry/README.md.
  2. Run each metric — Collect scores.
  3. Aggregate — Produce a combined quality report.
  4. Log — Update the experiment journal with quality results.
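
The aggregation step could be sketched as a small function that averages per-video scores and renders a Markdown report. This is an illustration under assumptions: `build_report` is a hypothetical helper, the default thresholds (SSIM above 0.80, caption total above 14) are taken from the sample report in this document, and the "❌ Fail" label is invented here because the sample only shows passing rows.

```python
from statistics import mean


def build_report(experiment_name, per_video, loss_trend,
                 ssim_threshold=0.80, caption_threshold=14):
    """Render a Markdown quality report.

    per_video: list of (video_name, ssim_score, caption_total) tuples.
    loss_trend: a short label such as "decreasing", "flat", or "increasing".
    """
    if not per_video:
        raise ValueError("per_video must not be empty")

    def status(passed):
        return "✅ Pass" if passed else "❌ Fail"

    # Aggregate per-video scores into run-level averages.
    ssim_avg = mean([s for _, s, _ in per_video])
    caption_avg = mean([c for _, _, c in per_video])

    lines = [
        f"## Video Quality Report: {experiment_name}",
        "",
        "| Metric | Score | Threshold | Status |",
        "|--------|-------|-----------|--------|",
        f"| SSIM (avg) | {ssim_avg:.2f} | > {ssim_threshold:.2f} "
        f"| {status(ssim_avg > ssim_threshold)} |",
        f"| Loss trajectory | {loss_trend} | decreasing "
        f"| {status(loss_trend == 'decreasing')} |",
        f"| Caption consistency | {caption_avg:.0f}/20 | > {caption_threshold}/20 "
        f"| {status(caption_avg > caption_threshold)} |",
        "",
        "### Per-Video Scores",
        "| Video | SSIM | Caption |",
        "|-------|------|---------|",
    ]
    for name, ssim, caption in per_video:
        lines.append(f"| {name} | {ssim:.2f} | {caption}/20 |")
    return "\n".join(lines)
```

Calling it with `[("video_001.mp4", 0.89, 17), ("video_002.mp4", 0.85, 15)]` and a `"decreasing"` trend yields rows in the same layout as the sample report in the Outputs section.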

Outputs

## Video Quality Report: <experiment_name>

| Metric | Score | Threshold | Status |
|--------|-------|-----------|--------|
| SSIM (avg) | 0.87 | > 0.80 | ✅ Pass |
| Loss trajectory | decreasing | decreasing | ✅ Pass |
| Caption consistency | 16/20 | > 14/20 | ✅ Pass |

### Per-Video Scores
| Video | SSIM | Caption |
|-------|------|---------|
| video_001.mp4 | 0.89 | 17/20 |
| video_002.mp4 | 0.85 | 15/20 |

References

  • fastvideo/tests/ssim/ — SSIM test infrastructure
  • fastvideo/tests/training/Vanilla/test_training_loss.py — loss comparison
  • .agents/memory/evaluation-registry/README.md — metric catalog

Changelog

| Date | Change |
|------|--------|
| 2026-03-02 | Initial version with SSIM, loss trajectory, caption consistency stub |