name: validation
description: Use this skill when working on MLXR runtime debugging, log triage, validation discipline, benchmark-backed claims, or deciding whether runtime behavior is strong enough to count as evidence rather than a one-off run.
Validation
Use this skill for the narrow slice that sits beyond the normal dev loop:
- debugging runtime behavior
- inspecting logs after source, artifact, or daemon changes
- validating whether a claim is benchmark-grade or still provisional
- triaging whether a behavior is a platform truth, a family quirk, or a likely bug
Do not use this skill for the common implementation path. The default loop still lives in AGENTS.md, MEMORY.md, and docs/dev-harness.md.
For real LTX-2.3 fidelity bring-up and stage-local smoke debugging, use the dedicated ltx-fidelity-debugging skill instead of stretching this generic validation overlay.
Default workflow
- Run the repo harness first:
uv run python scripts/dev.py verify
uv run python scripts/dev.py logs --tail 120
- Inspect the relevant platform docs before interpreting results.
- When a promoted clip or showcase result depends on semantic correctness, run scripts/gemini_describe_video.py as an automated semantic cross-check and keep the parsed XML review with the receipt.
- Distinguish between:
  - implementation bug
  - missing benchmark evidence
  - stale docs
  - still-open architecture question
- If the result is still uncertain, downgrade the claim instead of overstating it.
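The last two workflow steps can be sketched as a small decision helper. This is a minimal illustration, not part of the repo harness: `Finding`, `ClaimLevel`, `Claim`, and `settle` are hypothetical names, and the rule that any open finding blocks a benchmark-grade claim is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Finding(Enum):
    """The four outcomes the workflow asks you to tell apart."""
    IMPLEMENTATION_BUG = "implementation bug"
    MISSING_BENCHMARK_EVIDENCE = "missing benchmark evidence"
    STALE_DOCS = "stale docs"
    OPEN_ARCHITECTURE_QUESTION = "still-open architecture question"


class ClaimLevel(Enum):
    """Hypothetical two-level scale: provisional vs. benchmark-grade."""
    PROVISIONAL = 0
    BENCHMARK_GRADE = 1


@dataclass
class Claim:
    text: str
    requested: ClaimLevel
    open_findings: list[Finding] = field(default_factory=list)
    uncertain: bool = False


def settle(claim: Claim) -> ClaimLevel:
    """Return the level the claim may carry, downgrading when in doubt.

    Assumption: any unresolved finding, or any remaining uncertainty,
    forces the claim down to PROVISIONAL.
    """
    if claim.uncertain or claim.open_findings:
        return ClaimLevel.PROVISIONAL
    return claim.requested
```

A claim that requests `BENCHMARK_GRADE` keeps that level only when nothing is open; otherwise `settle` returns `PROVISIONAL`, which mirrors the "downgrade rather than overstate" step.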
What to load
- For benchmark strength and freeze criteria:
docs/benchmark-matrix.md
docs/research/09-open-questions-and-validation-plan.md
- For runtime and scheduler interpretation:
docs/technical-design.md
docs/research/05-optimization-playbook.md
- For LTX-specific runtime pressure:
docs/research/07-ltx-integration-seams.md
Validation rules
- Prefer measured local evidence over inferred explanations.
- Separate model behavior, output-path behavior, and harness behavior.
- Treat logs as evidence, not decoration.
- Do not promote a result to a stable claim if it still depends on one machine, one family, or one narrow path.
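The final rule lends itself to a mechanical check. The sketch below is one possible shape for such a gate, under stated assumptions: the `Run` record, the `promotion_blockers` function, and the default threshold of two distinct values per axis are all hypothetical and do not come from the repo or its docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One recorded validation run (hypothetical record shape)."""
    machine: str
    family: str
    path: str
    passed: bool


def promotion_blockers(runs: list[Run], min_distinct: int = 2) -> list[str]:
    """List the reasons a result should not yet become a stable claim.

    Only passing runs count as evidence. An empty list means the result
    was seen on at least `min_distinct` machines, families, and paths.
    The threshold is an assumption, not a documented freeze criterion.
    """
    passing = [r for r in runs if r.passed]
    blockers = []
    for axis in ("machine", "family", "path"):
        distinct = {getattr(r, axis) for r in passing}
        if len(distinct) < min_distinct:
            blockers.append(f"only {len(distinct)} distinct {axis} value(s)")
    return blockers
```

Returning the reasons rather than a bare boolean keeps the output usable as a log line, so the decision to hold a claim back is itself recorded as evidence.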