validation

Use this skill when working on MLXR runtime debugging, log triage, validation discipline, benchmark-backed claims, or deciding whether runtime behavior is strong enough to count as evidence rather than a one-off run.

By numman-ali, updated 3/14/2026

name: validation
description: Use this skill when working on MLXR runtime debugging, log triage, validation discipline, benchmark-backed claims, or deciding whether runtime behavior is strong enough to count as evidence rather than a one-off run.

Validation

Use this skill for the narrow slice that sits beyond the normal dev loop:

  • debugging runtime behavior
  • inspecting logs after source, artifact, or daemon changes
  • validating whether a claim is benchmark-grade or still provisional
  • triaging whether a behavior is a platform truth, a family quirk, or a likely bug

Do not use this skill for the common implementation path. The default loop still lives in AGENTS.md, MEMORY.md, and docs/dev-harness.md.

For real LTX-2.3 fidelity bring-up and stage-local smoke debugging, use the dedicated ltx-fidelity-debugging skill instead of stretching this generic validation overlay.

Default workflow

  1. Run the repo harness first:
    • uv run python scripts/dev.py verify
    • uv run python scripts/dev.py logs --tail 120
  2. Inspect the relevant platform docs before interpreting results.
  3. When a promoted clip or showcase result depends on semantic correctness, run scripts/gemini_describe_video.py as an automated semantic cross-check and keep the parsed XML review with the receipt.
  4. Distinguish between:
    • implementation bug
    • missing benchmark evidence
    • stale docs
    • still-open architecture question
  5. If the result is still uncertain, downgrade the claim rather than overstating it.
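
The first and fourth steps above can be sketched in Python. This is a minimal illustration, not part of the repo: the two command lines are copied from step 1, while `Finding`, `run_harness`, and the injectable `runner` parameter are hypothetical names introduced here. Running the log command even when `verify` fails is an assumption, made so the log tail is always available for triage.

```python
import subprocess
from enum import Enum
from typing import Callable, Sequence

# Commands copied verbatim from step 1 of the workflow.
HARNESS_COMMANDS: list[list[str]] = [
    ["uv", "run", "python", "scripts/dev.py", "verify"],
    ["uv", "run", "python", "scripts/dev.py", "logs", "--tail", "120"],
]


class Finding(Enum):
    """The four outcomes listed in step 4 (enum names are illustrative)."""

    IMPLEMENTATION_BUG = "implementation bug"
    MISSING_BENCHMARK_EVIDENCE = "missing benchmark evidence"
    STALE_DOCS = "stale docs"
    OPEN_ARCHITECTURE_QUESTION = "still-open architecture question"


def _run(cmd: Sequence[str]) -> tuple[int, str]:
    """Run one command and return (exit code, stdout followed by stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_harness(
    runner: Callable[[Sequence[str]], tuple[int, str]] = _run,
) -> list[tuple[str, int, str]]:
    """Run every harness command in order and collect the results.

    Assumption: later commands still run after a failure, so the log
    tail is captured even when verification does not pass.
    """
    results = []
    for cmd in HARNESS_COMMANDS:
        code, output = runner(cmd)
        results.append((" ".join(cmd), code, output))
    return results
```

Calling `run_harness()` from the repo root returns one `(command, exit code, output)` tuple per command. Choosing a `Finding` remains a human judgment made after reading the platform docs; the enum only gives that judgment a fixed vocabulary.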

What to load

  • For benchmark strength and freeze criteria:
    • docs/benchmark-matrix.md
    • docs/research/09-open-questions-and-validation-plan.md
  • For runtime and scheduler interpretation:
    • docs/technical-design.md
    • docs/research/05-optimization-playbook.md
  • For LTX-specific runtime pressure:
    • docs/research/07-ltx-integration-seams.md
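
The list above can also be kept as a small lookup table. The sketch below is hypothetical: the doc paths come from the list, but the topic keys (`benchmark`, `runtime`, `ltx`) and the `docs_to_load` helper are names invented for illustration.

```python
from pathlib import Path

# Doc paths copied from the list above; the topic keys are illustrative.
DOCS_BY_TOPIC: dict[str, list[str]] = {
    "benchmark": [
        "docs/benchmark-matrix.md",
        "docs/research/09-open-questions-and-validation-plan.md",
    ],
    "runtime": [
        "docs/technical-design.md",
        "docs/research/05-optimization-playbook.md",
    ],
    "ltx": [
        "docs/research/07-ltx-integration-seams.md",
    ],
}


def docs_to_load(topic: str, repo_root: Path = Path(".")) -> tuple[list[Path], list[Path]]:
    """Return (all doc paths for the topic, the subset missing on disk)."""
    paths = [repo_root / rel for rel in DOCS_BY_TOPIC[topic]]
    missing = [p for p in paths if not p.is_file()]
    return paths, missing
```

A non-empty `missing` list suggests the docs have moved or the skill is stale, which is itself one of the triage outcomes in the workflow.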

Validation rules

  • Prefer measured local evidence over inferred explanations.
  • Separate model behavior, output-path behavior, and harness behavior.
  • Treat logs as evidence, not decoration.
  • Do not promote a result to a stable claim if it still depends on one machine, one family, or one narrow path.
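
The last rule can be expressed as a small check. This is a sketch under stated assumptions, not the repo's actual promotion logic: the `Run` record, its field names, the threshold of two distinct values per dimension, and the choice to treat any failing run as disqualifying are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One observed run of the behavior under test (all fields hypothetical)."""

    machine: str
    family: str
    path: str
    passed: bool


def claim_status(runs: list[Run]) -> str:
    """Return "stable" only if every run passed and the runs cover at
    least two distinct machines, families, and paths; otherwise return
    "provisional"."""
    if not runs or not all(r.passed for r in runs):
        return "provisional"
    for field in ("machine", "family", "path"):
        # A single distinct value means the claim still depends on one
        # machine, one family, or one narrow path.
        if len({getattr(r, field) for r in runs}) < 2:
            return "provisional"
    return "stable"
```

For example, two passing runs that differ in machine and path but share a family still come back as `"provisional"`, which matches the instruction to downgrade rather than overstate.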

Install via CLI

npx skills add https://github.com/numman-ali/mlxr --skill validation

Repository Details

  • Stars: 3
  • Forks: 0
  • Branch: main
  • Path: SKILL.md