dataset-provenance-manifest-governance - SKILL.md Agent Skill

---
name: dataset-provenance-manifest-governance
description: Keep local training dataset provenance, merged outputs, and manifest truth aligned across dataset-building repos. Use when teacher distillation, enrichment exports, or study manifests change together.
---

Use this skill when the Penny training-data chain changes across Qwen Training and Media Workbench surfaces.

Identify the source manifests, merged outputs, and derived dataset artifacts involved.
Confirm provenance rules before changing batch assembly or export paths.
Keep teacher, student, and enrichment stages legible in the manifest chain.
Re-run the relevant validation or export proof after updates.
Record whether the dataset remains local-only and reproducible.
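The sketch below shows one way to keep each stage legible in the manifest chain. It records a separate manifest entry for each stage, with its inputs, its output file, a SHA-256 digest, and an explicit local-only flag. Everything here is an assumption for illustration: the stage labels, the field names (`stage`, `inputs`, `output`, `sha256`, `local_only`), and the helpers `record_stage` and `write_manifest` are hypothetical. They are not a manifest format defined by the Qwen Training or Media Workbench repos.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from pathlib import Path

# Hypothetical stage labels; the real chain may name its stages differently.
STAGES = ("teacher", "enrichment", "student")


@dataclass
class StageRecord:
    """Hypothetical manifest entry describing exactly one pipeline stage."""
    stage: str
    inputs: list[str]
    output: str
    sha256: str
    local_only: bool = True  # provenance stays explicitly local-only by default


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large dataset shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_stage(stage: str, inputs: list[Path], output: Path) -> StageRecord:
    """Build one entry per stage instead of folding stages into a single artifact."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return StageRecord(
        stage=stage,
        inputs=[str(p) for p in inputs],
        output=str(output),
        sha256=sha256_of(output),
    )


def write_manifest(records: list[StageRecord], path: Path) -> None:
    """Serialize the chain as JSON so each stage remains readable on its own."""
    path.write_text(json.dumps([asdict(r) for r in records], indent=2))
```

Rejecting unknown stage names such as a generic `merged` label makes it harder to record several stages under one opaque entry.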

Do not collapse teacher, enrichment, and student stages into one opaque artifact.
Keep local-only provenance explicit.
Do not silently rewrite upstream source media or manifests.
Fail closed if the manifest chain no longer explains the output.
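The guardrails above can be enforced with a fail-closed check that raises whenever the manifest chain stops explaining the output. This is a minimal sketch under these assumptions:

- The manifest is a JSON list of per-stage records with hypothetical `stage`, `inputs`, `output`, `sha256`, and `local_only` fields.
- The stages run in the order teacher, enrichment, student. The skill does not fix this order, so treat it as an assumption too.
- `verify_chain` and `ManifestChainError` are illustrative names, not existing tooling.

```python
import hashlib
import json
from pathlib import Path

# Assumed stage order; adjust to the real chain if it differs.
EXPECTED_ORDER = ["teacher", "enrichment", "student"]


class ManifestChainError(RuntimeError):
    """Raised when the manifest chain no longer explains the output."""


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_chain(manifest_path: Path, final_output: Path) -> None:
    """Raise ManifestChainError unless every stage links cleanly to the final output."""
    records = json.loads(manifest_path.read_text())

    # Stages must stay separate and in order, not collapsed into one artifact.
    stages = [r["stage"] for r in records]
    if stages != EXPECTED_ORDER:
        raise ManifestChainError(f"unexpected stage sequence: {stages}")

    # Each stage must list the previous stage's output among its inputs.
    for previous, current in zip(records, records[1:]):
        if previous["output"] not in current["inputs"]:
            raise ManifestChainError(
                f"{current['stage']} does not consume {previous['output']}"
            )

    # Provenance must be explicitly local-only, never implied.
    for record in records:
        if record.get("local_only") is not True:
            raise ManifestChainError(f"{record['stage']} is not marked local_only")

    # The final artifact must be the last recorded output, byte for byte.
    last = records[-1]
    if Path(last["output"]).resolve() != final_output.resolve():
        raise ManifestChainError("final output is not the last recorded stage output")
    if file_sha256(final_output) != last["sha256"]:
        raise ManifestChainError("final output hash does not match the manifest")
```

The check raises instead of returning a boolean. A caller that forgets to inspect a result still stops, which keeps the default behavior fail closed.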

Use this skill only for dataset provenance, manifest lineage, and derived training-artifact governance.

For checkpoint evaluation or fine-tune execution, route through the more specific training skills.