name: data
description: Caption training datasets for Graffito video generation model training (combined, OC, and synth/OOD sets); all T5-optimized.
Data
What This Folder Contains
AI-generated video caption datasets for training and fine-tuning video generation models on the Graffito aesthetic. All captions are T5-optimized (no markdown, emphasis via ALL CAPS).
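The "no markdown" rule can be spot-checked mechanically. The sketch below is only an illustration: the pattern list and the names `find_markdown` and `is_plain_caption` are hypothetical, and the authoritative rules are whatever production/caption-spec-v3.md defines. ALL CAPS words are left alone, since they are the permitted form of emphasis.

```python
import re

# Hypothetical patterns for common markdown syntax. These are assumptions
# made for illustration; the caption spec is the real source of truth.
MARKDOWN_PATTERNS = [
    re.compile(r"\*\*[^*]+\*\*"),                   # **bold**
    re.compile(r"(?<!\w)_[^_]+_(?!\w)"),            # _italic_
    re.compile(r"`[^`]+`"),                         # `inline code`
    re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE),  # "# heading" at line start
    re.compile(r"\[[^\]]+\]\([^)]+\)"),             # [text](link)
]


def find_markdown(caption: str) -> list[str]:
    """Return every markdown-looking fragment found in the caption."""
    hits = []
    for pattern in MARKDOWN_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(caption))
    return hits


def is_plain_caption(caption: str) -> bool:
    """True when none of the assumed markdown patterns match."""
    return not find_markdown(caption)
```

Running `find_markdown` over each caption before adding it to a dataset file would surface stray formatting early; an empty list means the caption passed this particular check, not that it satisfies the full spec.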
When to Consult This Folder
When training or fine-tuning a video generation model, evaluating caption quality, or adding new caption data.
Files
| File | What It Contains |
|---|---|
| captions-all.md | Combined dataset — all OC and synth captions merged across all scenes and character tiers. Use as the primary training set. |
| captions-oc.md | OC dataset — captions featuring Tony, Monk, and Grandma in canonical Graffito scenes. |
| captions-synth.md | Synth/OOD dataset — captions featuring expanded universe characters and out-of-distribution scenarios for model generalization. |
Usage Guidance
Load production/caption-spec-v3.md before generating new captions; it governs all formatting and vocabulary. captions-all.md is the merged training set, and captions-oc.md and captions-synth.md are the source splits it combines.
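Because the merged set is derived from the two source splits, a quick consistency check after adding new captions can catch a split that was updated without regenerating the merged file. The sketch below assumes one caption per non-empty line, which may not match the actual layout of these files; `load_captions` and `missing_from_merged` are hypothetical helper names.

```python
def load_captions(text: str) -> set[str]:
    """Parse file contents into a set of captions.

    Assumes one caption per non-empty line (an assumption, not a
    documented property of the dataset files).
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_from_merged(merged_text: str, *split_texts: str) -> set[str]:
    """Return captions present in any source split but absent from the merged set."""
    merged = load_captions(merged_text)
    missing: set[str] = set()
    for split_text in split_texts:
        missing |= load_captions(split_text) - merged
    return missing
```

In use, the three arguments would be the contents of captions-all.md, captions-oc.md, and captions-synth.md read as text; an empty result means every split caption also appears in the merged set under the one-per-line assumption.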