data

name: data
description: Caption training datasets for Graffito video generation model training (combined, OC, and synth/OOD sets), all T5-optimized.

What This Folder Contains

AI-generated video caption datasets for training and fine-tuning video generation models on the Graffito aesthetic. All captions are T5-optimized (no markdown, emphasis via ALL CAPS).
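
As a rough illustration of the "no markdown" rule above, the sketch below flags common markdown constructs in a caption. The function name find_markdown_markers and the pattern list are hypothetical and are not taken from the caption spec, which remains the authority on what is allowed.

```python
import re
from typing import List

# Hypothetical patterns for common markdown markup. These are assumptions for
# illustration only; the actual formatting rules may be stricter or looser.
MARKDOWN_PATTERNS = {
    "emphasis": re.compile(r"\*{1,2}[^*\s][^*]*\*{1,2}"),       # *word* or **word**
    "heading": re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE),   # "# Title"
    "inline_code": re.compile(r"`[^`]+`"),                      # `code`
    "link": re.compile(r"\[[^\]]+\]\([^)]+\)"),                 # [text](url)
    "bullet": re.compile(r"^\s*[-*+]\s+", re.MULTILINE),        # "- item"
}


def find_markdown_markers(caption: str) -> List[str]:
    """Return the names of the markdown constructs found in the caption."""
    return [
        name
        for name, pattern in MARKDOWN_PATTERNS.items()
        if pattern.search(caption)
    ]
```

An empty list means none of these patterns matched. ALL CAPS words pass through untouched, since capitalization is the intended emphasis mechanism.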

When to Consult This Folder

Consult this folder when training or fine-tuning a video generation model, evaluating caption quality, or adding new caption data.

Files

File	What It Contains
captions-all.md	Combined dataset — all OC and synth captions merged across all scenes and character tiers. Use as the primary training set.
captions-oc.md	OC dataset — captions featuring Tony, Monk, and Grandma in canonical Graffito scenes.
captions-synth.md	Synth/OOD dataset — captions featuring expanded universe characters and out-of-distribution scenarios for model generalization.

Usage Guidance

Load production/caption-spec-v3.md before generating new captions; it governs all formatting and vocabulary. captions-all.md is the merged training set, and captions-oc.md and captions-synth.md are its source splits.
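
The relationship between the merged set and its source splits can be sanity-checked with a short script. This is a minimal sketch that assumes each file stores one caption per non-empty line, which this folder description does not confirm. parse_captions, missing_from_merged, and check_folder are hypothetical helper names.

```python
from pathlib import Path
from typing import List


def parse_captions(text: str) -> List[str]:
    """Split file contents into captions.

    ASSUMPTION: one caption per non-empty line. Adjust if the files use
    a different layout.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_from_merged(merged: List[str], *splits: List[str]) -> List[str]:
    """Return captions that appear in a split but not in the merged set."""
    merged_set = set(merged)
    missing: List[str] = []
    for split in splits:
        for caption in split:
            if caption not in merged_set and caption not in missing:
                missing.append(caption)
    return missing


def check_folder(folder: Path) -> List[str]:
    """Read the three caption files from `folder` and report missing captions."""
    def read(name: str) -> List[str]:
        return parse_captions((folder / name).read_text(encoding="utf-8"))

    return missing_from_merged(
        read("captions-all.md"),
        read("captions-oc.md"),
        read("captions-synth.md"),
    )
```

Calling check_folder(Path(".")) from inside this folder would return an empty list if every OC and synth caption is present in the merged file under the one-caption-per-line assumption.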