name: reel-maker
user-invocable: true
description: "Create production-ready YouTube Shorts and Instagram Reels as MP4 video files with animated motion graphics, timed captions, and scene transitions. Use this skill whenever the user asks to make a reel, Short, vertical video, TikTok, 9:16 video, animated explainer, or short-form video content. Also trigger when the user says 'render a video', 'export as MP4', 'create a Short for [topic]', 'make an animated reel', or references producing video content from a script or outline. This skill handles the full pipeline: scriptwriting, animated preview, and final MP4 export via Pillow + FFmpeg."
Reel Maker
Create production-ready YouTube Shorts / Instagram Reels with animated motion graphics, timed captions, and H.264 MP4 export.
What This Skill Produces
Given a topic, outline, or script → this skill outputs:
- A 6-scene narration script with timestamped caption chunks
- An animated React preview (optional, for timing/pacing review)
- A 1080×1920 @30fps H.264 MP4 with animated visuals
- A narrated MP4 with TTS audio muxed in (Dia, Higgs Audio, or Flite)
Dependencies
Core (always required):
- Python 3 with Pillow (pip install pillow --break-system-packages, if needed)
- FFmpeg (system-installed, verify with which ffmpeg)
- DejaVu fonts (standard on Ubuntu; on macOS/Windows the renderer falls back to system fonts automatically)
TTS (install one; listed in order of preference, matching the priority table under Phase 4):
- Dia (recommended): pip install git+https://github.com/nari-labs/dia.git + CUDA GPU (~4.4GB)
- Higgs Audio API: pip install requests + set DEEPINFRA_API_KEY env var
- Flite (fallback): libflite system package (pre-installed on Ubuntu)
Preview (optional):
- React artifact with Tone.js (available in Claude artifacts)
Workflow
Phase 1: Script
Read references/script_rules.md before writing the script.
Input: A topic, video outline, or brief description.
Output: A markdown file containing:
- Scene breakdown (6 scenes with timestamps, labels, colors)
- Full narration (~155-170 words for 55-60 seconds)
- Caption chunks array: [startTime, endTime, "3-5 words"] per chunk
- Cross-link target (which video the cliffhanger teases)
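The caption chunk format above can be checked mechanically before rendering. The following is a minimal sketch, not part of the skill's own code: `validate_captions` is a hypothetical helper, and the sample caption text is made up for illustration.

```python
def validate_captions(chunks, min_words=3, max_words=5):
    """Return a list of problems found in [start, end, text] caption chunks."""
    problems = []
    prev_end = 0.0
    for i, (start, end, text) in enumerate(chunks):
        n_words = len(text.split())
        if not (min_words <= n_words <= max_words):
            problems.append(f"chunk {i}: {n_words} words")
        if end <= start:
            problems.append(f"chunk {i}: end <= start")
        if start < prev_end:
            problems.append(f"chunk {i}: overlaps previous chunk")
        prev_end = end
    return problems


# Illustrative chunks for a 5-second cold open (text is invented).
captions = [
    [0.0, 1.6, "Your AI is forgetting"],
    [1.6, 3.2, "everything you told it"],
    [3.2, 5.0, "and nobody noticed"],
]
```

An empty result from `validate_captions(captions)` means every chunk has 3-5 words, a positive duration, and no overlap with the chunk before it.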
The 6-scene structure is always:
| # | Scene | Time | Purpose |
|---|---|---|---|
| 1 | Cold Open | 0:00-0:05 | Provocative hook, no preamble |
| 2 | Promise Lock | 0:05-0:12 | Name the concept + credibility spike |
| 3 | The Problem | 0:12-0:25 | Build the mechanic step by step |
| 4 | The Evidence | 0:25-0:38 | Dramatic escalation (the "cliff" moment) |
| 5 | The Fix | 0:38-0:48 | Actionable solution, time-boxed |
| 6 | Cliffhanger | 0:48-0:57 | Tease next video, open loop |
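As a sketch of how this table might be carried into code, the scene boundaries can be stored as data and used to map a timestamp to a scene index and a 0→1 progress value. The names `SCENES` and `scene_at` are assumptions for illustration; the skill's own helpers may structure this differently.

```python
FPS = 30

# (label, start_seconds, end_seconds), copied from the table above.
SCENES = [
    ("Cold Open", 0, 5),
    ("Promise Lock", 5, 12),
    ("The Problem", 12, 25),
    ("The Evidence", 25, 38),
    ("The Fix", 38, 48),
    ("Cliffhanger", 48, 57),
]


def scene_at(t):
    """Return (scene_index, progress) for time t in seconds, or None past the end."""
    for i, (_, start, end) in enumerate(SCENES):
        if start <= t < end:
            return i, (t - start) / (end - start)
    return None


# Total frames for the full 57-second timeline at 30 FPS.
total_frames = SCENES[-1][2] * FPS
```

With these timings the video spans 57 seconds, so `total_frames` is 1710, which is the frame count to expect when validating the export.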
Phase 2: Animated Preview (Optional)
If the user wants to preview before rendering, create a React JSX artifact:
- Each scene is a component receiving progress (0→1)
- Captions displayed as 3-5 word chunks in a pill overlay at the bottom
- Segmented progress bar at top, color-coded per scene
- Tone.js audio: ambient drone (PolySynth FM, C2+G2, -26dB through LowPass 350Hz + Reverb), scene transition hits (MembraneSynth), caption ticks (MetalSynth, -32dB)
- Controls: play/pause, restart, mute, scene jump buttons
Important: speechSynthesis is blocked in artifact iframes. Use only Tone.js for audio. Voiceover is recorded separately.
Phase 3: Render to MP4
This is the production pipeline. Read references/export_specs.md for the locked FFmpeg command and codec specs.
Step 1: Import shared helpers:
# Copy core/drawing.py and core/scene_base.py to working directory
# These provide: draw_rounded_rect, centered_text, left_text, right_text,
# rgba, draw_progress_bar, draw_caption, render_frame shell
Step 2: Write scene renderers. Each scene is a function:
def draw_scene_N(draw, img, progress):
    # progress is 0.0 → 1.0 across the scene's duration
    # Use draw (ImageDraw) and img (PIL Image) to render
    # All animation derived from progress via easing math
    ...  # scene-specific drawing goes here
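To make the renderer signature concrete, here is a minimal sketch of one scene: a bar that grows with an ease-out curve. `ease_out_cubic`, `bar_width`, `draw_scene_example`, and `render_demo_frame` are hypothetical names, and the layout numbers are arbitrary; only the `(draw, img, progress)` signature and the colors come from this document.

```python
def ease_out_cubic(p):
    """Fast start, gentle stop; maps 0..1 to 0..1."""
    return 1 - (1 - p) ** 3


def bar_width(progress, max_width=880):
    """Width in pixels of the growing bar at a given scene progress."""
    p = min(1.0, max(0.0, progress))  # clamp so out-of-range input is safe
    return int(max_width * ease_out_cubic(p))


def draw_scene_example(draw, img, progress):
    """Hypothetical scene: a green bar that grows from left to right."""
    x0, y0 = 100, 900
    draw.rectangle([x0, y0, x0 + bar_width(progress), y0 + 40],
                   fill=(0, 255, 136))  # Data Green from the design system


def render_demo_frame(progress=0.5):
    """Render one 1080x1920 frame with Pillow (imported lazily here)."""
    from PIL import Image, ImageDraw
    img = Image.new("RGB", (1080, 1920), (13, 17, 23))  # Background color
    draw_scene_example(ImageDraw.Draw(img), img, progress)
    return img
```

Because the scene function depends only on `progress`, the same code serves every frame of the scene; calling `render_demo_frame(0.5).save("frame.png")` is one way to eyeball a single frame before a full render.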
For animation patterns (glitch effects, filling containers, charts, diagrams, crack effects), read references/scene_cookbook.md.
For a complete working example, read examples/video5_context_erosion.py.
Step 3: Render and pipe to FFmpeg:
from core.renderer import render_to_mp4
render_to_mp4(scenes, captions, scene_renderers, output_path)
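For orientation, piping Pillow frames into FFmpeg generally looks like the sketch below. This is not the locked command from references/export_specs.md, which is not reproduced here, and it is not the implementation of `render_to_mp4`; the helper names `ffmpeg_rawvideo_cmd` and `pipe_frames` are assumptions for illustration.

```python
import subprocess


def ffmpeg_rawvideo_cmd(output_path, width=1080, height=1920, fps=30):
    """Build an FFmpeg command that reads raw RGB frames from stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",   # input format: packed RGB
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                               # frames arrive on stdin
        "-an",                                   # no audio in this pass
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        output_path,
    ]


def pipe_frames(frames, output_path):
    """Write an iterable of PIL images to FFmpeg; needs ffmpeg on PATH."""
    proc = subprocess.Popen(ffmpeg_rawvideo_cmd(output_path),
                            stdin=subprocess.PIPE)
    for img in frames:
        proc.stdin.write(img.convert("RGB").tobytes())
    proc.stdin.close()
    return proc.wait()  # 0 means FFmpeg exited cleanly
```

Streaming frames through stdin avoids writing thousands of PNGs to disk; each frame must be exactly width × height × 3 bytes or FFmpeg will misalign the stream.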
Step 4: Validate output:
ffprobe -v quiet -print_format json -show_streams output.mp4
Verify: codec=h264, 1080×1920, 30fps, yuv420p, correct frame count.
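The checks listed above can be automated against the ffprobe JSON. The sketch below assumes the standard `codec_name`, `width`, `height`, `r_frame_rate`, `pix_fmt`, and `nb_frames` stream fields; `check_video_stream` is a hypothetical helper, not part of the skill.

```python
import json
from fractions import Fraction


def check_video_stream(ffprobe_json, expected_frames=None):
    """Return a list of spec mismatches; an empty list means the file passes."""
    streams = json.loads(ffprobe_json)["streams"]
    video = next(s for s in streams if s.get("codec_type") == "video")
    problems = []
    if video.get("codec_name") != "h264":
        problems.append("codec")
    if (video.get("width"), video.get("height")) != (1080, 1920):
        problems.append("resolution")
    # r_frame_rate is a string such as "30/1"
    if Fraction(video.get("r_frame_rate", "0/1")) != 30:
        problems.append("fps")
    if video.get("pix_fmt") != "yuv420p":
        problems.append("pix_fmt")
    if expected_frames is not None:
        # nb_frames is reported as a string, e.g. "1710"
        if int(video.get("nb_frames", -1)) != expected_frames:
            problems.append("frame count")
    return problems
```

For the 57-second timeline in the scene table, the expected frame count at 30 FPS is 1710; pass the captured stdout of the ffprobe command as `ffprobe_json`.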
Design System (Locked)
| Element | Value |
|---|---|
| Background | (13, 17, 23) / #0D1117 |
| Alert Red | (255, 59, 48) / #FF3B30 |
| Data Green | (0, 255, 136) / #00FF88 |
| Electric Blue | (0, 122, 255) / #007AFF |
| Warning Yellow | (255, 214, 10) / #FFD60A |
| Text White | (240, 246, 252) / #F0F6FC |
| Muted Gray | (139, 148, 158) / #8B949E |
| Font (render) | DejaVuSansMono-Bold / DejaVuSans-Bold |
| Resolution | 1080 × 1920 (9:16) |
| Frame rate | 30 FPS |
| Duration | 55-60 seconds max |
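One way to keep the table's hex and RGB columns in sync is to store the hex values once and derive the tuples Pillow needs. This is a sketch only; `PALETTE`, `PALETTE_RGB`, and `hex_to_rgb` are illustrative names, not identifiers from core/drawing.py.

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


# Hex values copied from the design system table.
PALETTE = {
    "background": "#0D1117",
    "alert_red": "#FF3B30",
    "data_green": "#00FF88",
    "electric_blue": "#007AFF",
    "warning_yellow": "#FFD60A",
    "text_white": "#F0F6FC",
    "muted_gray": "#8B949E",
}

# RGB tuples in the form Pillow's fill= arguments accept.
PALETTE_RGB = {name: hex_to_rgb(value) for name, value in PALETTE.items()}

WIDTH, HEIGHT, FPS = 1080, 1920, 30
```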
Common Animation Patterns
Quick reference (full details in references/scene_cookbook.md):
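import math  # needed for pulse
fade_in = min(1, progress * 3)  # Fast fade: reaches 1.0 at progress ≈ 0.33
delayed = max(0, (progress - 0.5) * 2)  # Starts at 50%, reaches 1.0 at 100%
slide_up = (1 - opacity) * 20  # Y offset in px; opacity is the element's current alpha (e.g. fade_in)
pulse = 0.5 + math.sin(progress * 20) * 0.5  # Oscillates between 0 and 1
cliff_drop = max(0, min(1, (progress - 0.35) * 2.5))  # Flat until 35%, then ramps to 1.0 by 75%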
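To see how these expressions behave over a scene, the sketch below evaluates them together for a given progress. It takes `opacity` to be the `fade_in` value, which is an assumption; `animation_values` is a hypothetical helper.

```python
import math


def animation_values(progress):
    """Evaluate the quick-reference patterns at one progress value (0..1)."""
    fade_in = min(1, progress * 3)
    delayed = max(0, (progress - 0.5) * 2)
    slide_up = (1 - fade_in) * 20  # assumption: opacity == fade_in
    pulse = 0.5 + math.sin(progress * 20) * 0.5
    cliff_drop = max(0, min(1, (progress - 0.35) * 2.5))
    return {
        "fade_in": fade_in,
        "delayed": delayed,
        "slide_up": slide_up,
        "pulse": pulse,
        "cliff_drop": cliff_drop,
    }


# Print a small table of values across the scene.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    row = animation_values(p)
    print(p, {k: round(v, 2) for k, v in row.items()})
```

At progress 0.25 the element is three-quarters faded in and still 5 px below its resting position, while `delayed` and `cliff_drop` have not started yet.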
Phase 4: Narration (TTS)
Read references/tts_setup.md for installation of TTS backends.
The skill auto-detects the best available TTS engine:
| Priority | Backend | Quality | Requirement |
|---|---|---|---|
| 1 | Dia (local) | High — natural, expressive | CUDA GPU ~4.4GB VRAM |
| 2 | Higgs Audio (API) | Highest — beats GPT-4o-mini-tts | API key + network |
| 3 | Flite (offline) | Low — timing reference only | Always available |
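The priority order in this table could be expressed as the sketch below. The actual detection logic in core/tts.py is not shown in this document, so the probes here are assumptions: `nvidia-smi` on PATH is only a rough stand-in for a usable CUDA GPU, and a real check would also confirm that the Dia package is installed.

```python
import os
import shutil


def pick_tts_backend(has_cuda, has_api_key, has_flite):
    """Apply the priority table: Dia, then Higgs Audio, then Flite."""
    if has_cuda:
        return "dia"
    if has_api_key:
        return "higgs"
    if has_flite:
        return "flite"
    return None  # no backend available; render a silent video


def detect_tts_backend():
    """Probe the environment with simple heuristics (assumptions, see above)."""
    return pick_tts_backend(
        has_cuda=shutil.which("nvidia-smi") is not None,
        has_api_key=bool(os.environ.get("DEEPINFRA_API_KEY")),
        has_flite=shutil.which("flite") is not None,
    )
```

Keeping the decision in a pure function separates the priority rule from the environment probes, so the rule can be tested without a GPU or an API key.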
Generate narration and mux into MP4:
from core.tts import generate_narration, mux_audio_video

# Each scene dict needs 'id', 'start', 'end', 'text' keys
narration_wav = generate_narration(scenes, output_dir="/output",
                                   voice_seed=42)  # seed = consistent voice
mux_audio_video("silent_video.mp4", narration_wav, "final_with_audio.mp4")
The pipeline generates speech per scene, time-stretches each to fit its scene duration, aligns them to exact timestamps via FFmpeg adelay, and muxes the result into the MP4 as AAC audio.
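The time-stretch and alignment step can be illustrated with a little arithmetic. The sketch below computes an FFmpeg atempo factor and an adelay offset for one scene and formats them as a filter chain; it is an assumption about how such a step could be written, not the code in core/tts.py, and both helper names are hypothetical.

```python
def fit_scene_audio(clip_seconds, scene_start, scene_end):
    """Return (atempo_factor, delay_ms) that fits a clip into its scene slot."""
    slot = scene_end - scene_start
    tempo = clip_seconds / slot  # >1 speeds the clip up, <1 slows it down
    if not 0.5 <= tempo <= 2.0:
        # Conservative guard: atempo does not accept factors below 0.5,
        # and large factors degrade speech quality.
        raise ValueError(f"tempo {tempo:.2f} is outside the safe 0.5-2.0 range")
    return tempo, int(round(scene_start * 1000))


def scene_filter(index, clip_seconds, scene_start, scene_end):
    """Format one scene's filter chain for use inside -filter_complex."""
    tempo, delay_ms = fit_scene_audio(clip_seconds, scene_start, scene_end)
    # adelay takes one delay per channel, in milliseconds.
    return (f"[{index}:a]atempo={tempo:.4f},"
            f"adelay={delay_ms}|{delay_ms}[a{index}]")
```

For example, a 6-second clip for the Promise Lock scene (0:05-0:12) is slowed to about 0.857× speed and delayed by 5000 ms. The per-scene outputs would then be mixed into one track before muxing.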
Dia speaker tags: For single-narrator Shorts, all text is wrapped in [S1]. Non-verbal tags like (laughs), (sighs), (clears throat) are supported but should be used sparingly.
Voice consistency: Dia generates a random voice per run. Use a fixed voice_seed to keep the same voice across all 6 scenes. Change the seed for a different voice.
Known Limitations
| Limitation | Workaround |
|---|---|
| Dia needs CUDA GPU (~4.4GB VRAM) | Use Higgs Audio API, or Flite for timing reference |
| Flite voice is robotic | Use it as a timing guide only; record your own voiceover or switch to Dia/Higgs |
| speechSynthesis blocked in artifact iframe | Use Tone.js for preview audio; real TTS in render phase |
| Emoji do not render reliably in Pillow | Use text symbols (↻, ⚡, ?) instead of emoji |
| No anti-aliasing on Pillow shapes | 1080p resolution compensates; use clean geometry |
| Browser video export unreliable | Always use Python+FFmpeg server-side render |
| WebM→MP4 conversion drops frames | Render directly to MP4, never convert |
Post-Production (If Using Flite or No TTS)
If the narration was generated with Dia or Higgs Audio, the MP4 is ready for upload. If using Flite or no TTS:
- Import MP4 into editor (Premiere / DaVinci / CapCut)
- Record voiceover at ~160 WPM matching caption timing
- Layer voiceover + optional music bed (-20dB under voice)
- Export H.264 1080×1920 30fps matching source
- Upload: YouTube Shorts (pin comment with full video link), Instagram Reels (same file)