name: demo-promo description: Turn a raw screen-recording / product demo into a polished, branded LANDSCAPE promo video that drives engagement. Pipeline — cut dead air, color-grade, hook card, persistent rotating TOP purpose-strap (Mike Chang style), image PiP inserts (show the actual artifact built), platform/feature chip strip, benefit callout ("this saves you HOURS"), outro CTA card, British/American AI voiceover (ElevenLabs) + sidechain-ducked cinematic BGM. Whisper-syncs overlays to the narration. Distilled from the SkynetLabs "Claude Code auto-publishes content" promo (2026-05-29). Trigger when user says "polish this demo", "make a promo from this recording", "edit this screen recording", "add hook + strap + voiceover", "/demo-promo", "Mike Chang style heading", or pastes a raw .mkv/.mp4 screencast asking to make it engaging.
Demo-Promo — Screencast → Branded Engagement Promo
End-to-end pipeline that turns a long raw screen-recording into a tight, branded, voiced promo. Landscape 16:9 (LI / YouTube / FB / X). For 9:16 reels use reel-studio instead.
Proven on: Videos/2026-05-29_claude-autopublish_FINAL.mp4 (8:12 raw → 1:52 promo).
When to use
Raw demo/screencast + user wants it "polished", "perfect", "engaging", with hook / heading / voiceover / music. NOT for talking-head vertical reels (→ reel-studio), NOT for static social cards (→ social-stack).
The 6-phase pipeline
- Probe raw video (ffprobe): res, fps, duration, codecs.
- Cut + grade — trim dead air to the keeper segments, dissolve at joins, CLEAN color grade. →
*_POLISHED.mp4(keep as backup; original audio). - VO — write ~200-260 word script narrating the demo → ElevenLabs TTS → loudnorm to stereo WAV.
- Sync — faster-whisper transcribes the VO → segment timestamps → map overlays to narration beats.
- Overlays — hook card + rotating top purpose-strap + image PiP inserts + chip strip + benefit callout + outro CTA card.
- Mix + mux — continuous cinematic BGM (looped + dynaudnorm) sidechain-ducked under the VO; concat hook+main+outro; mux. →
*_FINAL.mp4.
recipes/build_promo.py is the runnable orchestrator — edit the CONFIG block at the top and run. It encodes every gotcha below.
Engagement layer (why each element earns its place)
- Hook card (0-3s) — pattern-interrupt headline + a benefit promise ("saves me 20+ hours/week"). Stops the scroll.
- Top purpose-strap (persistent, rotating) — Mike Chang style emerald banner at top, states the video's PURPOSE, rotates 2-4 messages tracking the narration. Carries meaning for muted autoplay (most of LI/IG feed).
- Image PiP inserts — drop the ACTUAL artifact the demo produced (the generated design/post) as a bordered picture-in-picture with a "GENERATED LIVE" tag. Proof > claims.
- Chip strip — concrete list (platforms / features) on a dark bar. Specificity = comment-bait ("wait, ALL of those?").
- Benefit callout — big mid-video spike ("THIS SAVES 20+ HOURS / EVERY SINGLE WEEK") placed over the strongest proof frame.
- Outro CTA card — ONE-word DM mechanic ("DM me one word → PDF") + handle. Drives comments/DMs, not just likes.
- VO + ducked cinematic BGM — pro audio bed; music ducks under voice, swells in the outro.
Brand constants (SkynetLabs cream+emerald)
CREAM=0xF5EFE1 EMERALD=0x0E7C5A MINT=0x8fe3c4 BLACK=0x1A1A1A GREY=0x6b6b6b
Fonts: Impact (C:/Windows/Fonts/impact.ttf) for headlines · Arial Bold (arialbd.ttf) for straps/body
Encode: -c:v libx264 -preset fast -crf 16 -pix_fmt yuv420p -r 60 -movflags +faststart
Handle is @yourhandle everywhere (use one consistent handle).
Voiceover (ElevenLabs)
- Key:
~/.claude/secrets/elevenlabs.env(ELEVENLABS_API_KEY=sk_...) - Default voice Daniel UK
onwK4e9ZLuTAKqWW03F9(British, broadcaster). American alt: pick a US voice id. - Model
eleven_multilingual_v2, settings{stability:0.5, similarity_boost:0.75, style:0.0, use_speaker_boost:true} - Build the JSON body in Python (UTF-8 safe), POST with
curl --data-binary @body.json. - Script length: ~150 wpm → target ≈ (video_seconds − 15) words. Start VO ~2.5s after the main video begins (let the opening breathe).
Music
Mike Chang BGM: <repo>/interview-clip-engine/assets/bgm/mixkit-cinematic-871.mp3 (101s).
Other cinematic beds in the same bgm/ folder. Or source fresh from Mixkit (https://assets.mixkit.co/music/NUMBER/NUMBER.mp3).
CRITICAL gotchas (every one cost a render — do NOT relearn)
- Windows: always
-filter_complex_script <file>.txt, never inline filtergraph (escaping hell). Flag is-filter_complex_script— a stray/(-/filter...) = "Invalid argument". - Fonts by relative name — copy
impact.ttf/arialbd.ttfinto the working dir and referencefontfile=impact.ttf. Avoids escaping theC:colon inside drawtext. - sidechaincompress reuses a label → fails silently. A filter output pad feeds exactly ONE input. The VO is needed twice (sidechain key + the mix), so
asplit=2[vmix][vkey]. If you reuse[voice]directly, the voice vanishes from the mix (only ducked bgm survives, ~-45dB). - Sidechain tail goes dead when the key ends. After the VO stops, the compressor has no trigger and the bgm output truncates/silences. Fix:
apadthe VO to the full timeline length so a (silent) key exists end-to-end → bgm un-ducks and rides the outro. - VO too quiet when mono VO meets stereo BGM in amix. Pre-bake the VO:
loudnorm=I=-15:TP=-1.5:LRA=11,aformat=channel_layouts=stereo:sample_rates=44100→ WAV, then mix that. Do NOT loudnorm inside the realtime graph. - BGM's own tail fades to nothing (cinematic tracks end soft).
-stream_loop 1the bgm input +dynaudnorm=f=250:g=8to flatten, so the outro keeps a present bed. Then your ownafadecontrols the ending. drawtext box=1:boxborderw=Nauto-sizes the pill to the text — easiest way to make a clean centered strap/badge. Center withx=(w-text_w)/2.- Python %-formatted filter strings: the number of
%sMUST equal the tuple length, in order. Miscount =TypeError: not all arguments converted. Prefer building the string without%if it gets hairy. - amix:
duration=longest:normalize=0(normalize=0 keeps VO loud; longest keeps the music tail). Trim/atrimthe bgm to the timeline length solongestdoesn't blow out to the looped 202s. - Cut with one filtergraph:
trim+setpts=PTS-STARTPTSper segment,xfade=transition=fade:offset=<segA_dur-0.5>for the join,acrossfadefor audio. Verify segment math with a frame extract at the join. - Verify every render:
ffprobeduration +volumedetecton 4-5 windows (intro / VO body / callout / outro) + extract frames at each overlay beat and look at them. Max_volume should sit ~ -4dB (headroom, no clip).
Sync overlays to narration
from faster_whisper import WhisperModel
m=WhisperModel("small", device="cpu", compute_type="int8")
segs,_=m.transcribe("voice_loud.wav")
for s in segs: print(f"{s.start:6.1f}-{s.end:6.1f} {s.text.strip()}")
main_local_time = VO_local_time + VO_lead (e.g. +2.5s). Place: image PiP at the "it built X" line, chip strip at the "publishes to A,B,C" line, benefit callout at the "replaces a team / saves hours" line. Rotate the top strap on the phase boundaries.
Output naming
<date>_<slug>_POLISHED.mp4 (cut + grade, original audio, backup) and <date>_<slug>_FINAL.mp4 (full promo). Write to the user's Videos/ folder.
Follow-ups to offer
- 9:16 vertical version (→ reel-studio, or re-frame crop).
- Post caption + hashtags (→ social-stack / aeo-daily voice).
- Wire the outro DM keyword (e.g. "PDF") to a ManyChat/GHL auto-responder so the CTA is hands-free.
Reference build
recipes/build_promo.py — the exact, parametrized orchestrator (CONFIG block → run).
references/EXAMPLE-claude-autopublish.md — the 2026-05-29 build that defined this skill (config + timings + script).