name: song-subtitle-srt
description: Generate final .srt subtitles for song videos from a recorded performance video and a trusted lyrics file. Use when Codex needs to process a song folder such as songs/song-name/ that contains song.mov and lyrics.txt, extract audio.wav, run whisper-cli to produce rough timing, and then turn the rough subtitle timing plus the lyrics into an accurate final.srt.
Song Subtitle SRT
Overview
Produce four files inside the song folder: audio.wav, rough.srt, rough.json, and final.srt.
Treat lyrics.txt as the text ground truth and rough.srt as the timing ground truth. Use rough.json only when segment boundaries remain ambiguous after reading rough.srt.
Workflow
- Validate the input folder.
Require these files:
- song.mov
- lyrics.txt
Expect the final folder shape to be:
songs/<name>/
  song.mov
  lyrics.txt
  audio.wav
  rough.srt
  rough.json
  final.srt
If song.mov or lyrics.txt is missing, stop and report the missing path.
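The input check above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill's scripts; the function name `missing_inputs` is hypothetical.

```python
from pathlib import Path

# The two inputs the workflow requires before anything else runs.
REQUIRED_INPUTS = ("song.mov", "lyrics.txt")


def missing_inputs(song_dir):
    """Return the required input paths that do not exist in song_dir."""
    folder = Path(song_dir)
    return [folder / name for name in REQUIRED_INPUTS if not (folder / name).is_file()]
```

A caller would stop and print every path in the returned list when it is non-empty, and continue to the next step only when it is empty.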
- Generate rough subtitle artifacts.
Run:
skills/song-subtitle-srt/scripts/generate_rough_subtitles.sh songs/<name>
That script must:
- extract audio.wav from song.mov as 16 kHz, mono, pcm_s16le
- run the repo's tools/whisper.cpp/build/bin/whisper-cli
- write rough.srt and rough.json
- use -l zh and -ml 8
If ffmpeg, whisper-cli, or the model file is missing, stop and report the exact missing dependency.
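For orientation, the two commands that script is expected to run might look like the sketch below. It only builds the argument lists and does not execute anything. The real generate_rough_subtitles.sh may use different flags; the output flags (`-osrt`, `-oj`, `-of`) and the `model_path` parameter are assumptions, since the text does not say how the outputs are requested or where the model file lives.

```python
from pathlib import Path

# whisper-cli location named in the workflow text.
WHISPER_CLI = "tools/whisper.cpp/build/bin/whisper-cli"


def rough_subtitle_commands(song_dir, model_path):
    """Build (ffmpeg argv, whisper-cli argv) for one song folder."""
    folder = Path(song_dir)
    audio = folder / "audio.wav"
    extract = [
        "ffmpeg", "-y",
        "-i", str(folder / "song.mov"),
        "-vn",                  # drop the video stream
        "-ar", "16000",         # 16 kHz sample rate
        "-ac", "1",             # mono
        "-c:a", "pcm_s16le",    # 16-bit little-endian PCM
        str(audio),
    ]
    transcribe = [
        WHISPER_CLI,
        "-m", str(model_path),  # assumption: caller supplies the model file path
        "-f", str(audio),
        "-l", "zh",             # language, as required above
        "-ml", "8",             # maximum segment length, as required above
        "-osrt", "-oj",         # assumption: request SRT and JSON output
        "-of", str(folder / "rough"),  # output base name, giving rough.srt and rough.json
    ]
    return extract, transcribe
```

Each list can be passed to `subprocess.run(argv, check=True)`. Checking `shutil.which("ffmpeg")` and the existence of the whisper-cli and model paths beforehand makes it possible to report the exact missing dependency.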
- Build final.srt.
Read:
- lyrics.txt
- rough.srt
- references/alignment.md
Follow these rules exactly:
- treat each non-empty lyrics line as one required final subtitle cue
- copy each final cue text from lyrics.txt exactly
- preserve lyrical order exactly
- keep timestamps monotonic and non-overlapping
- prefer merging or splitting rough segments rather than inventing new timing from scratch
- use rough.json only if rough.srt does not expose enough timing detail
Write the result to final.srt in the same song folder.
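Once each lyric line has a start and end time, writing the file is mechanical. The sketch below assumes timings are integer milliseconds, one (start, end) pair per lyric line; the helper names `format_timestamp` and `render_srt` are hypothetical.

```python
def format_timestamp(ms):
    """Format integer milliseconds as HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_srt(lyric_lines, timings):
    """Pair each lyric line with its (start_ms, end_ms) and return SRT text."""
    if len(lyric_lines) != len(timings):
        raise ValueError("need exactly one timing per lyric line")
    blocks = []
    for number, (text, (start, end)) in enumerate(zip(lyric_lines, timings), start=1):
        # The cue text is the lyric line itself, copied unchanged.
        blocks.append(
            f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

The cue text never comes from Whisper's transcript here: only the timings are derived from rough.srt, which keeps the lyrics file as the single source of text.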
- Validate the final subtitle.
Run:
python3 skills/song-subtitle-srt/scripts/validate_srt.py songs/<name>/final.srt songs/<name>/lyrics.txt
If validation fails, fix final.srt and rerun the validator until it passes.
Alignment Rules
Read references/alignment.md before writing final.srt.
Apply the default mapping strategy:
- merge multiple rough cues when one lyric line spans them
- split one rough cue across multiple lyric lines when needed
- split by relative lyric length when no better timing clue exists
- smooth adjacent boundaries so cues can touch but never overlap
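Two of these strategies are easy to show in code. The following is a rough sketch, not the procedure from references/alignment.md: it splits one time span across several lyric lines in proportion to character count, and clamps cue ends so that neighbours can touch but never overlap. Times are assumed to be integer milliseconds and cues are assumed to be sorted by start time; both function names are hypothetical.

```python
def split_by_length(start_ms, end_ms, lines):
    """Divide [start_ms, end_ms] among lines in proportion to character count."""
    weights = [max(len(line), 1) for line in lines]
    total = sum(weights)
    span = end_ms - start_ms
    cues, cursor, used = [], start_ms, 0
    for weight in weights:
        used += weight
        # Cumulative integer boundaries ensure the last cue ends exactly at end_ms.
        boundary = start_ms + span * used // total
        cues.append((cursor, boundary))
        cursor = boundary
    return cues


def remove_overlaps(cues):
    """Clamp each cue's end to the next cue's start (cues sorted by start)."""
    fixed = []
    for i, (start, end) in enumerate(cues):
        if i + 1 < len(cues):
            end = min(end, cues[i + 1][0])
        fixed.append((start, end))
    return fixed
```

For example, `split_by_length(0, 6000, ["ab", "abcd"])` gives the two-character line the first third of the span and the four-character line the rest. Character count is a crude proxy for sung duration, so it should remain the fallback when rough.srt offers no better boundary.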
Do not paraphrase, normalize, or "improve" the lyrics text. Correct Whisper mistakes by replacing them with the exact lyric lines.
Output Contract
Ensure final.srt satisfies all of these:
- cue count equals the number of non-empty lines in lyrics.txt
- each cue text equals the corresponding lyric line exactly
- cue numbers start at 1 and increase by 1
- timestamps use HH:MM:SS,mmm --> HH:MM:SS,mmm
- every cue has non-empty text
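The contract above can be checked mechanically. The sketch below is an independent illustration and makes no claim about how validate_srt.py is implemented; the function name `contract_errors` is hypothetical, and stripping surrounding whitespace from lyric lines before comparison is an assumption.

```python
import re

# One timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def contract_errors(srt_text, lyrics_text):
    """Return a list of contract violations (empty when the file passes)."""
    # Assumption: surrounding whitespace on a lyric line is not significant.
    lyrics = [line.strip() for line in lyrics_text.splitlines() if line.strip()]
    # Cues are separated by blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    errors = []
    if len(blocks) != len(lyrics):
        errors.append(f"cue count {len(blocks)} != lyric line count {len(lyrics)}")
    for index, block in enumerate(blocks, start=1):
        rows = block.splitlines()
        if rows[0].strip() != str(index):
            errors.append(f"cue {index}: unexpected cue number {rows[0]!r}")
        if len(rows) < 2 or not TIMING.match(rows[1].strip()):
            errors.append(f"cue {index}: malformed timestamp line")
        text = "\n".join(rows[2:]).strip()
        if not text:
            errors.append(f"cue {index}: empty text")
        elif index <= len(lyrics) and text != lyrics[index - 1]:
            errors.append(f"cue {index}: text differs from lyrics line {index}")
    return errors
```

Returning a list of messages rather than raising on the first failure lets one pass report every broken cue, which shortens the fix-and-rerun loop.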
If rough timing and lyrics order clearly disagree, stop and report the conflict instead of fabricating a subtitle file.