song-subtitle-srt


Generate final `.srt` subtitles for song videos from a recorded performance video and a trusted lyrics file. Use when Codex needs to process a song folder such as `songs/song-name/` that contains `song.mov` and `lyrics.txt`: extract `audio.wav`, run `whisper-cli` to produce rough timing, and then combine that rough timing with the lyrics into an accurate `final.srt`.

By x1uc. Updated 4/26/2026


Song Subtitle SRT

Overview

Produce four files inside the song folder: audio.wav, rough.srt, rough.json, and final.srt.

Treat lyrics.txt as the text ground truth and rough.srt as the timing ground truth. Use rough.json only when segment boundaries remain ambiguous after reading rough.srt.

Workflow

  1. Validate the input folder.

Require these files:

  • song.mov
  • lyrics.txt

Expect the final folder shape to be:

songs/<name>/
  song.mov
  lyrics.txt
  audio.wav
  rough.srt
  rough.json
  final.srt

If song.mov or lyrics.txt is missing, stop and report the missing path.
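The input check above can be sketched in a few lines of Python. This is only an illustration: `missing_inputs` and `REQUIRED_INPUTS` are hypothetical names, not part of the skill's scripts.

```python
from pathlib import Path

# The two inputs the workflow requires before anything else runs.
REQUIRED_INPUTS = ("song.mov", "lyrics.txt")


def missing_inputs(song_dir):
    """Return the paths of required input files that are absent from song_dir."""
    folder = Path(song_dir)
    return [folder / name for name in REQUIRED_INPUTS if not (folder / name).is_file()]
```

An empty result means the folder is ready; otherwise report each returned path and stop.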

  2. Generate rough subtitle artifacts.

Run:

skills/song-subtitle-srt/scripts/generate_rough_subtitles.sh songs/<name>

That script must:

  • extract audio.wav from song.mov as 16 kHz, mono, pcm_s16le
  • run the repo's tools/whisper.cpp/build/bin/whisper-cli
  • write rough.srt and rough.json
  • pass -l zh (Chinese) and -ml 8 (maximum segment length of 8 characters) to whisper-cli

If ffmpeg, whisper-cli, or the model file is missing, stop and report the exact missing dependency.
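For orientation, the commands that script is expected to run might look like the argument lists below. This is a sketch, not the script's actual contents: the model path is a placeholder assumption, the function names are hypothetical, and the output flags (`-osrt`, `-oj`, `-of`) are one plausible way to get `rough.srt` and `rough.json` out of `whisper-cli`.

```python
import shutil
from pathlib import Path

WHISPER_CLI = Path("tools/whisper.cpp/build/bin/whisper-cli")
# Assumption: the real model file name and location are not stated in this document.
MODEL = Path("tools/whisper.cpp/models/ggml-model.bin")


def ffmpeg_command(song_dir):
    """Build the ffmpeg call that extracts 16 kHz mono pcm_s16le audio."""
    folder = Path(song_dir)
    return [
        "ffmpeg", "-y",
        "-i", str(folder / "song.mov"),
        "-vn",                # drop the video stream
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(folder / "audio.wav"),
    ]


def whisper_command(song_dir, cli=WHISPER_CLI, model=MODEL):
    """Build the whisper-cli call that writes rough.srt and rough.json."""
    folder = Path(song_dir)
    return [
        str(cli),
        "-m", str(model),
        "-f", str(folder / "audio.wav"),
        "-l", "zh",
        "-ml", "8",
        "-osrt", "-oj",                # emit SRT and JSON
        "-of", str(folder / "rough"),  # output prefix, extensions are appended
    ]


def missing_dependencies(cli=WHISPER_CLI, model=MODEL):
    """List every dependency that cannot be found, so the caller can report it."""
    missing = []
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    for path in (Path(cli), Path(model)):
        if not path.is_file():
            missing.append(str(path))
    return missing
```

Each list can be passed to `subprocess.run(cmd, check=True)` once `missing_dependencies()` returns an empty list.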

  3. Build final.srt.

Read:

  • lyrics.txt
  • rough.srt
  • references/alignment.md

Follow these rules exactly:

  • treat each non-empty lyrics line as one required final subtitle cue
  • copy each final cue text from lyrics.txt exactly
  • preserve lyrical order exactly
  • keep timestamps monotonic and non-overlapping
  • prefer merging or splitting rough segments rather than inventing new timing from scratch
  • use rough.json only if rough.srt does not expose enough timing detail

Write the result to final.srt in the same song folder.
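As a starting point for this step, the two inputs can be loaded into simple structures: rough cues as `(start_ms, end_ms, text)` tuples and lyrics as a list of non-empty lines. The helper names below are hypothetical, and the parser assumes the plain SRT layout (number line, timestamp line, text lines) without styling tags.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(stamp):
    """Convert an HH:MM:SS,mmm timestamp to integer milliseconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text):
    """Parse SRT text into a list of (start_ms, end_ms, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a well-formed cue
        start, end = lines[1].split("-->")
        cues.append((to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


def lyric_lines(text):
    """Return the non-empty lyric lines, unmodified, in their original order."""
    return [line for line in text.splitlines() if line.strip()]
```

Keeping the lyric lines unmodified matters here because the final cue text must match `lyrics.txt` exactly.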

  4. Validate the final subtitle.

Run:

python3 skills/song-subtitle-srt/scripts/validate_srt.py songs/<name>/final.srt songs/<name>/lyrics.txt

If validation fails, fix final.srt and rerun the validator until it passes.

Alignment Rules

Read references/alignment.md before writing final.srt.

Apply the default mapping strategy:

  • merge multiple rough cues when one lyric line spans them
  • split one rough cue across multiple lyric lines when needed
  • split by relative lyric length when no better timing clue exists
  • smooth adjacent boundaries so cues can touch but never overlap
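Two of these rules lend themselves to a small sketch: splitting a time span by relative lyric length, and smoothing boundaries so cues touch without overlapping. The functions below are illustrative assumptions, not the contents of references/alignment.md. They weight lines by character count and resolve an overlap by trimming the earlier cue's end.

```python
def split_by_length(start_ms, end_ms, lines):
    """Divide [start_ms, end_ms] among lyric lines in proportion to character count."""
    weights = [max(len(line.strip()), 1) for line in lines]
    total = sum(weights)
    span = end_ms - start_ms
    cues, cursor, used = [], start_ms, 0
    for line, weight in zip(lines, weights):
        used += weight
        boundary = start_ms + round(span * used / total)
        cues.append((cursor, boundary, line))
        cursor = boundary  # the next cue starts where this one ends
    return cues


def smooth_boundaries(cues):
    """Trim each cue's end to the next cue's start so cues may touch but never overlap.

    Assumes the cues are already sorted by start time.
    """
    smoothed = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            end = min(end, cues[i + 1][0])
        smoothed.append((start, end, text))
    return smoothed
```

For example, a rough cue spanning 0 to 6000 ms that covers a 2-character line followed by a 4-character line is split at 2000 ms. Merging is the simpler case: the merged cue takes the start of the first rough cue and the end of the last.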

Do not paraphrase, normalize, or "improve" the lyrics text. Correct Whisper mistakes by replacing them with the exact lyric lines.

Output Contract

Ensure final.srt satisfies all of these:

  • cue count equals the number of non-empty lines in lyrics.txt
  • each cue text equals the corresponding lyric line exactly
  • cue numbers start at 1 and increase by 1
  • timestamps use HH:MM:SS,mmm --> HH:MM:SS,mmm
  • every cue has non-empty text

If rough timing and lyrics order clearly disagree, stop and report the conflict instead of fabricating a subtitle file.
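The contract checks can be illustrated with a compact checker. This is a hedged sketch of the kind of logic a validator might apply, not the actual `validate_srt.py`; `check_srt` is a hypothetical name, and it also enforces the monotonic, non-overlapping timing rule from the workflow.

```python
import re

CUE_TIME = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _ms(h, m, s, ms):
    """Convert timestamp fields (as strings) to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(srt_text, lyrics_text):
    """Return a list of contract violations; an empty list means the file passes."""
    errors = []
    lyrics = [line for line in lyrics_text.splitlines() if line.strip()]
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    if len(blocks) != len(lyrics):
        errors.append(f"cue count {len(blocks)} != lyric line count {len(lyrics)}")
    prev_end = 0
    for index, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if lines[0].strip() != str(index):
            errors.append(f"cue {index}: expected number {index}, got {lines[0]!r}")
        match = CUE_TIME.match(lines[1].strip()) if len(lines) > 1 else None
        if match is None:
            errors.append(f"cue {index}: malformed timestamp line")
        else:
            g = match.groups()
            start, end = _ms(*g[:4]), _ms(*g[4:])
            if end < start or start < prev_end:
                errors.append(f"cue {index}: timestamps overlap or run backwards")
            prev_end = max(prev_end, end)
        text = "\n".join(lines[2:])
        if not text.strip():
            errors.append(f"cue {index}: empty text")
        elif index <= len(lyrics) and text != lyrics[index - 1]:
            errors.append(f"cue {index}: text differs from lyrics line {index}")
    return errors
```

Returning every violation at once, rather than failing on the first, makes it easier to fix `final.srt` in a single pass before rerunning the validator.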

Install via CLI
npx skills add https://github.com/x1uc/songs-subtitle --skill song-subtitle-srt