disable-model-invocation: true
name: listen-later
description: Convert an article, newsletter, or document into a short Kokoro-narrated audio read-up and upload it as a private episode to the user's "๐ฅ Listen Later" Spotify show. ONLY trigger on explicit phrases like "listen later", "read-up", "save as audio for the commute", "add to my listen-later feed". Do NOT trigger on generic summarize, TTS, save-to-spotify, podcast, or cover-art requests; those route to the save-to-spotify skill.
Listen Later
Opinionated pipeline: arbitrary text → Kokoro af_heart audio → private episode in the ๐ฅ Listen Later Spotify show.
For voice cloning, multilingual, custom cover art, or full podcast production, stop and use save-to-spotify directly; this skill is intentionally rigid.
Defaults (do not ask unless user overrides)
| Setting | Default |
|---|---|
| Voice | Kokoro af_heart, 1.0× (kokoro on PATH) |
| Length | Mode-dependent; see below |
| Show | ๐ฅ Listen Later (must already exist; resolve URI via save-to-spotify --json shows) |
| Cover | Reuse show cover for the episode (no per-episode art) |
| Timeline | Chapters only: no images, no link companions |
| Polling | Off by default. Only poll when the user explicitly asks to wait until ready. |
| Chapter rule | Every chapter ≥30s (Spotify rejects too many short ones). Consolidate adjacent segments into chapters after rendering. First chapter MUST start at 0. |
Mode selection (infer from the user's words, do not ask)
Verbatim mode: default when the user says "read-up", "save to listen later", "add to my queue", or just pastes text. Narrate the full text, lightly cleaned for TTS (strip markdown / hashtags / emojis / URLs, expand abbreviations, em dash → hyphen). Do NOT cut content. Length follows the source: at ~150 wpm, a 1500-word article becomes ~10 minutes.
Summarize mode: only when the user says "summarize", "TL;DR", "short version", or "key points". Pick target length from source complexity:

| Source | Target |
|---|---|
| Tweet thread / short blog post (<800 words) | 1–2 min |
| Newsletter / medium article (800–3000 words) | 3–5 min |
| Long-form essay / report / paper (>3000 words) | 5–8 min |
| Dense technical / multi-topic source | bias to the longer end |

Within the target, write 6–12 declarative segments preserving the source's structure 1:1.
Segment count rule of thumb: ~30–60 seconds of speech per segment.
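
The cleaning and length rules above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function names, the regular expressions, and the two-entry abbreviation table are all assumptions made for the example. Only the 150 wpm pace and the word-count thresholds come from the text above.

```python
import re

WORDS_PER_MINUTE = 150  # narration pace quoted above for verbatim mode

# Hypothetical abbreviation table: extend it for the sources you actually read.
ABBREVIATIONS = {"e.g.": "for example", "i.e.": "that is"}

MD_LINK_RE = re.compile(r"\[([^\]]+)\]\([^)]*\)")  # [text](url) -> text
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\S)#\w+")
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")
MD_MARK_RE = re.compile(r"[*`]+|^\s*[#>]+\s*", re.MULTILINE)


def clean_for_tts(text: str) -> str:
    """Strip links, URLs, hashtags, emojis and markdown marks; cut no sentences."""
    text = MD_LINK_RE.sub(r"\1", text)
    text = URL_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    text = MD_MARK_RE.sub("", text)
    text = text.replace("\u2014", " - ")  # em dash -> hyphen
    for short, spoken in ABBREVIATIONS.items():
        text = text.replace(short, spoken)
    return re.sub(r"\s+", " ", text).strip()


def estimate_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Verbatim mode: narration length follows the word count."""
    return len(text.split()) / wpm


def summary_target_minutes(word_count: int) -> tuple[int, int]:
    """Summarize mode: (low, high) target in minutes from the table above."""
    if word_count < 800:
        return (1, 2)
    if word_count <= 3000:
        return (3, 5)
    return (5, 8)
```

For a dense or multi-topic source, aim for the upper value of the returned range.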
Interview
Do not ask for confirmation. Derive the episode title from the source title, article <title>, or URL slug. If the user supplies a title, use it.
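
One way to apply that precedence is sketched below. The function name, its parameters, and the `"Untitled read-up"` fallback are hypothetical. The source only fixes the order: a user-supplied title wins, then the source title, then the article `<title>`, then the URL slug.

```python
import re
from html import unescape
from urllib.parse import unquote, urlparse


def derive_title(user_title=None, source_title=None, html=None, url=None) -> str:
    """Pick an episode title without asking the user."""
    for candidate in (user_title, source_title):
        if candidate and candidate.strip():
            return candidate.strip()
    if html:
        match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        if match:
            title = re.sub(r"\s+", " ", unescape(match.group(1))).strip()
            if title:
                return title
    if url:
        # Last path component, minus a file extension, with separators as spaces.
        slug = unquote(urlparse(url).path).rstrip("/").rsplit("/", 1)[-1]
        slug = re.sub(r"\.\w+$", "", slug)
        title = " ".join(word for word in re.split(r"[-_]+", slug) if word)
        if title:
            return title.capitalize()
    return "Untitled read-up"  # hypothetical fallback, not specified by the skill
```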
Flow
- Preflight: run one shell block that creates the work directory, checks `save-to-spotify --json auth status`, checks `which kokoro`, extracts the page, and resolves the `๐ฅ Listen Later` show URI from `save-to-spotify --json shows`. Do not inspect files one at a time unless something fails.
- Script: write `segments.json` with 6–10 declarative segments, links stripped, abbreviations expanded for TTS, em dashes → hyphens. Include chapter titles in the same file.
- Render: render all segments in one shell block. Prefer parallel Kokoro jobs when there is more than one segment, capped at CPU count: `printf '%s\0' seg_*.txt | xargs -0 -P "$(sysctl -n hw.ncpu 2>/dev/null || nproc)" -I{} sh -c 'kokoro -t "$(cat "$1")" -o "${1%.txt}.wav"' sh {}`.
- Silence: generate WAV silence once with ffmpeg: 300 ms between segments and 600 ms as outer pad.
- Concat: concat WAVs with the ffmpeg concat demuxer, then encode and normalize in a single final MP3 command when possible: `ffmpeg -f concat -safe 0 -i concat.txt -af loudnorm -ar 44100 -ac 1 -b:a 192k episode.mp3`.
- Durations: use `ffprobe` on WAV/MP3 files in one shell block and write `durations.json`.
- Chapters: cursor walks the actual durations. Force first chapter to `start_time_ms: 0`. Merge adjacent segments until every chapter is ≥30s; this typically ends up at 3–5 chapters for a 3-min episode.
- Description: short HTML with one intro paragraph, a `<ul>` of `M:SS - Title` entries, and the source link if supplied.
- Upload: `save-to-spotify --json episodes create --show-id <SHOW_URI> --title "<T>" --file episode.mp3 --image show_cover.jpg --summary "<HTML>"`.
- Timeline: `save-to-spotify --json timeline set --episode-id <EP_ID> --from-file timeline.json`.
- Poll: skip by default. If the user explicitly asks to wait until ready, run `save-to-spotify --json episodes status <EP_ID>` until `readiness == READY`.
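
The Chapters step is the easiest one to get wrong, so here is a minimal sketch of the cursor walk and merge. It assumes segment durations are already known in milliseconds and that the 300 ms gaps and 600 ms outer pad from the Silence step are part of the final audio. The function name and the output shape (a list of `title` / `start_time_ms` dicts) are illustrative. The only timeline field the skill names is `items[0].chapter.start_time_ms`, so mapping this list into `timeline.json` is left out.

```python
def build_chapters(segments, min_ms=30_000, gap_ms=300, pad_ms=600):
    """Merge rendered segments into chapters of at least min_ms each.

    segments: list of (title, duration_ms) tuples in playback order.
    """
    if not segments:
        return []

    # Cursor walk: where each segment starts in the concatenated audio.
    starts = []
    cursor = pad_ms
    for _title, duration_ms in segments:
        starts.append(cursor)
        cursor += duration_ms + gap_ms
    end_ms = cursor - gap_ms + pad_ms  # last gap is replaced by the outer pad

    chapters = []
    for (title, _duration_ms), start in zip(segments, starts):
        if not chapters:
            # The first chapter always starts at 0, covering the leading pad.
            chapters.append({"title": title, "start_time_ms": 0})
        elif start - chapters[-1]["start_time_ms"] >= min_ms:
            chapters.append({"title": title, "start_time_ms": start})
        # Otherwise the open chapter is still too short: absorb this segment.

    # A short final chapter is folded back into the one before it.
    if len(chapters) > 1 and end_ms - chapters[-1]["start_time_ms"] < min_ms:
        chapters.pop()
    return chapters
```

A merged chapter keeps the title of its first segment. Choosing a broader title for a merged chapter is a judgment call this sketch does not attempt.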
Working directory
Use `/tmp/listen-later/<slug>/` (segments, silences, episode.mp3, timeline.json, description.html). Write each artifact as soon as its step finishes, so that if step N fails, the output of the prior steps is preserved.
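
A small sketch of that layout follows, assuming the slug is derived from the episode title. The helper names and the write-then-rename pattern are illustrative choices, not something the skill prescribes.

```python
import json
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase, hyphen-separated directory name for an episode title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def work_dir(title: str, root: str = "/tmp/listen-later") -> Path:
    """Create (or reuse) the per-episode work directory."""
    path = Path(root) / slugify(title)
    path.mkdir(parents=True, exist_ok=True)
    return path


def write_json(path: Path, data) -> None:
    """Write via a temp file so a failed step never leaves a half-written artifact."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(path)
```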
Errors to watch
- `first chapter must start at 0 ms` → set `items[0].chapter.start_time_ms = 0`.
- `too many short chapters` → merge adjacent segment chapters until ≥30s each.
- Missing `๐ฅ Listen Later` show → ask the user to create it once via `save-to-spotify shows create --title "๐ฅ Listen Later" ...`, do NOT auto-create silently.
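
The two chapter errors can be caught locally before uploading. The check below is a hedged sketch: the function name and its inputs (chapter start times plus the total episode length, all in milliseconds) are assumptions, and it flags every chapter under 30 s because the point at which Spotify considers short chapters "too many" is not stated here.

```python
def timeline_problems(starts_ms, episode_ms, min_ms=30_000):
    """Return human-readable problems; an empty list means the chapters look valid."""
    if not starts_ms:
        return ["no chapters"]
    problems = []
    if starts_ms[0] != 0:
        problems.append("first chapter must start at 0 ms")
    # Each chapter ends where the next begins; the last ends with the episode.
    ends = list(starts_ms[1:]) + [episode_ms]
    short = [i for i, (start, end) in enumerate(zip(starts_ms, ends)) if end - start < min_ms]
    if short:
        problems.append(f"chapters shorter than {min_ms} ms at index {short}")
    return problems
```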