name: generate-podcast
description: Render a spoken-word MP3 podcast from wiki pages, single-host by default or two-voice dialogue. Piper TTS is the default (local, free); switches to ElevenLabs / OpenAI TTS when their API keys are present. Used by /generate podcast. Not user-invocable directly, go through /generate.
user-invocable: false
allowed-tools: Bash(which *) Bash(brew *) Bash(git *) Bash(mkdir *) Bash(date *) Bash(cat *) Bash(sed *) Bash(grep *) Bash(awk *) Bash(ffmpeg *) Bash(piper *) Bash(curl *) Bash(npm *) Bash(pnpm *) Bash(npx *) Read Write Glob Grep
content-pipeline:
  - pipeline:audio
  - platform:agnostic
  - role:adapter
Generate Podcast
Produce a 3–10 minute MP3 explainer from wiki pages. The LLM writes a spoken-word narrative, TTS renders each line, and ffmpeg concatenates the clips into a single MP3.
Artifact-first — output lands in vaults/<vault>/artifacts/podcast/.
Usage (via /generate router)
/generate podcast <topic> [--vault <name>] [--length short|medium|long] [--two-voice] [--voice <name>]
- --length: short (~3 min), medium (~6 min, default), long (~10 min).
- --two-voice: dialogue between two hosts instead of a monologue.
- --voice: override the default Piper voice. Ignored when the ElevenLabs or OpenAI backend is selected.
Same topic resolution as sibling handlers — reuses .claude/skills/generate/lib/select-pages.sh.
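A minimal sketch of how the flags above might be parsed, assuming the router passes them through as positional arguments. The function name and the VAULT, VOICE, and TOPIC variables are illustrative; LENGTH and TWO_VOICE match the names the sidecar step reads.

```shell
# Sketch only: parse the handler's flags into shell variables.
# parse_podcast_args and the variable names are assumptions, not part of the skill.
parse_podcast_args() {
  VAULT=""; LENGTH="medium"; TWO_VOICE=""; VOICE=""; TOPIC=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --vault|--length|--voice)
        # Flags that take a value: refuse to run off the end of the list.
        [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --vault) VAULT="$2" ;;
          --voice) VOICE="$2" ;;
          --length)
            case "$2" in
              short|medium|long) LENGTH="$2" ;;
              *) echo "unknown --length: $2" >&2; return 1 ;;
            esac ;;
        esac
        shift 2 ;;
      --two-voice) TWO_VOICE=1; shift ;;
      *)
        # Everything else is treated as part of the topic.
        TOPIC="${TOPIC:+$TOPIC }$1"; shift ;;
    esac
  done
  [ -n "$TOPIC" ] || { echo "topic is required" >&2; return 1; }
}
```

Calling `parse_podcast_args attention mechanisms --length long --two-voice` would leave TOPIC as "attention mechanisms", LENGTH as "long", and TWO_VOICE as "1".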
Pipeline
wiki pages → LLM script writer → script.md → TTS per line → ffmpeg concat → podcast.mp3
Keep the .script.md alongside the MP3 — it's diffable, re-renderable, and the honest primary artifact.
Step 1: Dependency Check
HAS_FFMPEG=0; HAS_PIPER=0
which ffmpeg >/dev/null 2>&1 && HAS_FFMPEG=1
which piper >/dev/null 2>&1 && HAS_PIPER=1
if [ "$HAS_FFMPEG" = "0" ]; then
  echo "ffmpeg missing. Installing via Homebrew…"
  brew install ffmpeg
fi
# Piper is optional if ELEVENLABS_API_KEY or OPENAI_API_KEY is set.
if [ "$HAS_PIPER" = "0" ] && [ -z "$ELEVENLABS_API_KEY" ] && [ -z "$OPENAI_API_KEY" ]; then
  echo "Piper not found and no cloud TTS key present."
  echo "Installing Piper (local, free, robotic-but-serviceable)…"
  brew install piper-tts 2>/dev/null || {
    echo "Homebrew install failed. See https://github.com/rhasspy/piper for manual install."
    exit 1
  }
fi
Step 2: Resolve Vault + Topic
mapfile -t PAGES < <(.claude/skills/generate/lib/select-pages.sh "$VAULT_DIR" "$TOPIC")
Exit code 1 from the helper means no pages matched; surface its message verbatim.
Step 3: Compute Source Hash
HASH=$(.claude/skills/generate/lib/source-hash.sh "${PAGES[@]}")
Step 4: Write the Script
The invoking LLM reads the selected pages and writes a narrative script.md. Two shapes supported:
Single-host monologue (default)
# Podcast: {{topic}}
_Length target: {{length}} (~{{minutes}} min)._
[HOST]: Welcome. Today we're talking about {{topic}}. Here's why that matters…
[HOST]: First, the basics. According to {{cite: wiki/concepts/attention.md}}, attention is…
[HOST]: …
Two-voice dialogue (--two-voice)
# Podcast: {{topic}}
[A]: Alright, let's get into {{topic}}.
[B]: Why this, why now?
[A]: Because {{cite: wiki/concepts/rag.md}}…
[B]: Huh. I thought…
[A]: Right, but here's the nuance…
Script-writing rules the LLM follows:
- Spoken-word, not read-aloud-bullets. Full sentences with natural cadence.
- Cite wiki pages inline with {{cite: path}}; each marker is preprocessed to *pagename* before TTS sees it.
- Length target: ~150 words per minute. 3 min → ~450 words; 6 min → ~900; 10 min → ~1500.
- No "as we mentioned earlier" crutch unless the script actually mentioned it.
- End with a short sources spoken-list — TTS handles it fine.
Templates live at .claude/skills/generate-podcast/templates/{single-host,two-voice}.md and give the LLM a starting shape.
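The length rule above is simple arithmetic, and a small helper can make the budget explicit. This is a sketch under the stated ~150 words-per-minute figure; the function names are hypothetical, and the word count is rough because unexpanded cite markers are counted as ordinary tokens.

```shell
# Sketch: map --length to a word budget at ~150 words per minute.
# Function names are illustrative assumptions.
WORDS_PER_MINUTE=150

length_minutes() {
  case "$1" in
    short)  echo 3 ;;
    medium) echo 6 ;;
    long)   echo 10 ;;
    *)      echo "unknown length: $1" >&2; return 1 ;;
  esac
}

target_words() {
  local minutes
  minutes=$(length_minutes "$1") || return 1
  echo $(( minutes * WORDS_PER_MINUTE ))
}

# Count words on speaker-tagged lines only, with the tag stripped.
# Headings and the length note are ignored.
script_words() {
  awk '/^\[(HOST|A|B)\]:/ { sub(/^\[[A-Z]+\]: */, ""); n += NF } END { print n + 0 }' "$1"
}
```

Comparing `script_words script.md` against `target_words "$LENGTH"` before rendering gives a cheap check that the draft is roughly the right size.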
Step 5: TTS Backend Selection
Priority order:
| Priority | Backend | Trigger | Cost | Quality |
|---|---|---|---|---|
| 1 | ElevenLabs | ELEVENLABS_API_KEY set | ~$0.30 per 1k chars | Studio-grade |
| 2 | OpenAI TTS | OPENAI_API_KEY set | ~$0.015 per 1k chars | Very good |
| 3 | Piper (local) | always available once installed | free | Robotic but clean |
if [ -n "$ELEVENLABS_API_KEY" ]; then
  TTS_BACKEND="elevenlabs"
elif [ -n "$OPENAI_API_KEY" ]; then
  TTS_BACKEND="openai"
else
  TTS_BACKEND="piper"
fi
Voice selection
- Piper: uses en_US-lessac-medium for [HOST] / [A]; en_GB-alan-medium for [B]. Override with --voice <model>.
- OpenAI: alloy for HOST/A, onyx for B.
- ElevenLabs: premade voices only, because the free-tier API blocks library voices (Rachel/Adam) with HTTP 402. Default: Alice (voice id Xb7hH8MSUJpSbSDYk0k2) for HOST/A, Eric (voice id cjVigY5qzO86Huf0OWal) for B. Other safe premade options: Sarah, Brian, Bill. Override with the ELEVENLABS_VOICE_A / ELEVENLABS_VOICE_B env vars (pass voice IDs, not names).
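The defaults above can be collected into one lookup. The following is a sketch, not the skill's actual code: `voice_for` is a hypothetical name, and applying the --voice override (held here in an assumed VOICE variable) only to the HOST/A speaker is a guess, since the text does not say which speaker it affects.

```shell
# Sketch: resolve the voice for a speaker tag under a given backend.
# voice_for and VOICE are assumptions; the default values come from the list above.
voice_for() {
  local backend="$1" speaker="$2"
  case "$backend:$speaker" in
    piper:HOST|piper:A)           echo "${VOICE:-en_US-lessac-medium}" ;;
    piper:B)                      echo "en_GB-alan-medium" ;;
    openai:HOST|openai:A)         echo "alloy" ;;
    openai:B)                     echo "onyx" ;;
    elevenlabs:HOST|elevenlabs:A) echo "${ELEVENLABS_VOICE_A:-Xb7hH8MSUJpSbSDYk0k2}" ;;
    elevenlabs:B)                 echo "${ELEVENLABS_VOICE_B:-cjVigY5qzO86Huf0OWal}" ;;
    *) echo "no voice for $backend/$speaker" >&2; return 1 ;;
  esac
}
```

With this shape, the render loop would only need `VOICE_MODEL=$(voice_for "$TTS_BACKEND" "$speaker")` for each line.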
Step 6: Render Each Line
Walk the script, split by [HOST] / [A] / [B] tags. For each line:
# Piper example
echo "$LINE_TEXT" | piper \
--model "$VOICE_MODEL" \
--output_file "/tmp/podcast_${i}.wav"
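The walk over the script might look like the sketch below, assuming every spoken line starts with one of the three tags followed by a colon. `walk_script` is a hypothetical helper; it zero-pads the index so that WAV files named after it sort in script order.

```shell
# Sketch: emit "index<TAB>speaker<TAB>text" for each spoken line of script.md.
# Headings, the length note, and blank lines are skipped.
walk_script() {
  awk '
    match($0, /^\[(HOST|A|B)\]: */) {
      speaker = substr($0, 2, index($0, "]") - 2)   # text between [ and ]
      text = substr($0, RLENGTH + 1)                # everything after the tag
      printf "%03d\t%s\t%s\n", ++i, speaker, text
    }
  ' "$1"
}

# Possible use (not executed here):
#   walk_script script.md | while IFS=$'\t' read -r i speaker text; do
#     echo "$text" | piper --model "$VOICE_MODEL" --output_file "/tmp/podcast_${i}.wav"
#   done
```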
Replace {{cite: path}} with the page's title (or filename stem) before TTS — the listener hears "as attention explains", not the raw path.
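A sketch of that substitution using the filename-stem fallback. Looking up the page's real title would need a read of the page itself, which is not shown; `expand_cites` is a hypothetical name, and markers whose path does not end in .md are left untouched.

```shell
# Sketch: replace {{cite: some/path/name.md}} with "name" on stdin.
# Only the filename-stem fallback is implemented here.
expand_cites() {
  sed -E 's#\{\{cite: *([^}]*/)?([^}/]+)\.md *\}\}#\2#g'
}
```

For example, `echo "$LINE_TEXT" | expand_cites` turns "According to {{cite: wiki/concepts/attention.md}}, attention is" into "According to attention, attention is".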
Insert 250 ms of silence between lines, and a longer 600 ms pause when the speaker changes in two-voice mode.
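One way to apply the pause rule is to interleave pre-rendered silence clips while building the concat list. The sketch below assumes the two clips already exist; `build_concat_list` and the clip file names are hypothetical. The commented ffmpeg command assumes 22050 Hz mono output, which should be checked against what the chosen TTS backend actually emits.

```shell
# Sketch: build an ffmpeg concat list with a pause between consecutive lines.
# stdin: "wav_path<TAB>speaker" per rendered line, in script order.
# $1 = clip used when the speaker stays the same, $2 = clip used on a speaker change.
build_concat_list() {
  awk -F '\t' -v gap_same="$1" -v gap_change="$2" '
    NR > 1 { printf "file \047%s\047\n", ($2 == prev ? gap_same : gap_change) }
    { printf "file \047%s\047\n", $1; prev = $2 }
  '
}

# The silence clips could be rendered once with ffmpeg's anullsrc source, e.g.:
#   ffmpeg -f lavfi -i anullsrc=r=22050:cl=mono -t 0.25 -c:a pcm_s16le /tmp/gap_250.wav
#   ffmpeg -f lavfi -i anullsrc=r=22050:cl=mono -t 0.60 -c:a pcm_s16le /tmp/gap_600.wav
```

In single-host mode every line has the same speaker, so only the short clip is ever inserted.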
Step 7: Concatenate with ffmpeg
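# build a concat list (truncate first so a stale list from an earlier run is not reused)
: > /tmp/podcast_list.txt
# sort numerically on the index; a plain glob would place podcast_10.wav before podcast_2.wav
for w in $(ls /tmp/podcast_*.wav | sort -t_ -k2,2n); do echo "file '$w'" >> /tmp/podcast_list.txt; done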
# render MP3
ffmpeg -f concat -safe 0 -i /tmp/podcast_list.txt \
-codec:a libmp3lame -qscale:a 2 \
"$VAULT_DIR/artifacts/podcast/<slug>-<date>.mp3"
VBR q2 is the right quality for voice: larger files are not audibly better, and smaller ones are noticeably worse.
Step 8: Version Detection
Before writing the sidecar, check for an existing artifact of the same type and topic:
ARTIFACT_TYPE="podcast"
EXISTING=$(ls "$VAULT_DIR/artifacts/$ARTIFACT_TYPE/"*"$TOPIC_SLUG"*.meta.yaml 2>/dev/null | sort | tail -1)
if [ -n "$EXISTING" ]; then
  PREV_VERSION=$(grep '^version:' "$EXISTING" | awk '{print $2}')
  PREV_VERSION=${PREV_VERSION:-1}
  VERSION=$((PREV_VERSION + 1))
  PREV_SLUG=$(basename "$EXISTING" .meta.yaml)
else
  VERSION=1
  PREV_SLUG=""
fi
The old artifact stays in place — not deleted, not overwritten. Multiple files of the same type + topic = version history. The portal discovers and displays these automatically.
Small fixes (CSS tweaks, typo corrections) should update the file in-place without incrementing the version — use judgement based on whether the content meaningfully changed.
Step 9: Write the Sidecar
META="${MP3_OUT%.mp3}.meta.yaml"
cat > "$META" <<EOF
generator: generate-podcast@0.1.0
generated-at: $(date -u +"%Y-%m-%dT%H:%M:%SZ")
tts-backend: $TTS_BACKEND
voice: $VOICE_MODEL
format: $([ -n "$TWO_VOICE" ] && echo "two-voice" || echo "single-host")
length-target: $LENGTH
topic: "<raw topic argument>"
generated-from:
$(for p in "${PAGES[@]}"; do echo " - $p"; done)
source-hash: $HASH
version: $VERSION
change-note: "<brief description of what changed, or 'Initial version' for v1>"
replaces: "$PREV_SLUG"
EOF
Step 10: Commit to Vault Repo
cd "$VAULT_DIR"
git add "artifacts/podcast/<slug>-<date>."{mp3,script.md,meta.yaml} 2>/dev/null
git diff --cached --quiet || git commit -m "🎙 podcast: generate <topic> ($(date +%Y-%m-%d))"
Step 11: Report to User
✅ Podcast generated
Topic: <topic>
Format: <single-host|two-voice>
TTS: <piper|openai|elevenlabs>
Length: <short|medium|long> (~<N> min)
Pages in: <N>
Source hash: <first 12 chars>
Script: vaults/<vault>/artifacts/podcast/<slug>-<date>.script.md
MP3: vaults/<vault>/artifacts/podcast/<slug>-<date>.mp3
Sidecar: vaults/<vault>/artifacts/podcast/<slug>-<date>.meta.yaml
Listen: open <absolute path>
Known Limitations (Phase 2C)
- Piper voices sound robotic. Great for draft listens; less great for sharing. Users with API keys get an automatic upgrade to OpenAI / ElevenLabs.
- No music / intro stingers. Pure voice. Phase 2C scope.
- No chaptering. ID3 chapters would be nice. Deferred.
- Cost warning — for long podcasts with ElevenLabs, print the projected cost before rendering and ask for confirmation.
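The cost warning could be sketched as follows. The rate is the approximate figure from the backend table, not a quoted price, and the character count is taken from the whole script file, so it slightly overestimates what is actually sent to TTS. Both function names are hypothetical.

```shell
# Sketch: project ElevenLabs cost from the script's character count, then confirm.
# The rate below is the table's approximate figure and is an assumption.
ELEVENLABS_USD_PER_1K_CHARS="0.30"

projected_cost() {
  # $1 = character count, $2 = USD per 1,000 characters
  awk -v chars="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", chars / 1000 * rate }'
}

confirm_cost() {
  local script="$1" chars cost reply
  # Count characters per line plus one for each newline.
  chars=$(awk '{ n += length($0) + 1 } END { print n + 0 }' "$script")
  cost=$(projected_cost "$chars" "$ELEVENLABS_USD_PER_1K_CHARS")
  printf 'ElevenLabs render: ~%s chars, projected cost ~$%s. Continue? [y/N] ' "$chars" "$cost"
  read -r reply
  [ "$reply" = "y" ] || [ "$reply" = "Y" ]
}
```

A caller would run `confirm_cost script.md || exit 0` before the render loop when TTS_BACKEND is elevenlabs.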
See Also
- .claude/skills/generate/SKILL.md: router that dispatches here.
- .claude/skills/generate/lib/select-pages.sh: shared topic resolution.
- .claude/skills/generate-video/SKILL.md: chains this handler for voiceover.
- sites/docs/src/content/docs/reference/artifacts.md: sidecar schema.