generate-podcast

name: generate-podcast description: Render a spoken-word MP3 podcast from wiki pages — single-host by default or two-voice dialogue. Piper TTS default (local, free); falls back to ElevenLabs / OpenAI TTS when their API keys are present. Used by /generate podcast. Not user-invocable directly — go through /generate. user-invocable: false allowed-tools: Bash(which ) Bash(brew ) Bash(git ) Bash(mkdir ) Bash(date ) Bash(cat ) Bash(sed ) Bash(grep ) Bash(awk ) Bash(ffmpeg ) Bash(piper ) Bash(curl ) Bash(npm ) Bash(pnpm ) Bash(npx *) Read Write Glob Grep content-pipeline: - pipeline:audio - platform:agnostic - role:adapter

Generate Podcast

Produce a 3–10 minute MP3 explainer from wiki pages. The LLM writes a spoken-word narrative, TTS renders each line, ffmpeg concatenates into a single MP3.

Artifact-first — output lands in vaults/<vault>/artifacts/podcast/.

Usage (via /generate router)

/generate podcast <topic> [--vault <name>] [--length short|medium|long] [--two-voice] [--voice <name>]

--length — short (~~3 min), medium (~~6 min, default), long (~10 min).
--two-voice — dialogue between two hosts instead of a monologue.
--voice — override the default Piper voice. Ignored when the ElevenLabs / OpenAI fallback kicks in.

Same topic resolution as sibling handlers — reuses .claude/skills/generate/lib/select-pages.sh.

Pipeline

wiki pages  →  LLM script writer  →  script.md  →  TTS per line  →  ffmpeg concat  →  podcast.mp3

Keep the .script.md alongside the MP3 — it's diffable, re-renderable, and the honest primary artifact.

Step 1: Dependency Check

HAS_FFMPEG=0; HAS_PIPER=0
which ffmpeg >/dev/null 2>&1 && HAS_FFMPEG=1
which piper  >/dev/null 2>&1 && HAS_PIPER=1

if [ "$HAS_FFMPEG" = "0" ]; then
  echo "ffmpeg missing. Installing via Homebrew…"
  brew install ffmpeg
fi

# Piper is optional if ELEVENLABS_API_KEY or OPENAI_API_KEY is set.
if [ "$HAS_PIPER" = "0" ] && [ -z "$ELEVENLABS_API_KEY" ] && [ -z "$OPENAI_API_KEY" ]; then
  echo "Piper not found and no cloud TTS key present."
  echo "Installing Piper (local, free, robotic-but-serviceable)…"
  brew install piper-tts 2>/dev/null || {
    echo "Homebrew install failed. See https://github.com/rhasspy/piper for manual install."
    exit 1
  }
fi

Step 2: Resolve Vault + Topic

mapfile -t PAGES < <(.claude/skills/generate/lib/select-pages.sh "$VAULT_DIR" "$TOPIC")

Exit 1 from the helper = no pages matched; surface verbatim.

Step 3: Compute Source Hash

HASH=$(.claude/skills/generate/lib/source-hash.sh "${PAGES[@]}")

Step 4: Write the Script

The invoking LLM reads the selected pages and writes a narrative script.md. Two shapes supported:

Single-host monologue (default)

# Podcast: {{topic}}

_Length target: {{length}} (~{{minutes}} min)._

[HOST]: Welcome. Today we're talking about {{topic}}. Here's why that matters…
[HOST]: First, the basics. According to {{cite: wiki/concepts/attention.md}}, attention is…
[HOST]: …

Two-voice dialogue (`--two-voice`)

# Podcast: {{topic}}

[A]: Alright, let's get into {{topic}}.
[B]: Why this, why now?
[A]: Because {{cite: wiki/concepts/rag.md}}…
[B]: Huh. I thought…
[A]: Right, but here's the nuance…

Script-writing rules the LLM follows:

Spoken-word, not read-aloud-bullets. Full sentences with natural cadence.
Cite wiki pages inline with {{cite: path}} — preprocessed to *pagename* before TTS sees them.
Length target: ~150 words per minute. 3 min → ~450 words; 6 min → ~900; 10 min → ~1500.
No "as we mentioned earlier" crutch unless the script actually mentioned it.
End with a short sources spoken-list — TTS handles it fine.

Templates live at .claude/skills/generate-podcast/templates/{single-host,two-voice}.md and give the LLM a starting shape.

Step 5: TTS Backend Selection

Priority order:

Priority	Backend	Trigger	Cost	Quality
1	ElevenLabs	`ELEVENLABS_API_KEY` set	~$0.30 per 1k chars	Studio-grade
2	OpenAI TTS	`OPENAI_API_KEY` set	~$0.015 per 1k chars	Very good
3	Piper (local)	always available once installed	free	Robotic but clean

if [ -n "$ELEVENLABS_API_KEY" ]; then
  TTS_BACKEND="elevenlabs"
elif [ -n "$OPENAI_API_KEY" ]; then
  TTS_BACKEND="openai"
else
  TTS_BACKEND="piper"
fi

Voice selection

Piper: uses en_US-lessac-medium for [HOST] / [A]; en_GB-alan-medium for [B]. Override with --voice <model>.
OpenAI: alloy for HOST/A, onyx for B.
ElevenLabs: premade voices only — free-tier API blocks library voices (Rachel/Adam) with HTTP 402. Default: Alice (voice id Xb7hH8MSUJpSbSDYk0k2) for HOST/A, Eric (voice id cjVigY5qzO86Huf0OWal) for B. Other safe premade options: Sarah, Brian, Bill. Override with ELEVENLABS_VOICE_A / ELEVENLABS_VOICE_B env vars (pass voice IDs, not names).

Step 6: Render Each Line

Walk the script, split by [HOST] / [A] / [B] tags. For each line:

# Piper example
echo "$LINE_TEXT" | piper \
  --model "$VOICE_MODEL" \
  --output_file "/tmp/podcast_${i}.wav"

Replace {{cite: path}} with the page's title (or filename stem) before TTS — the listener hears "as attention explains", not the raw path.

Short 250ms silence between lines. Longer 600ms silence when speaker changes in two-voice mode.

Step 7: Concatenate with ffmpeg

# build a concat list
for w in /tmp/podcast_*.wav; do echo "file '$w'" >> /tmp/podcast_list.txt; done

# render MP3
ffmpeg -f concat -safe 0 -i /tmp/podcast_list.txt \
  -codec:a libmp3lame -qscale:a 2 \
  "$VAULT_DIR/artifacts/podcast/<slug>-<date>.mp3"

VBR q2 is the right quality for voice — bigger files aren't audibly better, smaller noticeably worse.

Step 8: Version Detection

Before writing the sidecar, check for an existing artifact of the same type and topic:

ARTIFACT_TYPE="podcast"
EXISTING=$(ls "$VAULT_DIR/artifacts/$ARTIFACT_TYPE/"*"$TOPIC_SLUG"*.meta.yaml 2>/dev/null | sort | tail -1)
if [ -n "$EXISTING" ]; then
  PREV_VERSION=$(grep '^version:' "$EXISTING" | awk '{print $2}')
  PREV_VERSION=${PREV_VERSION:-1}
  VERSION=$((PREV_VERSION + 1))
  PREV_SLUG=$(basename "$EXISTING" .meta.yaml)
else
  VERSION=1
  PREV_SLUG=""
fi

The old artifact stays in place — not deleted, not overwritten. Multiple files of the same type + topic = version history. The portal discovers and displays these automatically.

Small fixes (CSS tweaks, typo corrections) should update the file in-place without incrementing the version — use judgement based on whether the content meaningfully changed.

Step 9: Write the Sidecar

META="${MP3_OUT%.mp3}.meta.yaml"
cat > "$META" <<EOF
generator: generate-podcast@0.1.0
generated-at: $(date -u +"%Y-%m-%dT%H:%M:%SZ")
tts-backend: $TTS_BACKEND
voice: $VOICE_MODEL
format: $([ -n "$TWO_VOICE" ] && echo "two-voice" || echo "single-host")
length-target: $LENGTH
topic: "<raw topic argument>"
generated-from:
$(for p in "${PAGES[@]}"; do echo "  - $p"; done)
source-hash: $HASH
version: $VERSION
change-note: "<brief description of what changed, or 'Initial version' for v1>"
replaces: "$PREV_SLUG"
EOF

Step 10: Commit to Vault Repo

cd "$VAULT_DIR"
git add "artifacts/podcast/<slug>-<date>."{mp3,script.md,meta.yaml} 2>/dev/null
git diff --cached --quiet || git commit -m "🎙 podcast: generate <topic> ($(date +%Y-%m-%d))"

Step 11: Report to User

✅ Podcast generated
   Topic:       <topic>
   Format:      <single-host|two-voice>
   TTS:         <piper|openai|elevenlabs>
   Length:      <short|medium|long>  (~<N> min)
   Pages in:    <N>
   Source hash: <first 12 chars>
   Script:      vaults/<vault>/artifacts/podcast/<slug>-<date>.script.md
   MP3:         vaults/<vault>/artifacts/podcast/<slug>-<date>.mp3
   Sidecar:     vaults/<vault>/artifacts/podcast/<slug>-<date>.meta.yaml
   Listen:      open <absolute path>

Known Limitations (Phase 2C)

Piper voices sound robotic. Great for draft listens; less great for sharing. Users with API keys get automatic upgrade to OpenAI / ElevenLabs.
No music / intro stingers. Pure voice. Phase 2C scope.
No chaptering. ID3 chapters would be nice. Deferred.
Cost warning — for long podcasts with ElevenLabs, print the projected cost before rendering and ask for confirmation.