podcast-generation

name: podcast-generation
description: "Generate audio podcasts from knowledge content using text-to-speech. Supports edge-tts (free) and ElevenLabs (premium) for conversational podcast creation."
allowed-tools: read, write, bash, task

Podcast Generation Skill

Purpose

Transform research content into engaging audio podcasts with conversational dialogue between a host and expert. Supports free (edge-tts) and premium (ElevenLabs) text-to-speech options.

When to Activate

User requests podcast generation from content
NotebookLM-style audio output needed
Converting research to audio format
Creating educational audio content

Key Triggers: podcast, audio, tts, text-to-speech, spoken, listen

Prerequisites

Free Option: edge-tts

# pydub joins the generated segments and needs ffmpeg on PATH for MP3
pip install edge-tts pydub

Premium Option: ElevenLabs

pip install elevenlabs pydub

# Set API key
export ELEVENLABS_API_KEY="your-api-key"
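
The generation code later in this document also imports pydub, which relies on an ffmpeg binary to read and write MP3 files. A minimal preflight sketch, assuming ffmpeg is expected on PATH; the function name `missing_prerequisites` is hypothetical and not part of either library:

```python
import importlib.util
import os
import shutil

def missing_prerequisites(provider: str = "edge-tts") -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Module names as they are imported by the examples in this document.
    modules = ["pydub", "edge_tts" if provider == "edge-tts" else "elevenlabs"]
    for module in modules:
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python package not importable: {module}")
    # pydub shells out to ffmpeg for MP3 decoding and encoding.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Only the premium provider needs an API key.
    if provider == "elevenlabs" and not os.getenv("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    return problems
```

Calling `missing_prerequisites("elevenlabs")` before a long generation run surfaces a missing key or binary up front instead of partway through the episode.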

Podcast Script Structure

Script Template

---
title: "[Topic] - Deep Dive"
duration_target: "15-20 minutes"
style: "conversational, educational"
hosts:
  - name: "Alex"
    role: "Host"
    voice_id: "en-US-GuyNeural"  # edge-tts
    personality: "Curious, engaging, asks clarifying questions"
  - name: "Dr. Taylor"
    role: "Expert"
    voice_id: "en-US-JennyNeural"  # edge-tts
    personality: "Knowledgeable, explains clearly, uses analogies"
---

# Podcast Script: [Topic]

## SEGMENT 1: Introduction (30-60 seconds)

[HOST]: Welcome to Research Insights, the podcast where we break down complex topics into digestible conversations. I'm Alex, and today we're diving into [TOPIC]. Joining me is Dr. Taylor, an expert in [FIELD]. Dr. Taylor, thanks for being here!

[EXPERT]: Thanks for having me, Alex! This is such a fascinating topic, and I'm excited to unpack it for our listeners.

[HOST]: So let's start with the basics. For someone who's never heard of [TOPIC], what exactly are we talking about?

## SEGMENT 2: Core Concepts (4-5 minutes)

[EXPERT]: Great question! Think of [TOPIC] like this... [ANALOGY]

[HOST]: Oh, that's a helpful way to think about it. So when you say [TERM], what does that mean in practice?

[EXPERT]: Right, so [EXPLANATION]. The key thing to understand is [KEY POINT].

[HOST]: And why should our listeners care about this? What's the real-world impact?

[EXPERT]: That's the exciting part! [IMPACT/APPLICATION]

## SEGMENT 3: Deep Dive (6-8 minutes)

[HOST]: Let's dig deeper into [SPECIFIC ASPECT]. I've heard that [COMMON QUESTION/MISCONCEPTION]?

[EXPERT]: Yes, that's a common question! The reality is [CLARIFICATION]. What's really happening is [DETAILED EXPLANATION].

[HOST]: Interesting! And how does this relate to [RELATED CONCEPT]?

[EXPERT]: They're actually closely connected. [CONNECTION EXPLANATION]

## SEGMENT 4: Practical Applications (3-4 minutes)

[HOST]: So if someone wanted to apply this knowledge, what would that look like?

[EXPERT]: There are several practical applications. First, [APPLICATION 1]. Second, [APPLICATION 2]. And increasingly, we're seeing [APPLICATION 3].

[HOST]: Are there any common pitfalls or mistakes people should avoid?

[EXPERT]: Definitely. The biggest one is [PITFALL]. Instead, you should [RECOMMENDATION].

## SEGMENT 5: Future & Closing (2-3 minutes)

[HOST]: Where do you see this field heading? What's on the horizon?

[EXPERT]: We're seeing exciting developments in [FUTURE TREND]. I think in the next few years, we'll see [PREDICTION].

[HOST]: That's fascinating! Before we wrap up, what's the one thing you want our listeners to remember from today?

[EXPERT]: If they take away just one thing, it's this: [KEY TAKEAWAY].

[HOST]: Brilliant! Dr. Taylor, thank you so much for joining us today.

[EXPERT]: My pleasure, Alex. This was fun!

[HOST]: And thank you to our listeners! Don't forget to check out our show notes for links and resources. Until next time, keep learning!

## END
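
Any bracketed placeholder left in the template, such as [TOPIC] or [KEY TAKEAWAY], would be read aloud by the TTS engine. A small sketch of a check that lists unfilled placeholders while ignoring the [HOST] and [EXPERT] speaker tags; `find_unfilled_placeholders` is a hypothetical helper, and it assumes placeholders are written in upper case as in the template:

```python
import re

SPEAKER_TAGS = {"HOST", "EXPERT"}

def find_unfilled_placeholders(script: str) -> list[str]:
    """Return upper-case bracketed placeholders still present, in first-seen order."""
    found = []
    for name in re.findall(r"\[([A-Z][A-Z0-9 /]*)\]", script):
        if name not in SPEAKER_TAGS and name not in found:
            found.append(name)
    return found

sample = (
    "[HOST]: Today we're diving into [TOPIC] with an expert in [FIELD].\n"
    "[EXPERT]: [IMPACT/APPLICATION]"
)
print(find_unfilled_placeholders(sample))
# → ['TOPIC', 'FIELD', 'IMPACT/APPLICATION']
```

An empty result is a reasonable gate before sending the script to audio generation.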

Audio Generation

Option 1: edge-tts (Free)

import edge_tts
import asyncio
from pathlib import Path
from pydub import AudioSegment
import re

# Voice options for edge-tts (run `edge-tts --list-voices` to confirm availability)
VOICES = {
    'host_male': 'en-US-GuyNeural',
    'host_female': 'en-US-JennyNeural',
    'expert_male': 'en-US-ChristopherNeural',
    'expert_female': 'en-US-AriaNeural',
    'narrator': 'en-US-DavisNeural'
}

async def text_to_speech(text: str, voice: str, output_path: str):
    """Convert text to speech using edge-tts."""
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)

def parse_script(script_path: str) -> list:
    """Parse podcast script into speaker turns."""
    with open(script_path, 'r', encoding='utf-8') as f:
        content = f.read()

    segments = []
    # A turn ends at the next line that starts with "[" or "#", or at the end
    # of the file, so "## SEGMENT" headings are not read aloud.
    pattern = r'\[(HOST|EXPERT)\]:\s*(.+?)(?=\n\[|\n#|\Z)'
    matches = re.findall(pattern, content, re.DOTALL)

    for speaker, text in matches:
        segments.append({
            'speaker': speaker,
            'text': text.strip(),
            'voice': VOICES['host_male'] if speaker == 'HOST' else VOICES['expert_female']
        })

    return segments

async def generate_podcast(script_path: str, output_path: str):
    """Generate complete podcast from script."""
    segments = parse_script(script_path)
    audio_segments = []
    temp_dir = Path("temp_audio")
    temp_dir.mkdir(exist_ok=True)

    for i, segment in enumerate(segments):
        temp_file = temp_dir / f"segment_{i:03d}.mp3"

        await text_to_speech(
            segment['text'],
            segment['voice'],
            str(temp_file)
        )

        audio = AudioSegment.from_mp3(str(temp_file))
        audio_segments.append(audio)

        # Add pause between speakers
        audio_segments.append(AudioSegment.silent(duration=500))

    # Combine all segments
    final_audio = sum(audio_segments)

    # Export
    final_audio.export(output_path, format="mp3", bitrate="192k")

    # Cleanup
    for f in temp_dir.glob("*.mp3"):
        f.unlink()
    temp_dir.rmdir()

    return output_path

# Usage
if __name__ == "__main__":
    asyncio.run(generate_podcast(
        "output/podcasts/topic_script.md",
        "output/podcasts/topic_podcast.mp3"
    ))
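
The script template declares a `voice_id` for each host in its front matter, while `parse_script` above assigns voices from the fixed `VOICES` table. A sketch of reading the front matter instead, assuming the exact `role:` followed by `voice_id:` layout used in the template; `voices_from_front_matter` is a hypothetical helper, and it uses a regex rather than a YAML parser to avoid an extra dependency:

```python
import re

def voices_from_front_matter(script: str) -> dict[str, str]:
    """Map upper-cased role names (HOST, EXPERT) to their declared voice_id."""
    # Front matter is the text between the first two '---' lines.
    match = re.match(r"\s*---\n(.*?)\n---", script, re.DOTALL)
    if not match:
        return {}
    pairs = re.findall(
        r'role:\s*"([^"]+)"\s*\n\s*voice_id:\s*"([^"]+)"',
        match.group(1),
    )
    return {role.upper(): voice for role, voice in pairs}

sample = '''---
title: "Demo - Deep Dive"
hosts:
  - name: "Alex"
    role: "Host"
    voice_id: "en-US-GuyNeural"  # edge-tts
  - name: "Dr. Taylor"
    role: "Expert"
    voice_id: "en-US-JennyNeural"  # edge-tts
---
[HOST]: Hi.
'''
print(voices_from_front_matter(sample))
# → {'HOST': 'en-US-GuyNeural', 'EXPERT': 'en-US-JennyNeural'}
```

The resulting dictionary could stand in for the hard-coded lookup, falling back to `VOICES` when a role is missing from the front matter.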

Option 2: ElevenLabs (Premium)

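import os
from pathlib import Path

from elevenlabs.client import ElevenLabs
from pydub import AudioSegment

# parse_script() is the helper defined in the edge-tts example above.

# Client-based API of the elevenlabs SDK (1.0 and later). Older releases
# exposed module-level generate() and set_api_key() instead.
client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# ElevenLabs voice IDs
ELEVENLABS_VOICES = {
    'host': 'pNInz6obpgDQGcFmaJgB',  # Adam
    'expert': 'EXAVITQu4vr4xnSDxMaL',  # Sarah
}

def generate_segment(text: str, voice_id: str, output_path: str):
    """Generate audio for a single segment."""
    audio = client.text_to_speech.convert(
        text=text,
        voice_id=voice_id,
        model_id="eleven_monolingual_v1"
    )
    # convert() yields the audio as byte chunks; write them out in order
    with open(output_path, "wb") as f:
        for chunk in audio:
            f.write(chunk)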

def generate_podcast_elevenlabs(script_path: str, output_path: str):
    """Generate podcast using ElevenLabs."""
    segments = parse_script(script_path)
    audio_files = []
    temp_dir = Path("temp_audio")
    temp_dir.mkdir(exist_ok=True)

    for i, segment in enumerate(segments):
        temp_file = temp_dir / f"segment_{i:03d}.mp3"
        voice_id = (ELEVENLABS_VOICES['host']
                   if segment['speaker'] == 'HOST'
                   else ELEVENLABS_VOICES['expert'])

        generate_segment(segment['text'], voice_id, str(temp_file))
        audio_files.append(temp_file)

    # Combine using pydub
    combined = AudioSegment.empty()
    for f in audio_files:
        audio = AudioSegment.from_mp3(str(f))
        combined += audio
        combined += AudioSegment.silent(duration=500)

    combined.export(output_path, format="mp3", bitrate="320k")

    # Cleanup
    for f in audio_files:
        f.unlink()
    temp_dir.rmdir()
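
Because both providers consume the same parsed script, the choice between them can be made at run time. A minimal sketch, assuming the presence of `ELEVENLABS_API_KEY` is the only signal needed; `select_tts_provider` is a hypothetical helper:

```python
import os

def select_tts_provider(prefer_premium: bool = True) -> str:
    """Return 'elevenlabs' when a key is configured and wanted, else 'edge-tts'."""
    has_key = bool(os.getenv("ELEVENLABS_API_KEY", "").strip())
    return "elevenlabs" if (prefer_premium and has_key) else "edge-tts"
```

Passing `prefer_premium=False` forces the free path, which is useful for quick previews even when a key is configured.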

Script Generation from Content

Content to Script Conversion

from datetime import datetime

def generate_podcast_script(
    content: str,
    topic: str,
    target_duration: int = 15  # minutes
) -> str:
    """
    Generate a podcast script from source content.

    Uses the content to create a conversational dialogue
    between host and expert.
    """
    script = f"""---
title: "{topic} - Research Insights"
duration_target: "{target_duration} minutes"
generation_date: "{datetime.now().isoformat()}"
source_content: "knowledge_base"
---

# Podcast Script: {topic}

## SEGMENT 1: Introduction (45 seconds)

[HOST]: Welcome back to Research Insights! I'm your host, and today we're exploring {topic}. This is a fascinating area that's really transforming how we think about [DOMAIN]. I'm joined by our resident expert to break it all down. Welcome!

[EXPERT]: Thanks! I'm really excited to discuss this. {topic} is one of those topics that sounds complex but becomes incredibly intuitive once you understand the core principles.

[HOST]: Perfect! Let's start from the beginning. What exactly is {topic}?

## SEGMENT 2: Core Explanation (5 minutes)

[EXPERT]: At its heart, {topic} is about...

[Continue generating based on content...]
"""

    # Use LLM to expand script based on content
    # This would involve calling the research agents to structure the content

    return script
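
The function above leaves the LLM expansion step as a comment. One way to prepare that step is to build the instruction text separately, so it can be handed to whichever model or agent does the writing. This sketch calls no LLM; `build_expansion_prompt` is a hypothetical helper, and the 150 words-per-minute speaking rate is an assumption to tune per voice:

```python
WORDS_PER_MINUTE = 150  # assumed average speaking rate, not a measured value

def build_expansion_prompt(content: str, topic: str, target_duration: int = 15) -> str:
    """Build an instruction asking a model to write [HOST]/[EXPERT] dialogue."""
    word_budget = target_duration * WORDS_PER_MINUTE
    return (
        f"Write a podcast dialogue about {topic}.\n"
        f"Target length: about {word_budget} words (~{target_duration} minutes).\n"
        "Format every turn as '[HOST]: ...' or '[EXPERT]: ...' on its own line.\n"
        "Use only facts from the source material below.\n\n"
        f"SOURCE MATERIAL:\n{content}"
    )

prompt = build_expansion_prompt("Transformers use attention.", "Attention", 10)
print(prompt.splitlines()[1])
# → Target length: about 1500 words (~10 minutes).
```

Keeping the turn format identical to the template means the output can go straight into `parse_script`.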

Output Structure

Generated Files

output/podcasts/[topic]/
├── [topic]_script.md           # Full podcast script
├── [topic]_script_timed.md     # Script with timestamps
├── [topic]_podcast.mp3         # Final audio file
├── [topic]_podcast_hq.mp3      # High quality version (ElevenLabs)
├── segments/                    # Individual segments
│   ├── 001_intro.mp3
│   ├── 002_core_concepts.mp3
│   └── ...
└── metadata.json               # Podcast metadata
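
The layout above can be derived from the topic name alone. A sketch under the assumption that `[topic]` is a lower-case, underscore-separated slug; that slug rule and the helper name `podcast_output_paths` are illustrative, not something the layout itself specifies:

```python
import re
from pathlib import Path

def podcast_output_paths(topic: str, root: str = "output/podcasts") -> dict[str, Path]:
    """Return the expected file and directory paths for one topic."""
    # Assumed slug rule: lower-case, runs of other characters become "_".
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    base = Path(root) / slug
    return {
        "script": base / f"{slug}_script.md",
        "script_timed": base / f"{slug}_script_timed.md",
        "podcast": base / f"{slug}_podcast.mp3",
        "podcast_hq": base / f"{slug}_podcast_hq.mp3",
        "segments_dir": base / "segments",
        "metadata": base / "metadata.json",
    }

paths = podcast_output_paths("Quantum Computing")
print(paths["podcast"].as_posix())
# → output/podcasts/quantum_computing/quantum_computing_podcast.mp3
```

Creating `paths["segments_dir"]` with `mkdir(parents=True, exist_ok=True)` sets up the whole tree in one call.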

Metadata Format

{
  "title": "Topic - Research Insights",
  "duration_seconds": 912,
  "duration_formatted": "15:12",
  "generation_date": "2024-01-15T10:30:00Z",
  "tts_provider": "edge-tts",
  "voices": {
    "host": "en-US-GuyNeural",
    "expert": "en-US-JennyNeural"
  },
  "source_content": "docs/topic/knowledge_base.md",
  "segments": [
    {"name": "Introduction", "start": 0, "end": 45},
    {"name": "Core Concepts", "start": 45, "end": 345}
  ]
}
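
The `segments` offsets and `duration_formatted` field follow directly from per-segment durations. A short sketch of that arithmetic; the helper names are hypothetical, and measuring each segment's real duration (for example from the generated audio) is left out:

```python
def format_duration(seconds: int) -> str:
    """Format seconds as M:SS, matching the duration_formatted field."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

def build_segment_index(durations: list[tuple[str, int]]) -> list[dict]:
    """Turn (name, seconds) pairs into cumulative start/end offsets."""
    index, cursor = [], 0
    for name, seconds in durations:
        index.append({"name": name, "start": cursor, "end": cursor + seconds})
        cursor += seconds
    return index

segments = build_segment_index([("Introduction", 45), ("Core Concepts", 300)])
print(segments[1])            # → {'name': 'Core Concepts', 'start': 45, 'end': 345}
print(format_duration(912))   # → 15:12
```

Episodes longer than an hour come out in minutes as well, for example `75:03`, which keeps the format consistent with the sample above.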

Voice Selection Guide

edge-tts Voices (Free)

Voice ID	Gender	Style	Best For
en-US-GuyNeural	Male	Conversational	Host
en-US-ChristopherNeural	Male	Professional	Expert
en-US-JennyNeural	Female	Friendly	Host/Expert
en-US-AriaNeural	Female	Expressive	Narrator
en-US-DavisNeural	Male	Authoritative	Narrator
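
Voice availability in the free service can change, so the IDs in the table are worth confirming before a long run. A sketch, assuming `edge_tts.list_voices()` returns dictionaries with `ShortName` and `Locale` keys as in current edge-tts releases; `filter_voices` and `available_voices` are hypothetical helpers:

```python
def filter_voices(voices: list[dict], locale: str = "en-US") -> list[str]:
    """Return the sorted voice IDs for one locale from a voice listing."""
    return sorted(v["ShortName"] for v in voices if v.get("Locale") == locale)

async def available_voices(locale: str = "en-US") -> list[str]:
    """Fetch the live voice list (network call) and filter it by locale."""
    import edge_tts  # imported lazily so filter_voices works without the package
    return filter_voices(await edge_tts.list_voices(), locale)
```

Running `asyncio.run(available_voices())` performs the network request; any table entry missing from the result should be swapped for a listed voice.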

ElevenLabs Voices (Premium)

Voice	Style	Best For
Adam	Natural, warm	Host
Sarah	Clear, professional	Expert
Antoni	Engaging	Narrator
Domi	Conversational	Guest

Best Practices

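Script Quality: Write for the ear and fill in every placeholder; the TTS engine reads the script text as written
Pacing: Include natural pauses and transitions; the generation code adds a 500 ms pause between speaker turns
Pronunciation: Spell out technical terms phonetically if the voice mispronounces them
Length: Aim for 15-20 minutes, the duration the script template targets
Testing: Preview a few short segments before generating the full episode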
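
The pronunciation advice above can be applied mechanically just before text is sent to the TTS engine. A minimal sketch; `apply_pronunciations` is a hypothetical helper, and the respelling entries are illustrative only:

```python
import re

def apply_pronunciations(text: str, respellings: dict[str, str]) -> str:
    """Replace whole-word terms with phonetic respellings before TTS."""
    for term, spoken in respellings.items():
        # Lookarounds keep "SQL" from matching inside "MySQL".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        # A function replacement avoids backslash handling in the respelling.
        text = re.sub(pattern, lambda _m, s=spoken: s, text)
    return text

RESPELLINGS = {"SQL": "sequel", "nginx": "engine x"}  # illustrative entries
print(apply_pronunciations("Put nginx in front of the SQL server.", RESPELLINGS))
# → Put engine x in front of the sequel server.
```

Applying this to each segment's `text` keeps the written script readable while the spoken version uses the respellings.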

Integration

Command Usage

/notebooklm-generate-podcast [topic_or_path]

Workflow Integration

Part of the /notebooklm-generate-all workflow that creates all artifact types from ingested sources.