podcast-publishing-assistant - SKILL.md Agent Skill

name: podcast-publishing-assistant description: Transcribe long-form audio, YouTube videos, podcasts, interviews, or panels; summarize them; extract chapter markers; and draft publishing assets like titles, YouTube descriptions, show notes, and tweet/X copy. Use when a user shares a YouTube link or audio/video file and asks for transcription, diarization, summaries with timestamps, chapter markers, show notes, titles, descriptions, promo copy, or social posts. Especially use for Whisper-based workflows, source-audio-first podcast processing, and any request to turn a recorded conversation into publishable assets.

Use this skill for podcast or interview publishing workflows.

Prefer source audio over platform captions.
Do not rely on YouTube captions unless the user explicitly asks for the fastest possible rough pass.
For Whisper or API transcripts, be honest about uncertainty. Do not claim true diarization unless the tool actually produced it.
If the user has a formatting preference, preserve it exactly on later turns.
Default chapter output for swyx: YouTube timestamp format only, start times only, followed by a 2 to 10 word headline.

Fetch source audio first.
- For YouTube URLs, use yt-dlp to extract audio.
- Prefer MP3 output for compatibility.
Transcribe with a fallback ladder.
- First try the OpenAI Whisper API when OPENAI_API_KEY is available.
- The OpenAI audio transcription upload cap is about 25 MB. Expect long episodes to exceed it.
- If the file is too large, compress before retrying, for example mono 16 kHz MP3 around 32 kbps.
- If API transcription still fails, fall back to local Whisper.
- If a script exists but is not executable, invoke it with bash rather than failing on permissions.
Keep the user updated during long jobs.
- Brief progress notes are enough: downloading, compressing, transcribing, packaging.
- If a fallback happens, say what failed in one line and continue.
Produce the publishing pack.
- Summary tuned for someone deciding whether to watch or listen.
- Key takeaways or key arguments.
- 5 title ideas when asked.
- 5 description ideas when asked.
- Chapter timestamps.
- Optional: tweet thread, show notes, key quotes, newsletter blurb.
Package timestamps correctly.
- Prefer real timestamps if the transcription source provides segments.
- If the transcript is a single plain-text blob, estimate coarse chapter times conservatively and label them as approximate if needed.
- For swyx, default to this exact style:
  - 0:00 Intro and origin story
  - 3:30 How the brand grew

OpenAI Whisper API often returns plain text unless a segment-rich response format is requested elsewhere.
Plain text transcripts are good for summaries and rough chapters, but weak for precise diarization.
When speaker labels are needed without true diarization, reconstruct heuristically from turn-taking and identify this clearly as approximate.

When a user wants publishing help, default to a reusable launch pack.

YouTube titles
- Provide 3 to 4 options.
- Prefer thesis first, guest second.
- Keep them specific, not generic.
YouTube descriptions
- Provide 3 to 4 options.
- Each should include:
  - one short framing paragraph
  - 3 to 6 concrete bullets on what the audience will learn
  - chapter markers in YouTube timestamp format when available
X main post
- Provide 3 to 4 options.
- Structure:
  - hook or thesis
  - guest credibility line
  - 3 to 6 concrete bullets
  - link placeholder or actual link
X follow-up reply
- Provide 1 concise timestamps-only reply unless the user asks for more.
- Use YouTube timestamp format only, start times only, 2 to 10 word headlines.
Native clip ideas
- Provide 3 to 4 options.
- For each clip idea, include:
  - clip theme
  - why it works as promo
  - rough timestamp window if known
Quote card ideas
- Provide 3 to 4 options.
- Prefer clean, arguable lines with strong standalone meaning.

Default output order:

scripts/transcribe_youtube.sh: download audio from YouTube, shrink oversized audio for Whisper API, and fall back cleanly.

Prefer speed first, but do not fake certainty.
If a transcript path is broken, say what failed and continue with the next fallback.
For long episodes, give the user a progress update before the final packaging pass.
If the user says a format like “always give me YouTube timestamp format”, treat that as a sticky preference for the rest of the task.