transcribe

name: transcribe
description: "Transcribe and summarize video or audio content. Use when the user shares a video URL (X/Twitter, direct mp4/webm link), asks to 'transcribe this', 'summarize this video', 'what does this video say', or provides a tweet URL containing a video."

Transcribe video/audio content and produce a structured summary.

Input handling

Determine the video source from the user's input:
- X/Twitter URL: Extract the tweet ID, then run the command go run . read <id> --json from the birdy repo root and take media[].videoUrl from its output. If multiple video qualities exist, prefer the highest resolution.
- Direct video/audio URL: Use as-is.
- Local file path: Use as-is, skip download.
If the source is an X/Twitter URL and go run . fails (not in birdy repo), fall back to birdy read <id> --json or bird read <id> --json.
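
The source detection above could be sketched in Python roughly as follows. This is a minimal sketch, not part of the skill itself: the function name classify_source, the X_HOSTS set, and the /status/<id> URL pattern are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hosts treated as X/Twitter. This set is an assumption; extend it as needed.
X_HOSTS = {"x.com", "twitter.com", "mobile.twitter.com"}


def classify_source(user_input: str):
    """Return (kind, value) where kind is 'tweet', 'url', or 'file'.

    For 'tweet', value is the numeric tweet ID; otherwise it is the input unchanged.
    """
    parsed = urlparse(user_input)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # Assumed URL shape: https://x.com/<user>/status/<digits>
        match = re.search(r"/status(?:es)?/(\d+)", parsed.path)
        if host in X_HOSTS and match:
            return "tweet", match.group(1)
        return "url", user_input
    # Anything without an http(s) scheme is treated as a local path.
    return "file", user_input
```

A 'tweet' result carries the ID to pass to the read command, while 'url' and 'file' results are used as-is.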

Download

Create a temp working directory: mkdir -p /tmp/transcribe-work.
Download the video: curl -L -o /tmp/transcribe-work/video.mp4 "<url>".
If the file is already audio-only (mp3/wav/m4a), skip the extraction step and pass that file to the transcription script in place of audio.wav.
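
One way to decide whether extraction can be skipped is to look at the file extension. The helper below is a sketch under that assumption: is_audio_only and AUDIO_EXTENSIONS are illustrative names, and the check trusts the extension rather than inspecting the container.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions the skill lists as audio-only.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def is_audio_only(source: str) -> bool:
    """Return True if a URL or local path ends in a known audio extension."""
    # urlparse drops any query string, so "a.m4a?token=1" still matches.
    path = urlparse(source).path
    return PurePosixPath(path).suffix.lower() in AUDIO_EXTENSIONS
```

A file with a misleading extension would be misclassified, so probing the file with ffmpeg's companion tool ffprobe is a sturdier option when the source is untrusted.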

Extract audio

Extract audio with ffmpeg:

ffmpeg -y -i /tmp/transcribe-work/video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/transcribe-work/audio.wav

Delete the video file to save disk space: rm /tmp/transcribe-work/video.mp4.
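
If the extraction is driven from Python rather than the shell, the same ffmpeg invocation can be built as an argument list. This is a sketch: build_ffmpeg_command and extract_audio are hypothetical helper names, and ffmpeg is assumed to be on PATH.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list:
    """Build the ffmpeg argument list for 16 kHz mono 16-bit PCM output."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst without prompting
        "-i", src,               # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        dst,
    ]


def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Passing a list instead of a shell string avoids quoting problems when paths contain spaces.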

Transcribe

Check whether mlx_whisper is importable in Python 3. If not, install it: pip3 install mlx-whisper.
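
The importability check can be done without actually importing the package. The sketch below uses importlib.util.find_spec; the helper names are illustrative, and it installs with sys.executable -m pip so the package lands in the interpreter that will run the script, which is a small departure from the pip3 command.

```python
import importlib.util
import subprocess
import sys


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def ensure_mlx_whisper() -> None:
    """Install mlx-whisper into the current interpreter if it is missing."""
    if not is_importable("mlx_whisper"):
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "mlx-whisper"],
            check=True,
        )
```

Calling ensure_mlx_whisper() before the transcription script makes the step safe to repeat.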

Run transcription with the following Python script:

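import mlx_whisper

# Transcribe the extracted audio; the model is fetched from the Hugging Face repo on first use.
result = mlx_whisper.transcribe(
    '/tmp/transcribe-work/audio.wav',
    path_or_hf_repo='mlx-community/whisper-small-mlx',
    language='en'
)

# Write one line per segment, prefixed with its [MM:SS] start time.
with open('/tmp/transcribe-work/transcript.txt', 'w', encoding='utf-8') as f:
    for seg in result['segments']:
        start = int(seg['start'])  # segment start in whole seconds
        m, s = divmod(start, 60)   # minutes keep counting past 59 on long recordings
        f.write(f'[{m:02d}:{s:02d}] {seg["text"].strip()}\n')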

If disk space is tight and a larger model fails to download or load, fall back to mlx-community/whisper-small-mlx, the model used in the script above.
For non-English content, omit the language parameter or set it appropriately.
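
The fallback and the optional language parameter can be combined in one small wrapper. This is a sketch only: transcribe_with_fallback is a hypothetical helper, the transcribe callable is passed in so that mlx_whisper.transcribe can be supplied at the call site, and the list of model repos is left to the caller.

```python
def transcribe_with_fallback(transcribe, audio_path, model_repos, language=None):
    """Try each model repo in order and return the first successful result.

    transcribe: a callable such as mlx_whisper.transcribe.
    model_repos: repos to try, preferred model first.
    language: language code, or None to let the model detect it.
    """
    # Omit the language argument entirely when it is not given.
    kwargs = {} if language is None else {"language": language}
    last_error = None
    for repo in model_repos:
        try:
            return transcribe(audio_path, path_or_hf_repo=repo, **kwargs)
        except Exception as exc:  # e.g. download or load failure
            last_error = exc
    raise RuntimeError("no model could transcribe the audio") from last_error
```

A call might look like transcribe_with_fallback(mlx_whisper.transcribe, '/tmp/transcribe-work/audio.wav', ['<larger-model-repo>', 'mlx-community/whisper-small-mlx']), where the first entry is a placeholder for whichever larger model is preferred.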

Summarize

Read the full transcript from /tmp/transcribe-work/transcript.txt.
Produce a structured summary with:
- Title and metadata (speakers, host, source)
- Key takeaways: the 4-6 most important points, each with a bold heading and a 2-3 sentence explanation
- Notable quotes or claims if any stand out
Keep the summary concise but substantive. Match the depth to the content length (short video = brief summary, long podcast = detailed breakdown).
Present the summary to the user. Mention that the full timestamped transcript is at /tmp/transcribe-work/transcript.txt.

Cleanup

Delete /tmp/transcribe-work/audio.wav after transcription to free space. Keep transcript.txt.
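
The cleanup can also be scripted. The sketch below removes only the intermediate media files and leaves transcript.txt alone; the function name cleanup and its return value are illustrative choices.

```python
from pathlib import Path


def cleanup(work_dir: str = "/tmp/transcribe-work") -> list:
    """Delete intermediate media files and return the names that were removed."""
    removed = []
    for name in ("video.mp4", "audio.wav"):
        target = Path(work_dir) / name
        if target.exists():
            target.unlink()
            removed.append(name)
    return removed
```

Because it checks for each file first, the function is safe to call even if the video was already deleted after extraction.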