
name: whisper
description: Transcribe audio or video files to text (speech-to-text).

Whisper - Local Audio Transcription

Transcribe audio/video files locally using whisper.cpp. No API calls, no data leaves the machine.

Setup: See SETUP.md

Usage

# Basic transcription (any audio/video format - ffmpeg converts automatically)
~/agent/skills/whisper/scripts/whisper_transcribe.sh recording.mp3
~/agent/skills/whisper/scripts/whisper_transcribe.sh meeting.m4a
~/agent/skills/whisper/scripts/whisper_transcribe.sh video.mp4

# With options
~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.wav --language es
~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.mp3 --translate
~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.mp3 --srt
~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.mp3 --json
~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.mp3 --model /usr/local/share/ggml-medium.en.bin
~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.mp3 --threads 8
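
The commands above handle one file at a time. For a folder of recordings, a small wrapper loop may help. This is a minimal sketch, not part of the skill: `transcribe_all` and the `WHISPER_CMD` override variable are hypothetical names, and the sketch assumes the script exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper; not part of the skill itself.
# WHISPER_CMD is an assumed override hook so the script path can be changed.
WHISPER_CMD="${WHISPER_CMD:-$HOME/agent/skills/whisper/scripts/whisper_transcribe.sh}"

# Transcribe each file given as an argument and write the transcript to a
# .txt file with the same base name, next to the input.
transcribe_all() {
  local input output status=0
  for input in "$@"; do
    if [ ! -f "$input" ]; then
      echo "skipping missing file: $input" >&2
      status=1
      continue
    fi
    output="${input%.*}.txt"
    # The script prints the transcript to stdout, so redirect it to a file.
    if ! "$WHISPER_CMD" "$input" > "$output"; then
      echo "transcription failed: $input" >&2
      rm -f "$output"
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the snippet, `transcribe_all recordings/*.mp3` would leave one `.txt` file beside each input and return non-zero if any file was missing or failed.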

Options

| Flag | Description |
| --- | --- |
| `--language <code>` | Language code (en, es, fr, de, etc.). Default: en |
| `--translate` | Translate non-English audio to English text |
| `--srt` | Output SRT subtitle format |
| `--json` | Output JSON with timestamps |
| `--model <path>` | Use a different model file |
| `--threads <n>` | CPU threads (default: 4) |
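
Rather than hard-coding a value for `--threads`, the count could be derived from the machine. The helper below is a sketch under assumptions: `detect_threads` is a hypothetical name, it relies on `getconf _NPROCESSORS_ONLN` being available (common on Linux and macOS), and it falls back to the documented default of 4 otherwise. Using every core is not guaranteed to be fastest, so treat the result as a starting point.

```shell
# Hypothetical helper: print a thread count suitable for --threads.
# Falls back to 4 (the documented default) if the CPU count cannot be read.
detect_threads() {
  local n
  n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" || n=""
  case "$n" in
    # Empty or non-numeric output: use the default.
    ''|*[!0-9]*) n=4 ;;
  esac
  echo "$n"
}

# Example use (commented out so the snippet has no side effects):
# ~/agent/skills/whisper/scripts/whisper_transcribe.sh audio.mp3 --threads "$(detect_threads)"
```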

Notes

Accepts any format ffmpeg can read: mp3, m4a, wav, ogg, flac, mp4, webm, etc.
Runs entirely locally: no network access, no API keys.
The small.en model processes audio roughly 15-30x faster than real time on ARM64.
For long recordings (1h+), expect a few minutes of processing.
Output goes to stdout; pipe or redirect it as needed.
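
The "few minutes" figure for long recordings follows from the speed-up range quoted above. As a rough worked example, assuming the 15-30x range holds on your hardware and ignoring model load and ffmpeg conversion time, the estimate is simply the audio duration divided by the speed-up factor. `estimate_seconds` is a hypothetical helper, not part of the skill.

```shell
# Hypothetical helper: estimate processing time in seconds.
# $1 = audio duration in seconds, $2 = speed-up factor relative to real time.
estimate_seconds() {
  local duration="$1" speedup="$2"
  # Integer division, rounded up so short clips never report 0 seconds.
  echo $(( (duration + speedup - 1) / speedup ))
}

estimate_seconds 3600 15   # prints 240 (4 minutes for a 1-hour recording)
estimate_seconds 3600 30   # prints 120 (2 minutes for a 1-hour recording)
```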