whispercpp-transcribe - SKILL.md Agent Skill

name: whispercpp-transcribe
description: Transcribe local audio/video files offline using whisper.cpp (the C++ port of OpenAI Whisper), generating plain text, timestamped, SRT, and JSON outputs. Use when the user wants fast native-speed transcription with GGML quantized models, or prefers whisper.cpp over Python-based alternatives like faster-whisper. Triggers on mentions of whisper.cpp, whisper-cli, GGML models, or requests for high-performance local transcription.

whisper.cpp Transcribe

Use this skill for local-only transcription with whisper.cpp (whisper-cli).

The key advantage over Python-based alternatives such as faster-whisper is raw speed: whisper.cpp runs optimized C++ inference with optional GPU acceleration, supports quantized GGML models, and keeps a small memory footprint.

Quick start

python3 scripts/transcribe_whispercpp.py "path/to/audio.mp4" \
  --model-path ~/models/ggml-small.bin \
  --output-dir ./output/transcribe-whispercpp

Workflow

Ensure whisper-cli is installed and a GGML model is downloaded.
Run the bundled script on one or more local media files.
Read .transcript.txt for plain text and .transcript.timed.txt for timestamps.
If quality is low, use a larger model (ggml-medium.bin or ggml-large-v3-q5_0.bin).
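
The first workflow step can be checked before running the script. The following is a minimal sketch, not part of the bundled script: the function name `preflight` is hypothetical, and it assumes `whisper-cli` is expected on `PATH` and the model is a regular file.

```python
import shutil
from pathlib import Path


def preflight(model_path: str, binary: str = "whisper-cli") -> list[str]:
    """Return a list of problems; an empty list means the prerequisites are met.

    Hypothetical helper for illustration, not part of transcribe_whispercpp.py.
    """
    problems = []
    # shutil.which searches PATH the same way the shell would.
    if shutil.which(binary) is None:
        problems.append(f"{binary} not found on PATH")
    # Expand "~" so paths like ~/models/ggml-small.bin resolve correctly.
    model = Path(model_path).expanduser()
    if not model.is_file():
        problems.append(f"model file not found: {model}")
    return problems


if __name__ == "__main__":
    for problem in preflight("~/models/ggml-small.bin"):
        print("missing:", problem)
```

If the list comes back empty, both prerequisites are in place and the commands below should be able to start.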

Commands

Single file:

python3 scripts/transcribe_whispercpp.py "./input/video.mp4" \
  --model-path ~/models/ggml-small.bin \
  --language pt \
  --output-dir ./output/transcribe-whispercpp

Multiple files:

python3 scripts/transcribe_whispercpp.py "./a.mp3" "./b.wav" \
  --model-path ~/models/ggml-small.bin \
  --output-dir ./output/transcribe-whispercpp

Force WAV conversion (useful for formats whisper-cli struggles with):

python3 scripts/transcribe_whispercpp.py "./input/video.mp4" \
  --model-path ~/models/ggml-small.bin \
  --force-wav \
  --output-dir ./output/transcribe-whispercpp

Outputs

For each input file <name>:

<name>.transcript.txt — plain text transcript
<name>.transcript.timed.txt — [start --> end] text format
<name>.transcript.json — structured JSON with segments
<name>.srt — SRT subtitle file (generated by whisper-cli)
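
To locate these files programmatically, the naming scheme above can be sketched as follows. It assumes `<name>` is the input filename without its extension and that all outputs land directly in the output directory; the helper `expected_outputs` is hypothetical and not provided by the script.

```python
from pathlib import Path

# Suffixes copied from the output list above.
OUTPUT_SUFFIXES = {
    "text": ".transcript.txt",
    "timed": ".transcript.timed.txt",
    "json": ".transcript.json",
    "srt": ".srt",
}


def expected_outputs(input_file: str, output_dir: str) -> dict[str, Path]:
    """Map each output kind to the path it should have for one input file.

    Assumption: <name> is the input's stem, e.g. "video.mp4" -> "video".
    """
    stem = Path(input_file).stem
    return {
        kind: Path(output_dir) / f"{stem}{suffix}"
        for kind, suffix in OUTPUT_SUFFIXES.items()
    }


if __name__ == "__main__":
    outputs = expected_outputs("./input/video.mp4", "./output/transcribe-whispercpp")
    for kind, path in outputs.items():
        print(f"{kind}: {path} (exists: {path.exists()})")
```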

Model download

Download GGML models from Hugging Face:

# Create the target directory first; curl -o does not create missing directories
mkdir -p ~/models

# Small model (~500MB, good balance of speed and quality)
curl -L -o ~/models/ggml-small.bin \
  'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin'

# Large v3 quantized (~1GB, best quality with reasonable size)
curl -L -o ~/models/ggml-large-v3-q5_0.bin \
  'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-q5_0.bin'

Or, from the root of a whisper.cpp checkout, use its bundled download script, which by default saves the model into the checkout's models/ directory:

sh ./models/download-ggml-model.sh small

Dependencies

Install whisper-cli (one of):

# macOS via Homebrew
brew install whisper-cpp

# pip (cross-platform, no GPU accel)
pip install whisper.cpp-cli

# Or build from source (the binary is written to build/bin/whisper-cli; add that directory to PATH)
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp && cmake -B build && cmake --build build -j --config Release

Required:

ffmpeg — for audio conversion to 16kHz WAV when needed.
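
For reference, the conversion to 16 kHz WAV could look like the sketch below. This is an assumption about what such a conversion involves, not the bundled script's actual code: the function names are hypothetical, and the mono, 16-bit PCM settings follow the ffmpeg command commonly recommended for whisper.cpp input. The native-format set mirrors the formats this document lists as supported by whisper-cli, and `force_wav` stands in for the `--force-wav` option.

```python
import subprocess
from pathlib import Path

# Formats this document lists as natively supported by whisper-cli.
NATIVE_EXTENSIONS = {".flac", ".mp3", ".ogg", ".wav"}


def needs_conversion(src: str, force_wav: bool = False) -> bool:
    """Decide whether a file should go through ffmpeg first (hypothetical helper)."""
    return force_wav or Path(src).suffix.lower() not in NATIVE_EXTENSIONS


def ffmpeg_wav_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command producing 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input media file (audio or video)
        "-vn",                # drop any video stream
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(ffmpeg_wav_command(src, dst), check=True)
    return Path(dst)
```

Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.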

Notes

This flow is local-only and does not use any API key.
whisper-cli natively supports flac, mp3, ogg, and wav. For other formats the script auto-converts via ffmpeg.
Use --threads N to control CPU thread count (default: 4).
The script generates SRT output via whisper-cli's -osrt flag and then parses stdout for the timed text output.
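
The stdout parsing mentioned in the last note can be illustrated with a short sketch. It assumes whisper-cli prints segments as `[HH:MM:SS.mmm --> HH:MM:SS.mmm]   text`, one per line, mixed with other log lines; `parse_segments` is a hypothetical name, and the bundled script may parse differently.

```python
import re

# Matches lines such as: [00:00:00.000 --> 00:00:02.500]   Hello there.
SEGMENT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)


def parse_segments(stdout: str) -> list[dict[str, str]]:
    """Extract timed segments from whisper-cli stdout, skipping other lines."""
    segments = []
    for line in stdout.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match:
            start, end, text = match.groups()
            segments.append({"start": start, "end": end, "text": text.strip()})
    return segments


if __name__ == "__main__":
    sample = (
        "[00:00:00.000 --> 00:00:02.500]   Hello there.\n"
        "[00:00:02.500 --> 00:00:04.000]   Second line.\n"
    )
    for seg in parse_segments(sample):
        print(f"[{seg['start']} --> {seg['end']}] {seg['text']}")
```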