name: whisper-setup description: Use when the user wants to set up whisper for PTT (push-to-talk) voice input. Guides through choosing API vs local mode and configuring whisper.cpp if local.
Whisper Setup Skill
This skill guides users through setting up Whisper for the PTT plugin.
Overview
The PTT plugin supports two transcription backends:
- OpenAI Whisper API - Cloud-based, easy setup, costs ~$0.006/min
- Local whisper.cpp - Free, private, works offline, requires setup
Setup Flow
Step 1: Ask User Preference
Ask the user which mode they prefer:
Which Whisper mode would you like to set up?
1. **OpenAI API** (Recommended for ease of use)
- Requires OpenAI API key
- Costs ~$0.006 per minute of audio
- Best transcription quality
- Requires internet connection
2. **Local whisper.cpp** (Recommended for privacy)
- Free, no API costs
- Works offline
- Requires ~150MB-3GB disk space (depending on model)
- Transcription speed depends on your hardware
Step 2A: OpenAI API Setup
If user chooses API:
Check if OPENAI_API_KEY environment variable is set:
echo $OPENAI_API_KEY | head -c 10If not set, ask user to provide their API key
Update config:
# Read current config and update cat ~/.claude/ptt-config.jsonSet
whisper.openaiApiKeyto the user's key or instruct them to setOPENAI_API_KEYenv var.Set
whisper.preferredModeto"api"
Step 2B: Local whisper.cpp Setup
If user chooses local:
Check System Resources
Check available RAM:
free -hCheck available disk space:
df -h ~Check CPU info:
lscpu | grep -E "(Model name|CPU\(s\)|Thread)"Check for NVIDIA GPU (for CUDA acceleration):
nvidia-smi 2>/dev/null || echo "No NVIDIA GPU detected"
Recommend Model Based on Resources
Present model options with recommendations based on system:
| Model | Size | RAM Required | Speed | Quality | Best For |
|---|---|---|---|---|---|
| tiny.en | 75MB | ~400MB | Fastest | Basic | Low-resource systems, quick tests |
| base.en | 142MB | ~500MB | Fast | Good | Most desktop systems (RECOMMENDED) |
| small.en | 466MB | ~1GB | Medium | Better | Systems with 8GB+ RAM |
| medium.en | 1.5GB | ~2.5GB | Slow | Great | Systems with 16GB+ RAM |
| large-v3 | 3GB | ~4GB | Slowest | Best | High-end systems, accuracy critical |
Recommendations:
- RAM < 4GB: Use
tiny.en - RAM 4-8GB: Use
base.en(default recommendation) - RAM 8-16GB: Use
small.enfor better quality - RAM > 16GB: Use
medium.enorlarge-v3if accuracy is critical
Install whisper.cpp
Install dependencies:
sudo apt-get update && sudo apt-get install -y build-essential cmakeClone and build:
cd ~ && git clone https://github.com/ggerganov/whisper.cpp.git cd whisper.cpp && make -j$(nproc)Download chosen model:
./models/download-ggml-model.sh <model_name>Replace
<model_name>with:tiny.en,base.en,small.en,medium.en, orlarge-v3Test the installation:
./build/bin/whisper-cli -m models/ggml-<model>.bin -f samples/jfk.wav
Update Config
Update ~/.claude/ptt-config.json:
{
"whisper": {
"localModelPath": "/home/<user>/whisper.cpp/models/ggml-<model>.bin",
"whisperExecutable": "/home/<user>/whisper.cpp/build/bin/whisper-cli",
"preferredMode": "local"
}
}
Step 3: Test Configuration
Verify the setup works:
- For API mode, the MCP server should be able to make API calls
- For local mode, test whisper-cli directly:
# Record a short test arecord -f S16_LE -r 16000 -c 1 -d 3 /tmp/test.wav # Transcribe ~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f /tmp/test.wav
Step 4: Enable Fallback (Optional)
Ask if user wants fallback enabled:
- If both API key and local model are configured, enable
enableFallback: true - This provides resilience - if one method fails, the other is tried
Troubleshooting
Common Issues
"whisper-cli not found"
- Ensure whisper.cpp was built successfully
- Check the executable path in config
"Model file not found"
- Verify the model was downloaded
- Check the model path in config
"API key invalid"
- Verify the API key is correct
- Check for extra whitespace
"Out of memory" during local transcription
- Use a smaller model
- Close other applications
Platform-Specific Notes
macOS
- whisper.cpp builds with CoreML acceleration on Apple Silicon
- Use
makewithout additional flags
Linux with NVIDIA GPU
- Build with CUDA support:
make GGML_CUDA=1
Windows
- Use WSL2 for best compatibility
- Or download pre-built binaries from whisper.cpp releases