name: add-voice-transcription
description: Add voice message transcription to MatClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
Add Voice Transcription
This skill adds automatic voice message transcription to MatClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].
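The text the agent ends up seeing can be sketched as a small rendering function. This is an illustrative sketch only, not the contents of src/transcription.ts: the type and function names are hypothetical, and the meaning attached to each fallback variant is an assumption based on the Troubleshooting section. Only the three output strings come from this document.

```typescript
// Outcome of handling one voice note. The variant names are hypothetical;
// only the three rendered strings come from this document.
type VoiceOutcome =
  | { kind: "transcribed"; transcript: string }
  | { kind: "unavailable" } // assumed: transcription could not be attempted (e.g. no API key)
  | { kind: "failed" }; // assumed: the transcription request itself errored

// Turn an outcome into the message content delivered to the agent.
function renderVoiceOutcome(outcome: VoiceOutcome): string {
  switch (outcome.kind) {
    case "transcribed":
      return `[Voice: ${outcome.transcript.trim()}]`;
    case "unavailable":
      return "[Voice Message - transcription unavailable]";
    case "failed":
      return "[Voice Message - transcription failed]";
  }
}
```

Keeping the fallbacks as ordinary message text means the agent still sees that a voice note arrived even when no transcript is available.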
Phase 1: Pre-flight
Check if already applied
Read .matclaw/state.yaml. If voice-transcription is in applied_skills, skip to Phase 3 (Configure). The code changes are already in place.
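If you want to script this check rather than read the file by eye, something like the following may help. The layout of .matclaw/state.yaml is not specified here, so this sketch assumes applied_skills is a top-level YAML block list whose items are either plain strings or mappings with a name field. The function name isSkillApplied is hypothetical, and the line scan is a deliberate simplification, not a YAML parser.

```typescript
// Assumption: applied_skills is a top-level block list whose items are either
// plain strings ("- voice-transcription") or mappings ("- name: voice-transcription").
function isSkillApplied(stateYaml: string, skill: string): boolean {
  let inList = false;
  for (const rawLine of stateYaml.split(/\r?\n/)) {
    // Drop trailing comments, then skip blank and comment-only lines.
    const line = rawLine.replace(/\s+#.*$/, "").trimEnd();
    if (line.trim() === "" || line.trim().startsWith("#")) continue;

    // A non-indented line that is not a list item starts a new top-level key.
    const isTopLevelKey = !/^\s/.test(line) && !line.startsWith("-");
    if (isTopLevelKey) {
      inList = /^applied_skills\s*:/.test(line);
      continue;
    }
    if (!inList) continue;

    // Accept "- skill", "- name: skill", or a nested "name: skill" line.
    const item = line.trim().match(/^(?:-\s*)?(?:name\s*:\s*)?["']?([^"'\s]+)["']?$/);
    if (item && item[1] === skill) return true;
  }
  return false;
}
```

If the real file uses a different shape (for example an inline list), prefer a proper YAML parser over extending this scan.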
Ask the user
Use AskUserQuestion to collect information:
AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?
If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.
Phase 2: Apply Code Changes
Run the skills engine to apply this skill's code package.
Initialize skills system (if needed)
If .matclaw/ directory doesn't exist yet:
npx tsx scripts/apply-skill.ts --init
Apply the skill
npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription
This deterministically:
- Adds src/transcription.ts (voice transcription module using OpenAI Whisper)
- Three-way merges voice handling into src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
- Three-way merges transcription tests into src/channels/whatsapp.test.ts (mock + 3 test cases)
- Installs the openai npm dependency
- Updates .env.example with OPENAI_API_KEY
- Records the application in .matclaw/state.yaml
If the apply reports merge conflicts, read the intent files:
- modify/src/channels/whatsapp.ts.intent.md: what changed and invariants for whatsapp.ts
- modify/src/channels/whatsapp.test.ts.intent.md: what changed for whatsapp.test.ts
Validate code changes
npm test
npm run build
All tests must pass (including the 3 new voice transcription tests) and the build must be clean before proceeding.
Phase 3: Configure
Get OpenAI API key (if needed)
If the user doesn't have an API key:
I need you to create an OpenAI API key:
- Go to https://platform.openai.com/api-keys
- Click "Create new secret key"
- Give it a name (e.g., "MatClaw Transcription")
- Copy the key (starts with sk-)
Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)
Wait for the user to provide the key.
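If the user asks what their own usage would cost, the figure quoted in the message above scales linearly with audio length. A minimal sketch of that arithmetic follows. The constant and function names are hypothetical, the rate is the one stated in this document and may change, and any rounding the provider applies to billed duration is not modelled.

```typescript
// Rate quoted in this document; treat it as an assumption that may go stale.
const WHISPER_USD_PER_MINUTE = 0.006;

// Estimated transcription cost in USD for a clip of the given length.
function estimateTranscriptionCostUsd(durationSeconds: number): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError("durationSeconds must be a non-negative finite number");
  }
  return (durationSeconds / 60) * WHISPER_USD_PER_MINUTE;
}
```

For a 30-second voice note this gives half of the per-minute rate, which matches the $0.003 figure above.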
Add to environment
Add to .env:
OPENAI_API_KEY=<their-key>
Sync to container environment:
mkdir -p data/env && cp .env data/env/env
The container reads its environment from data/env/env, not from .env directly.
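Because the key lives in two files, a stale copy in data/env/env is easy to miss. The sketch below compares the value of one variable across the two files' contents. Both function names are hypothetical, and the parser handles only simple KEY=VALUE lines (optional export prefix, optional surrounding quotes), which is an assumption about how these files are written.

```typescript
// Return the last value assigned to `key` in dotenv-style text, or undefined.
function readEnvValue(envText: string, key: string): string | undefined {
  let value: string | undefined;
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const name = line.slice(0, eq).replace(/^export\s+/, "").trim();
    if (name !== key) continue;
    // Strip one pair of matching surrounding quotes, if present.
    value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
  }
  return value;
}

// True when the key is set (non-empty) in .env and identical in the container copy.
function isKeySynced(dotEnvText: string, containerEnvText: string, key = "OPENAI_API_KEY"): boolean {
  const source = readEnvValue(dotEnvText, key);
  return source !== undefined && source !== "" && source === readEnvValue(containerEnvText, key);
}
```

In practice you would read both files (for example with fs.readFileSync) and pass their contents in. Re-running the cp command above is the fix whenever the check returns false.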
Build and restart
npm run build
launchctl kickstart -k gui/$(id -u)/com.matclaw # macOS
# Linux: systemctl --user restart matclaw
Phase 4: Verify
Test with a voice note
Tell the user:
Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.
Check logs if needed
tail -f logs/matclaw.log | grep -i voice
Look for:
- Transcribed voice message: successful transcription with character count
- OPENAI_API_KEY not set: key missing from .env
- OpenAI transcription failed: API error (check key validity, billing)
- Failed to download audio message: media download issue
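When scanning a long log, the four messages above can be matched mechanically. The following is a small sketch with hypothetical names. It assumes each message appears verbatim as a substring of the log line, and makes no assumption about the rest of the log format.

```typescript
// Log fragments and their meanings, as listed in this document.
const VOICE_LOG_HINTS: ReadonlyArray<readonly [string, string]> = [
  ["Transcribed voice message", "successful transcription"],
  ["OPENAI_API_KEY not set", "key missing from .env"],
  ["OpenAI transcription failed", "API error (check key validity, billing)"],
  ["Failed to download audio message", "media download issue"],
];

// Return the hint for the first known fragment found in the line, if any.
function diagnoseVoiceLogLine(line: string): string | undefined {
  const hit = VOICE_LOG_HINTS.find(([fragment]) => line.includes(fragment));
  return hit?.[1];
}
```

Lines that match none of the fragments return undefined, so unrelated log output is simply ignored.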
Troubleshooting
Voice notes show "[Voice Message - transcription unavailable]"
- Check OPENAI_API_KEY is set in .env AND synced to data/env/env
- Verify key works:
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
- Check OpenAI billing: Whisper requires a funded account
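The key check above is easier to interpret from the HTTP status than from the first 200 bytes of the body (curl can print the status with -o /dev/null -w '%{http_code}'). The sketch below maps a status code to a next step. The function name is hypothetical and the mapping reflects common HTTP semantics for this kind of API, so treat it as a starting point rather than an exhaustive list.

```typescript
// Map the HTTP status of the /v1/models check to a suggested next step.
function describeKeyCheckStatus(status: number): string {
  if (status >= 200 && status < 300) return "key accepted";
  if (status === 401) return "invalid API key: regenerate at https://platform.openai.com/api-keys";
  if (status === 429) return "rate limited or out of quota: check billing, then wait and retry";
  if (status >= 500) return "OpenAI-side error: likely transient, retry later";
  return `unexpected status ${status}: check logs for the specific error`;
}
```

A key that is accepted here but still fails during transcription usually points back to the environment sync step or to billing.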
Voice notes show "[Voice Message - transcription failed]"
Check logs for the specific error. Common causes:
- Network timeout — transient, will work on next message
- Invalid API key — regenerate at https://platform.openai.com/api-keys
- Rate limiting — wait and retry
Agent doesn't respond to voice notes
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.