ai-voice-memo - SKILL.md Agent Skill

name: ai-voice-memo
description: Transcribe voice memos, summarize key points, and extract action items.
metadata: { "openclaw": { "emoji": "🎙️", "requires": { "bins": ["curl"], "env": ["OPENAI_API_KEY"] }, "primaryEnv": "OPENAI_API_KEY" } }

AI Voice Memo

Receive a voice note or audio file, transcribe it, summarize key points, and extract action items.

Workflow

1. Receive audio: a voice message (Telegram/Discord) or an uploaded file
2. Transcribe: send the audio to the OpenAI Whisper API via the openai-whisper-api skill
3. Analyze: summarize and extract action items using the LLM
4. Respond: send the formatted output, optionally with a TTS audio summary

Supported Formats

mp3, m4a, wav, ogg, webm (anything Whisper accepts)
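A quick extension check before calling the transcription step can save a wasted API request. The sketch below only covers the five formats listed above; the helper name `is_supported_audio` is hypothetical and not part of this skill.

```python
from pathlib import Path

# Extensions listed in this skill's "Supported Formats" section.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("/tmp/memo.M4A"))   # True
print(is_supported_audio("/tmp/notes.txt"))  # False
```

The check is case-insensitive and looks only at the extension, not at the file contents.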

Usage

Step 1 — Transcribe

Use the openai-whisper-api skill to get a transcript:

{baseDir}/../../../opt/homebrew/lib/node_modules/openclaw/skills/openai-whisper-api/scripts/transcribe.sh /path/to/audio.m4a --out /tmp/transcript.txt

Or use the bundled helper that does transcribe + analyze in one shot:

{baseDir}/scripts/process-memo.sh /path/to/audio.m4a

This outputs a JSON file with summary, action_items, and transcript.
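To consume that JSON from other tooling, something like the following could work. It is a minimal sketch that assumes `summary` and `transcript` are strings and `action_items` is a list of strings; the text above only names the three keys, so the value types, the `MemoResult` class, and `parse_memo_result` are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import List

# Keys named in this document for the process-memo.sh output.
REQUIRED_KEYS = ("summary", "action_items", "transcript")


@dataclass
class MemoResult:
    summary: str
    action_items: List[str]
    transcript: str


def parse_memo_result(raw_json: str) -> MemoResult:
    """Parse the memo JSON and fail loudly if an expected key is missing."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"memo JSON is missing keys: {', '.join(missing)}")
    items = data["action_items"]
    # Assumption: a single item may arrive as a bare string; normalize to a list.
    if isinstance(items, str):
        items = [items]
    return MemoResult(
        summary=data["summary"],
        action_items=list(items),
        transcript=data["transcript"],
    )


sample = '{"summary": "Discussed launch.", "action_items": ["Email Sam"], "transcript": "..."}'
print(parse_memo_result(sample).action_items)  # ['Email Sam']
```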

Step 2 — Analyze the Transcript

Feed the transcript to the LLM with this prompt structure:

Analyze this voice memo transcript. Provide:

## Summary
A concise summary of the key points (2-5 bullet points).

## Action Items
Extract any action items, tasks, or todos mentioned. Format as a checklist:
- [ ] Action item 1
- [ ] Action item 2

If no action items are found, say "No action items identified."

## Full Transcript
<paste transcript>

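If the prompt is assembled in code rather than by hand, the structure above can be kept in a single template. This is a sketch only: `build_analysis_prompt` is a hypothetical helper, and how the resulting string reaches the LLM depends on your agent setup.

```python
PROMPT_TEMPLATE = """Analyze this voice memo transcript. Provide:

## Summary
A concise summary of the key points (2-5 bullet points).

## Action Items
Extract any action items, tasks, or todos mentioned. Format as a checklist:
- [ ] Action item 1
- [ ] Action item 2

If no action items are found, say "No action items identified."

## Full Transcript
{transcript}
"""


def build_analysis_prompt(transcript: str) -> str:
    """Insert the transcript into the prompt structure from this skill."""
    cleaned = transcript.strip()
    if not cleaned:
        # An empty transcript usually means transcription failed upstream.
        raise ValueError("transcript is empty")
    return PROMPT_TEMPLATE.format(transcript=cleaned)


print(build_analysis_prompt("Remind me to email Sam about the budget."))
```

Rejecting an empty transcript early avoids asking the LLM to summarize nothing.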
Step 3 — Optional TTS Response

For an audio summary, use ElevenLabs TTS (voice: Charlie, ID: IKne3meq5aSn9XLyUdCD) to read back the summary and action items.
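Markdown bullets and checkboxes tend to sound odd when read aloud, so it can help to flatten the summary into plain sentences before handing it to TTS. The sketch below is one possible approach; `build_speech_script`, its opening phrase, and the "Item N" wording are illustrative assumptions, and the actual TTS call is left out.

```python
from typing import List


def _as_sentence(text: str) -> str:
    """Strip list markers and checkboxes, and make sure the text ends a sentence."""
    cleaned = text.strip().lstrip("•-").strip()
    if cleaned.startswith("[ ]"):
        cleaned = cleaned[3:].strip()
    if cleaned and cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned


def build_speech_script(summary_points: List[str], action_items: List[str]) -> str:
    """Turn summary bullets and action items into plain text suitable for TTS."""
    parts = ["Here is your voice memo summary."]
    parts.extend(_as_sentence(point) for point in summary_points)
    if action_items:
        parts.append("Action items.")
        for number, item in enumerate(action_items, start=1):
            parts.append(f"Item {number}: {_as_sentence(item)}")
    else:
        parts.append("No action items identified.")
    return " ".join(parts)


print(build_speech_script(["• Budget approved"], ["- [ ] Email Sam"]))
```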

Output Format

🎙️ Voice Memo Summary

## Summary
• Key point 1
• Key point 2
• Key point 3

## Action Items
- [ ] Task 1
- [ ] Task 2

## Full Transcript
<transcript text>

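For agents that build the reply programmatically, the layout above could be rendered roughly as follows. This is a sketch; `format_memo_reply` is a hypothetical helper, and it assumes the summary points and action items arrive as plain strings without list markers.

```python
from typing import List


def format_memo_reply(summary_points: List[str], action_items: List[str], transcript: str) -> str:
    """Render the reply in the output format described in this skill."""
    lines = ["🎙️ Voice Memo Summary", "", "## Summary"]
    lines += [f"• {point}" for point in summary_points]
    lines += ["", "## Action Items"]
    if action_items:
        lines += [f"- [ ] {item}" for item in action_items]
    else:
        # Same fallback sentence the analysis prompt asks the LLM to use.
        lines.append("No action items identified.")
    lines += ["", "## Full Transcript", transcript.strip()]
    return "\n".join(lines)


print(format_memo_reply(["Budget approved"], ["Email Sam"], "We approved the budget. I need to email Sam."))
```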
Example Agent Flow

When a user sends a voice message:

1. Download the audio file to a temp path
2. Run: openai-whisper-api/scripts/transcribe.sh <audio> --out /tmp/memo-transcript.txt
3. Read the transcript
4. Analyze with the LLM using the prompt above
5. Reply with the formatted summary
6. If the user requested audio, generate TTS of the summary
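The transcription part of this flow could be wrapped like this. It is a sketch under assumptions: the script path is the relative one written in this document and must be resolved against your install, the `--language` value format (for example `de`) is assumed, and both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Relative script path as written in this document; resolve it against your install.
TRANSCRIBE_SCRIPT = "openai-whisper-api/scripts/transcribe.sh"


def build_transcribe_command(audio_path: str, out_path: str, language: Optional[str] = None) -> List[str]:
    """Build the argv list for the transcription script."""
    cmd = [TRANSCRIBE_SCRIPT, audio_path, "--out", out_path]
    if language:
        # Optional flag for non-English memos.
        cmd += ["--language", language]
    return cmd


def transcribe(audio_path: str, out_path: str = "/tmp/memo-transcript.txt",
               language: Optional[str] = None) -> str:
    """Run the script, then read the transcript it wrote to out_path."""
    subprocess.run(build_transcribe_command(audio_path, out_path, language), check=True)
    return Path(out_path).read_text(encoding="utf-8")


print(build_transcribe_command("/tmp/memo.ogg", "/tmp/memo-transcript.txt", language="de"))
```

Passing the command as a list rather than a shell string avoids quoting problems when the audio path contains spaces.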

Tips

The summary matters most for long memos (>5 min), where rereading the full transcript is slow
Action items work best when the speaker is explicit ("I need to...", "remind me to...")
Pass the --language flag to transcribe.sh for non-English memos
Works great as an automatic handler for all voice messages in a chat