name: ai-voice-memo
description: Transcribe voice memos, summarize key points, and extract action items.
metadata:
  {
    "openclaw":
      {
        "emoji": "🎙️",
        "requires": { "bins": ["curl"], "env": ["OPENAI_API_KEY"] },
        "primaryEnv": "OPENAI_API_KEY",
      },
  }
AI Voice Memo
Receive a voice note or audio file, transcribe it, summarize key points, and extract action items.
Workflow
- Receive audio: voice message (Telegram/Discord) or uploaded file
- Transcribe: send to the OpenAI Whisper API via the openai-whisper-api skill
- Analyze: summarize and extract action items using the LLM
- Respond: formatted output; optionally a TTS audio summary
Supported Formats
mp3, m4a, wav, ogg, webm (anything Whisper accepts)
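Before sending a file for transcription, it can help to reject unsupported extensions early. The following is a minimal sketch, assuming only the five formats named above are accepted; `is_supported_audio` is a hypothetical helper and is not part of the skill. Whisper may accept other formats that this check would refuse.

```shell
#!/usr/bin/env bash
# is_supported_audio (hypothetical helper): return 0 if the file name ends in
# one of the extensions listed in this document, 1 otherwise.
is_supported_audio() {
  # A name without a dot has no extension to inspect.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  local ext="${1##*.}"
  # Lowercase the extension so "MEMO.M4A" is treated like "memo.m4a".
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|m4a|wav|ogg|webm) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might write `is_supported_audio "$file" || echo "unsupported format" >&2` before starting the transcription step.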
Usage
Step 1: Transcribe
Use the openai-whisper-api skill to get a transcript:
{baseDir}/../../../opt/homebrew/lib/node_modules/openclaw/skills/openai-whisper-api/scripts/transcribe.sh /path/to/audio.m4a --out /tmp/transcript.txt
Or use the bundled helper that does transcribe + analyze in one shot:
{baseDir}/scripts/process-memo.sh /path/to/audio.m4a
This outputs a JSON file with summary, action_items, and transcript.
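The internals of transcribe.sh are not shown in this document. As a rough illustration of what the transcription step amounts to, the sketch below calls the public OpenAI transcription endpoint directly with curl. The function name `transcribe_audio`, the `whisper-1` model choice, the plain-text response format, and the exit codes are assumptions made for this example, not a description of the actual script.

```shell
#!/usr/bin/env bash
# transcribe_audio (hypothetical helper): upload an audio file to the OpenAI
# transcription endpoint and write the plain-text transcript to a file.
# Usage: transcribe_audio /path/to/audio.m4a /tmp/transcript.txt
transcribe_audio() {
  local audio="$1" out="$2"
  # The skill metadata lists OPENAI_API_KEY as required.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 2
  fi
  if [ ! -f "$audio" ]; then
    echo "audio file not found: $audio" >&2
    return 3
  fi
  # Assumed parameters: model whisper-1, response_format text.
  curl -sS --fail https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -F "file=@${audio}" \
    -F "model=whisper-1" \
    -F "response_format=text" \
    -o "$out"
}
```

Checking the key and the file before calling curl keeps the failure messages local and avoids a network round trip for obvious mistakes.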
Step 2: Analyze the Transcript
Feed the transcript to the LLM with this prompt structure:
Analyze this voice memo transcript. Provide:
## Summary
A concise summary of the key points (2-5 bullet points).
## Action Items
Extract any action items, tasks, or todos mentioned. Format as a checklist:
- [ ] Action item 1
- [ ] Action item 2
If no action items are found, say "No action items identified."
## Full Transcript
<paste transcript>
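If the prompt is assembled by a script rather than by hand, a small function can print the fixed instructions followed by the transcript. This is a sketch only; `build_memo_prompt` is a hypothetical name, and how the resulting text reaches the LLM is left to the agent.

```shell
#!/usr/bin/env bash
# build_memo_prompt (hypothetical helper): print the analysis prompt from this
# document with the contents of a transcript file appended at the end.
# Usage: build_memo_prompt /tmp/transcript.txt
build_memo_prompt() {
  local transcript_file="$1"
  # Quoted heredoc delimiter: the text is emitted literally, with no expansion.
  cat <<'EOF'
Analyze this voice memo transcript. Provide:

## Summary
A concise summary of the key points (2-5 bullet points).

## Action Items
Extract any action items, tasks, or todos mentioned. Format as a checklist:
- [ ] Action item 1
- [ ] Action item 2

If no action items are found, say "No action items identified."

## Full Transcript
EOF
  cat "$transcript_file"
}
```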
Step 3: Optional TTS Response
For an audio summary, use ElevenLabs TTS (voice: Charlie, ID: IKne3meq5aSn9XLyUdCD) to read back the summary and action items.
Output Format
🎙️ Voice Memo Summary
## Summary
• Key point 1
• Key point 2
• Key point 3
## Action Items
- [ ] Task 1
- [ ] Task 2
## Full Transcript
<transcript text>
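Because action items use a fixed checklist prefix, they can be pulled out of a saved reply with standard tools, for example to forward them to a task list. The sketch below assumes the reply was written to a file in exactly the format shown above; `extract_action_items` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# extract_action_items (hypothetical helper): print each unchecked checklist
# entry from a saved summary, one per line, without the "- [ ] " prefix.
# Usage: extract_action_items /tmp/memo-summary.md
extract_action_items() {
  # -n suppresses default output; the p flag prints only lines where the
  # prefix was matched and removed.
  sed -n 's/^- \[ \] //p' "$1"
}
```

When the reply contains "No action items identified." instead of a checklist, the function prints nothing.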
Example Agent Flow
When a user sends a voice message:
- Download the audio file to a temp path
- Run: openai-whisper-api/scripts/transcribe.sh <audio> --out /tmp/memo-transcript.txt
- Read the transcript
- Analyze with the LLM using the prompt above
- Reply with the formatted summary
- If user requested audio: generate TTS of the summary
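The transcribe-and-read part of this flow could be wrapped as shown below. This is a sketch under stated assumptions: `handle_voice_memo` and the `TRANSCRIBE_CMD` variable are hypothetical, and the default path simply mirrors the relative path used in the flow above, so it likely needs adjusting to wherever transcribe.sh is installed. The LLM analysis, the reply, and the optional TTS remain with the agent.

```shell
#!/usr/bin/env bash
# TRANSCRIBE_CMD (assumption): command that accepts "<audio> --out <file>".
TRANSCRIBE_CMD="${TRANSCRIBE_CMD:-openai-whisper-api/scripts/transcribe.sh}"

# handle_voice_memo (hypothetical helper): transcribe an audio file and print
# the transcript on stdout so the agent can analyze it.
# Usage: handle_voice_memo <audio> [transcript-path]
handle_voice_memo() {
  local audio="$1"
  local transcript="${2:-/tmp/memo-transcript.txt}"
  # Transcribe to a file; stop if the transcription command fails.
  "$TRANSCRIBE_CMD" "$audio" --out "$transcript" || return 1
  # An empty transcript is not worth sending to the LLM.
  if [ ! -s "$transcript" ]; then
    echo "empty transcript for $audio" >&2
    return 1
  fi
  # Read the transcript; the caller passes this text on for analysis.
  cat "$transcript"
}
```

Keeping the transcription command in a variable makes it easy to substitute a stub while testing the rest of the flow.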
Tips
- For long memos (>5 min), the summary becomes more valuable
- Action items work best when the speaker is explicit ("I need to...", "remind me to...")
- The --language flag on transcribe helps with non-English memos
- Works great as an automatic handler for all voice messages in a chat