name: audio-summary
description: Generate conversational audio summaries using text-to-speech. Supports multiple providers (Gemini TTS, OpenAI TTS, ElevenLabs) with preference order and automatic fallback.
metadata:
  openclaw:
    emoji:
    requires:
      envVars:
        - GEMINI_API_KEY | OPENAI_API_KEY | ELEVENLABS_API_KEY
Audio Summary Skill
Generates conversational audio summaries using multi-provider TTS.
When to Use
- User says "give me an audio summary"
- User says "audio summary please"
- User says "tell me in voice"
- User explicitly requests audio/voice response
How It Works
- Generate summary text: Create a conversational summary (150 words max, 30-60 seconds)
- Convert to audio: Use a TTS provider (Gemini → OpenAI → ElevenLabs fallback)
- Deliver audio: Upload via your chat platform or serve locally
Provider Preference Order
| Priority | Provider | Model | Voice | Output | Quality |
|---|---|---|---|---|---|
| 1 | Gemini TTS | gemini-2.5-flash-preview-tts | Aoede | WAV (24kHz) | Best, free tier |
| 2 | OpenAI TTS | gpt-4o-mini-tts | nova | MP3 | High, paid |
| 3 | ElevenLabs | eleven_multilingual_v2 | Rachel | MP3 | Excellent, paid |
Note: Gemini outputs .wav (PCM wrapped with a WAV header), not .mp3. The script returns the actual file path, so callers should use the echoed path rather than assume .mp3.
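The preference order and the extension note above can be sketched in a few lines of shell. This is only an illustration, not the logic inside generate_audio: the function names pick_provider and extension_for are hypothetical, and the provider labels gemini and elevenlabs are assumed by analogy with the documented --provider openai value. It also selects only by which API key is set, whereas the real script falls back when a provider call fails.

```shell
# Hypothetical helpers; the real generate_audio script may behave differently.

# Print the first provider, in preference order, whose API key is set.
pick_provider() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo gemini
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  elif [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    echo elevenlabs
  else
    echo "No TTS provider available" >&2
    return 1
  fi
}

# Print the file extension each provider produces, per the table above.
extension_for() {
  case "$1" in
    gemini) echo wav ;;
    openai|elevenlabs) echo mp3 ;;
    *) return 1 ;;   # unknown provider label
  esac
}
```

Even with a helper like this, the path echoed by the script remains the source of truth for the actual file name.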
Usage
# Generate audio from text (auto-deletes after delivery)
~/.openclaw/skills/audio-summary/bin/generate_audio "Your summary text here" --cleanup
# With custom output path
~/.openclaw/skills/audio-summary/bin/generate_audio "Text" --output /tmp/custom.mp3
# Use specific provider
~/.openclaw/skills/audio-summary/bin/generate_audio "Text" --provider openai --cleanup
Cleanup After Delivery
IMPORTANT: Audio files are transient. Delete them after delivery.
Pattern:
# Generate, deliver, delete
AUDIO_FILE=$(~/.openclaw/skills/audio-summary/bin/generate_audio "text" --provider openai)
# ... upload or serve the file ...
rm -f "$AUDIO_FILE"
Why: Audio files can build up quickly. A 30-second clip is ~400KB. Clean up after every use.
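One way to make sure the file is removed even when delivery fails is to wrap both steps in a small function. The sketch below is an assumption-laden illustration: deliver_audio is a hypothetical stand-in for whatever upload or serve step your platform uses, and deliver_then_delete is not part of the skill.

```shell
# Hypothetical stand-in for the real upload step; replace with your own.
deliver_audio() {
  echo "would upload $1"
}

# Deliver the file, always delete it afterwards, and return the
# delivery step's exit status so callers can still detect failures.
# (POSIX sh has no local variables, so these names are global.)
deliver_then_delete() {
  audio_file=$1
  status=0
  deliver_audio "$audio_file" || status=$?
  rm -f "$audio_file"
  return "$status"
}
```

With the pattern above, this would be called as deliver_then_delete "$AUDIO_FILE" in place of the separate upload and rm steps.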
Incoming audio cleanup: Voice messages in ~/.openclaw/media/inbound/ should also be cleaned periodically:
# Clean up audio older than 1 day
find ~/.openclaw/media/inbound -name "*.ogg" -mtime +1 -delete
Script Output
Returns the path to the generated audio file (.wav for Gemini, .mp3 for OpenAI/ElevenLabs):
/tmp/audio-summary-1738123456.wav
Delivery
After generating audio, deliver it to your platform. Example using OpenClaw's message tool:
# Upload to chat platform
message --action send --channel <YOUR_CHANNEL> --target <TARGET_ID> \
  --media /tmp/audio-summary-1738123456.wav --caption "Audio summary"
Summary Style Guidelines
- Conversational: Not robotic, like talking to a friend
- Brief: 150 words max (30-60 seconds)
- Friendly: Warm tone
- Actionable: Highlight what was done and what's next
Example:
"Hey! Just wrapped up the audio transcription skill. It uses Gemini to convert voice messages to text. Tested it on a few sample recordings and it works great. The multi-provider fallback is also wired up, so if Gemini is down, it'll automatically try OpenAI and then ElevenLabs. Everything's ready to go!"
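The 150-word limit can be checked before the text is sent to a TTS provider. The helper below is a minimal sketch; within_word_limit is a hypothetical name, and it counts whitespace-separated words with wc -w, which may differ slightly from how a provider tokenizes text.

```shell
# Succeed if the text has at most LIMIT words (default 150).
within_word_limit() {
  text=$1
  limit=${2:-150}
  # Arithmetic expansion strips the padding some wc implementations add.
  count=$(( $(printf '%s' "$text" | wc -w) ))
  [ "$count" -le "$limit" ]
}
```

A caller might use it as within_word_limit "$SUMMARY" || echo "Summary too long" >&2, where SUMMARY is whatever variable holds the generated text.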
Environment Variables
| Variable | Required For | Notes |
|---|---|---|
| GEMINI_API_KEY | Gemini TTS | Free tier available |
| OPENAI_API_KEY | OpenAI TTS | Paid, high quality |
| ELEVENLABS_API_KEY | ElevenLabs | Best quality, paid |
Files
- bin/generate_audio: Main TTS script with multi-provider support
- SKILL.md: This documentation
Troubleshooting
"No TTS provider available"
- Check that at least one API key is set
- Verify keys are set in your environment or the ~/.openclaw/secrets/ directory
"Audio file too large"
- Keep summaries under 150 words
- Use --provider openai for MP3 (smaller than WAV)
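A rough size estimate shows why MP3 is the smaller option. The arithmetic below is a sketch under stated assumptions: 16-bit mono PCM with a 44-byte WAV header for the 24kHz output, and a constant 128 kbit/s for MP3. Neither figure is confirmed by the script, so check the actual files before relying on these numbers.

```shell
# Estimated size of uncompressed PCM audio in a WAV container.
# Assumes 16-bit mono samples and a 44-byte header.
wav_size_bytes() {
  seconds=$1
  sample_rate=${2:-24000}
  bytes_per_sample=2
  channels=1
  echo $(( seconds * sample_rate * bytes_per_sample * channels + 44 ))
}

# Estimated MP3 size at a constant bitrate in kbit/s (128 is an assumption).
mp3_size_bytes() {
  seconds=$1
  kbps=${2:-128}
  echo $(( seconds * kbps * 1000 / 8 ))
}

wav_size_bytes 30   # prints 1440044
mp3_size_bytes 30   # prints 480000
```

Under these assumptions a 30-second WAV clip is about three times the size of the equivalent MP3.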
"Gemini TTS failed"
- Falls back to OpenAI automatically
- Check Gemini API status at https://status.cloud.google.com
- Requires GEMINI_API_KEY with Gemini 2.5 Flash access