audio-summary

Generate conversational audio summaries using text-to-speech. Supports multiple providers (Gemini TTS, OpenAI TTS, ElevenLabs) with preference order and automatic fallback.

By colindmurray. Updated 3/4/2026

name: audio-summary
description: Generate conversational audio summaries using text-to-speech. Supports multiple providers (Gemini TTS, OpenAI TTS, ElevenLabs) with preference order and automatic fallback.
metadata:
  openclaw:
    emoji: 🔊
    requires:
      envVars:
        - GEMINI_API_KEY | OPENAI_API_KEY | ELEVENLABS_API_KEY


Audio Summary Skill

Generates conversational audio summaries using multi-provider TTS.

When to Use

  • User says "give me an audio summary"
  • User says "audio summary please"
  • User says "tell me in voice"
  • User explicitly requests audio/voice response

How It Works

  1. Generate summary text: create a conversational summary (150 words max, 30-60 seconds)
  2. Convert to audio: use a TTS provider (Gemini → OpenAI → ElevenLabs fallback)
  3. Deliver audio: upload via your chat platform or serve locally
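The 150-word limit in step 1 can be checked before any TTS call is made. The following is a minimal sketch, not part of the skill itself: `within_word_limit` is a hypothetical helper name, and it counts whitespace-separated words with `wc -w`, which may differ slightly from how a TTS provider counts them.

```shell
# Hypothetical helper (not shipped with the skill): succeeds only when the
# text fits the word budget. The limit defaults to the documented 150 words.
within_word_limit() {
  _wl_max="${2:-150}"
  # Arithmetic expansion strips the padding some wc implementations emit.
  _wl_count=$(( $(printf '%s' "$1" | wc -w) ))
  [ "$_wl_count" -le "$_wl_max" ]
}

# Example usage (commented out because it calls the real script):
# if within_word_limit "$summary"; then
#   ~/.openclaw/skills/audio-summary/bin/generate_audio "$summary" --cleanup
# else
#   echo "Summary exceeds 150 words; shorten it first." >&2
# fi
```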

Provider Preference Order

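| Priority | Provider   | Model                        | Voice  | Output      | Quality         |
|----------|------------|------------------------------|--------|-------------|-----------------|
| 1        | Gemini TTS | gemini-2.5-flash-preview-tts | Aoede  | WAV (24kHz) | Best, free tier |
| 2        | OpenAI TTS | gpt-4o-mini-tts              | nova   | MP3         | High, paid      |
| 3        | ElevenLabs | eleven_multilingual_v2       | Rachel | MP3         | Excellent, paid |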

Note: Gemini outputs .wav (PCM wrapped with a WAV header), not .mp3. The script returns the actual file path, so callers should use the echoed path rather than assume .mp3.
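Because the extension depends on which provider handled the request, a caller that needs a content type can derive it from the echoed path. This is a small sketch under that assumption: `audio_mime_type` is a hypothetical helper name, and the MIME strings are the commonly used ones for WAV and MP3, not values taken from the skill.

```shell
# Hypothetical helper: map the path echoed by generate_audio to a MIME type.
audio_mime_type() {
  case "$1" in
    *.wav) echo "audio/wav" ;;                 # Gemini output
    *.mp3) echo "audio/mpeg" ;;                # OpenAI / ElevenLabs output
    *)     echo "application/octet-stream" ;;  # unknown extension
  esac
}
```

For example, `audio_mime_type "$AUDIO_FILE"` could feed an upload step that requires an explicit content type.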

Usage

# Generate audio from text (auto-deletes after delivery)
~/.openclaw/skills/audio-summary/bin/generate_audio "Your summary text here" --cleanup

# With custom output path
~/.openclaw/skills/audio-summary/bin/generate_audio "Text" --output /tmp/custom.mp3

# Use specific provider
~/.openclaw/skills/audio-summary/bin/generate_audio "Text" --provider openai --cleanup

Cleanup After Delivery

IMPORTANT: Audio files are transient. Delete them after delivery.

Pattern:

# Generate, deliver, delete
AUDIO_FILE=$(~/.openclaw/skills/audio-summary/bin/generate_audio "text" --provider openai)
# ... upload or serve the file ...
rm -f "$AUDIO_FILE"

Why: Audio files can build up quickly. A 30-second clip is ~400KB. Clean up after every use.
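The generate, deliver, delete pattern leaves the file behind if the delivery step fails partway through a script. One way to make the cleanup unconditional is sketched below. `with_audio_file` and the `GENERATE_AUDIO` override are hypothetical names introduced for illustration; the sketch assumes only that the script prints the generated file's path on stdout, as described in this document.

```shell
# Path to the skill's script; overridable, e.g. for testing with a stub.
GENERATE_AUDIO="${GENERATE_AUDIO:-$HOME/.openclaw/skills/audio-summary/bin/generate_audio}"

# with_audio_file TEXT COMMAND [ARGS...]
# Generates audio for TEXT, runs COMMAND with the file path appended as the
# last argument, then deletes the file whether or not COMMAND succeeded.
with_audio_file() {
  _af_text="$1"
  shift
  _af_path=$("$GENERATE_AUDIO" "$_af_text") || return 1
  [ -f "$_af_path" ] || return 1
  _af_status=0
  "$@" "$_af_path" || _af_status=$?
  rm -f -- "$_af_path"
  return "$_af_status"
}

# Example usage (commented out; "my_upload" stands in for your delivery step):
# with_audio_file "Hey! Quick update on the build." my_upload
```

The function returns the delivery command's exit status, so callers can still detect a failed upload after the file has been removed.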

Incoming audio cleanup: Voice messages in ~/.openclaw/media/inbound/ should also be cleaned periodically:

# Clean up audio older than 1 day
find ~/.openclaw/media/inbound -name "*.ogg" -mtime +1 -delete

Script Output

Returns the path to the generated audio file (.wav for Gemini, .mp3 for OpenAI/ElevenLabs):

/tmp/audio-summary-1738123456.wav

Delivery

After generating audio, deliver it to your platform. Example using OpenClaw's message tool:

# Upload to chat platform
message --action send --channel <YOUR_CHANNEL> --target <TARGET_ID> \
  --media /tmp/audio-summary-1738123456.wav --caption "Audio summary"

Summary Style Guidelines

  • Conversational: not robotic, like talking to a friend
  • Brief: 150 words max (30-60 seconds)
  • Friendly: warm tone
  • Actionable: highlight what was done and what's next

Example:

"Hey! Just wrapped up the audio transcription skill. It uses Gemini to convert voice messages to text. Tested it on a few sample recordings, and it works great. The multi-provider fallback is also wired up, so if Gemini is down, it'll automatically try OpenAI and then ElevenLabs. Everything's ready to go!"

Environment Variables

| Variable           | Required For | Notes               |
|--------------------|--------------|---------------------|
| GEMINI_API_KEY     | Gemini TTS   | Free tier available |
| OPENAI_API_KEY     | OpenAI TTS   | Paid, high quality  |
| ELEVENLABS_API_KEY | ElevenLabs   | Best quality, paid  |

Files

  • bin/generate_audio: main TTS script with multi-provider support
  • SKILL.md: this documentation

Troubleshooting

"No TTS provider available"

  • Check that at least one API key is set
  • Verify keys are set in your environment or ~/.openclaw/secrets/ directory
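A quick way to run the first check is to list which providers have a key in the current environment, in the documented preference order. This is a sketch only: `configured_providers` is a hypothetical helper, the labels `gemini` and `elevenlabs` are assumed names (only `openai` appears in this document as a `--provider` value), and keys stored under ~/.openclaw/secrets/ are not detected because this only inspects environment variables.

```shell
# Hypothetical helper: print one line per provider whose API key is set,
# ordered by the documented preference (Gemini, OpenAI, ElevenLabs).
configured_providers() {
  [ -n "${GEMINI_API_KEY:-}" ] && echo "gemini"
  [ -n "${OPENAI_API_KEY:-}" ] && echo "openai"
  [ -n "${ELEVENLABS_API_KEY:-}" ] && echo "elevenlabs"
  return 0
}

# Example usage (commented out):
# if [ -z "$(configured_providers)" ]; then
#   echo "No TTS API key found in the environment" >&2
# fi
```

An empty result means no key is visible to the shell, which is consistent with the "No TTS provider available" error.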

"Audio file too large"

  • Keep summaries under 150 words
  • Use --provider openai for MP3 (smaller than WAV)

"Gemini TTS failed"

Install via CLI
npx skills add https://github.com/colindmurray/awesome_openclaw_skills --skill audio-summary