name: elevenlabs
description: "ElevenLabs audio generation: text-to-speech, voice cloning, and sound effects. Use this skill any time the agent needs to: convert text to spoken audio, narrate documents or content, generate voiceovers, clone voices from audio samples, create sound effects, or produce any audio output from text. Supports multiple voices, languages, models, voice cloning, batch processing, and sound effect generation. Requires ELEVENLABS_API_KEY."
ElevenLabs — Text-to-Speech, Voice Cloning & Sound Effects
ElevenLabs-powered audio generation. Convert any text to natural-sounding speech, clone voices, and generate sound effects.
Setup
Required env var: ELEVENLABS_API_KEY
The CLI script at scripts/tts.py uses only Python stdlib (urllib, json, argparse) — no pip dependencies needed.
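For readers curious what a stdlib-only request might look like, here is a minimal sketch. It is not the code in scripts/tts.py. It assumes the public ElevenLabs endpoint `POST /v1/text-to-speech/{voice_id}`, the `xi-api-key` header, and the `output_format` query parameter. The function name `build_tts_request` and the placeholder `VOICE_ID` are hypothetical. The request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed public API base URL


def build_tts_request(api_key, voice_id, text,
                      model_id="eleven_multilingual_v2",
                      output_format="mp3_44100_128"):
    """Build (but do not send) a text-to-speech request with urllib only."""
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"{API_BASE}/text-to-speech/{urllib.parse.quote(voice_id)}?{query}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


# Hypothetical values; nothing is sent over the network here.
req = build_tts_request("your_key_here", "VOICE_ID", "Hello.")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call and return the audio bytes, assuming a valid key and voice ID.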
Quick Reference
All commands use the CLI script:
python skills/GENERAL-KNOWLEDGE-WORKER/elevenlabs/scripts/tts.py <command> [args]
Speak — Convert text to speech
# Basic — default voice (George), multilingual v2 model
python scripts/tts.py speak "Hello, this is Kortix speaking."
# Named voice
python scripts/tts.py speak "Welcome to the presentation." --voice Rachel
# Custom output file
python scripts/tts.py speak "Chapter one." --voice George -o chapter1.mp3
# From a file (prefix with @)
python scripts/tts.py speak @article.txt -o narration.mp3
# From stdin
echo "Dynamic text" | python scripts/tts.py speak -
# With voice tuning
python scripts/tts.py speak "Dramatic reading." --voice Rachel --stability 0.3 --similarity 0.9 --style 0.7
# High quality output
python scripts/tts.py speak "Studio quality." --format mp3_44100_192
# Different model (faster, lower latency)
python scripts/tts.py speak "Quick response." --model eleven_turbo_v2_5
# Speed control
python scripts/tts.py speak "Slowly now." --speed 0.7
python scripts/tts.py speak "Fast paced!" --speed 1.5
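The three input forms accepted by speak (literal text, `@file`, and `-` for stdin) could be resolved along these lines. This is a sketch of the convention as described above, not the script's actual implementation. The helper name `resolve_text` and the whitespace stripping are assumptions.

```python
import io
import sys


def resolve_text(arg, stdin=None):
    """Return the text for a speak-style argument: '-', '@path', or a literal."""
    if arg == "-":
        # Read everything from stdin (injectable so the helper is testable).
        stream = stdin if stdin is not None else sys.stdin
        return stream.read().strip()
    if arg.startswith("@"):
        # '@article.txt' means: read the text from article.txt.
        with open(arg[1:], encoding="utf-8") as handle:
            return handle.read().strip()
    return arg


print(resolve_text("Hello, this is Kortix speaking."))
print(resolve_text("-", stdin=io.StringIO("Dynamic text\n")))
```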
Voices — List and search
# List all available voices
python scripts/tts.py voices
# Search by name, gender, accent, or use case
python scripts/tts.py voices --search "female"
python scripts/tts.py voices --search "british"
python scripts/tts.py voices --search "narration"
Models — List available TTS models
python scripts/tts.py models
Clone — Create a custom voice from audio samples
# Clone from audio files (1-25 samples, each 1-10 minutes)
python scripts/tts.py clone "ClientVoice" sample1.mp3 sample2.mp3
# With description
python scripts/tts.py clone "CEO" ceo_speech.mp3 --description "Confident male voice, American accent"
# Use the cloned voice
python scripts/tts.py speak "Hello from my cloned voice." --voice-id <returned_voice_id>
Batch — Convert entire documents
# Convert a text file to a single audio file
python scripts/tts.py batch article.txt -o article_audio/
# Split by paragraphs — one audio file per paragraph
python scripts/tts.py batch book_chapter.txt --split-paragraphs -o chapter_audio/
# With specific voice
python scripts/tts.py batch script.txt --voice Rachel --split-paragraphs
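To preview what --split-paragraphs is likely to produce, the sketch below splits text on blank lines. The actual splitting rule and output file names used by scripts/tts.py are not documented here, so both `split_paragraphs` and the `part_001.mp3` naming pattern are assumptions for illustration.

```python
import re


def split_paragraphs(text):
    """Split on blank lines, collapse inner whitespace, drop empty paragraphs."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [" ".join(part.split()) for part in parts if part.strip()]


def output_names(count, stem="part", ext="mp3"):
    """Hypothetical naming scheme: part_001.mp3, part_002.mp3, ..."""
    return [f"{stem}_{index:03d}.{ext}" for index in range(1, count + 1)]


sample = "First paragraph.\n\nSecond paragraph,\nwrapped over two lines.\n\n\nThird."
paragraphs = split_paragraphs(sample)
for name, paragraph in zip(output_names(len(paragraphs)), paragraphs):
    print(name, "<-", paragraph)
```

Running a check like this before a batch job shows how many audio files, and therefore how many API calls, to expect.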
Sound Effects — Generate from text prompts
# Generate a sound effect
python scripts/tts.py sound "ocean waves crashing on a beach"
# With specific output and duration
python scripts/tts.py sound "thunderstorm with heavy rain" -o thunder.mp3 --duration 10.0
Voice Settings Guide
Fine-tune voice output with these parameters:
| Parameter | Range | Default | Effect |
|---|---|---|---|
| --stability | 0.0 - 1.0 | 0.5 | Higher = more consistent, lower = more expressive/varied |
| --similarity | 0.0 - 1.0 | 0.75 | Higher = closer to original voice, lower = more creative |
| --style | 0.0 - 1.0 | 0.0 | Higher = more expressive style, can reduce stability |
| --speed | 0.5 - 2.0 | 1.0 | Playback speed multiplier |
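The ranges and defaults in the table can be checked before a command is run. The following is a minimal sketch that mirrors the table. `validate_settings` is a hypothetical helper, and whether the script itself rejects or clamps out-of-range values is not stated here.

```python
# Ranges and defaults copied from the table above.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.5, 2.0),
}
DEFAULTS = {"stability": 0.5, "similarity": 0.75, "style": 0.0, "speed": 1.0}


def validate_settings(**overrides):
    """Merge overrides into the defaults, rejecting out-of-range values."""
    settings = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in RANGES:
            raise ValueError(f"unknown setting: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        settings[name] = float(value)
    return settings


print(validate_settings(stability=0.3, style=0.7))
```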
Recommended presets:
- Narration/Audiobook: --stability 0.5 --similarity 0.75 (balanced, natural)
- News/Formal: --stability 0.8 --similarity 0.8 (consistent, clear)
- Character/Dramatic: --stability 0.3 --similarity 0.8 --style 0.7 (expressive, varied)
- Conversational: --stability 0.4 --similarity 0.6 (natural variation)
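When driving the CLI from another program, the presets above can be kept in one place and expanded into flags. This is a sketch only. The preset keys (`narration`, `news`, and so on) and the helper `preset_flags` are hypothetical names, and the script itself has no preset option that this document describes.

```python
# Values copied from the recommended presets above; the keys are hypothetical.
PRESETS = {
    "narration": {"stability": 0.5, "similarity": 0.75},
    "news": {"stability": 0.8, "similarity": 0.8},
    "character": {"stability": 0.3, "similarity": 0.8, "style": 0.7},
    "conversational": {"stability": 0.4, "similarity": 0.6},
}


def preset_flags(name):
    """Expand a preset into CLI flags such as ['--stability', '0.3', ...]."""
    flags = []
    for key, value in PRESETS[name].items():
        flags += [f"--{key}", str(value)]
    return flags


print(" ".join(preset_flags("character")))
```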
Output Formats
| Format | Quality | Size | Use Case |
|---|---|---|---|
| mp3_44100_128 | High (default) | Medium | General purpose, good quality |
| mp3_44100_192 | Very high | Large | Studio quality, archival |
| mp3_22050_32 | Low | Small | Voice messages, previews |
| pcm_44100 | Lossless | Very large | Post-processing, editing |
| pcm_16000 | Lossless, low sample rate | Large | Speech recognition input |
| opus_48000_128 | High | Small | Web streaming, efficient |
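The format names encode codec, sample rate, and (for compressed formats) bitrate, which makes a rough size estimate possible. The sketch below assumes the pattern `codec_samplerate[_kbps]` and that PCM output is 16-bit mono. Both are assumptions to verify against the ElevenLabs documentation, and the helper names are hypothetical.

```python
def parse_format(fmt):
    """Split a name like 'mp3_44100_128' into codec, sample rate and bitrate."""
    parts = fmt.split("_")
    return {
        "codec": parts[0],
        "sample_rate_hz": int(parts[1]),
        # PCM names carry no bitrate field.
        "bitrate_kbps": int(parts[2]) if len(parts) > 2 else None,
    }


def estimated_bytes(fmt, seconds):
    """Rough audio size in bytes, ignoring container overhead."""
    info = parse_format(fmt)
    if info["bitrate_kbps"] is not None:
        return info["bitrate_kbps"] * 1000 // 8 * seconds
    # Assumption: PCM is 16-bit (2 bytes per sample), single channel.
    return info["sample_rate_hz"] * 2 * seconds


print(estimated_bytes("mp3_44100_128", 60))  # 960000
print(estimated_bytes("pcm_44100", 60))      # 5292000
```

Under these assumptions, a minute of pcm_44100 is roughly five times the size of a minute of mp3_44100_128, in line with the Size column.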
Models
| Model | Speed | Quality | Languages | Best For |
|---|---|---|---|---|
| eleven_multilingual_v2 | Normal | Highest | 29 languages | Default: best quality, multilingual |
| eleven_turbo_v2_5 | Fast | High | 32 languages | Low-latency, near-instant generation |
| eleven_monolingual_v1 | Normal | Good | English only | Legacy English-only workloads |
Use eleven_multilingual_v2 by default; switch to eleven_turbo_v2_5 only when latency is critical.
Common Workflows
Narrate a document
# Read the document, generate speech
python scripts/tts.py speak @workspace/report.md --voice Rachel -o report_narration.mp3
Create a podcast intro
python scripts/tts.py speak "Welcome to the Kortix Weekly. I'm your host, and today we're diving into autonomous AI agents." \
--voice George --stability 0.4 --similarity 0.8 --style 0.5 \
-o podcast_intro.mp3
Narrate a presentation (per-slide)
For each slide, generate a separate audio file:
python scripts/tts.py speak "Slide 1: Introduction to our company" --voice Rachel -o slides/01.mp3
python scripts/tts.py speak "Slide 2: Our key metrics this quarter" --voice Rachel -o slides/02.mp3
Or write all narration to a text file (one paragraph per slide) and batch it:
python scripts/tts.py batch slide_notes.txt --split-paragraphs --voice Rachel -o slide_audio/
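For decks with many slides, the per-slide commands can be generated rather than typed. The sketch below only builds the argument lists, using the speak flags shown above. `slide_commands` is a hypothetical helper, and the `slides/NN.mp3` naming follows the example rather than any rule enforced by the script.

```python
import shlex


def slide_commands(notes, voice="Rachel", out_dir="slides"):
    """Build one speak command (as an argv list) per slide note."""
    commands = []
    for index, note in enumerate(notes, start=1):
        commands.append([
            "python", "scripts/tts.py", "speak", note,
            "--voice", voice,
            "-o", f"{out_dir}/{index:02d}.mp3",
        ])
    return commands


notes = [
    "Slide 1: Introduction to our company",
    "Slide 2: Our key metrics this quarter",
]
for argv in slide_commands(notes):
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to execute the commands one by one.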
Voice clone for personalization
# Clone the user's voice from samples they provide
python scripts/tts.py clone "UserVoice" sample1.mp3 sample2.mp3 sample3.mp3 \
--description "The user's natural speaking voice"
# Use it for all future TTS
python scripts/tts.py speak "Personalized message." --voice-id <voice_id> -o message.mp3
Generate ambient audio
python scripts/tts.py sound "coffee shop ambiance with gentle chatter" -o ambient.mp3 --duration 15
python scripts/tts.py sound "gentle rain on a window" -o rain.mp3 --duration 30
Integration Notes
- No pip dependencies. The script uses only Python stdlib (urllib.request, json, argparse). Works on any Python 3.10+ installation.
- Output files are saved relative to the current working directory. Use -o to specify exact paths.
- Long text is handled automatically by the API. For very long documents (>5000 chars), consider using batch with --split-paragraphs for better quality and to avoid timeouts.
- Rate limits apply per your ElevenLabs plan. The script will return API errors if limits are hit.
- Character usage counts against your ElevenLabs monthly quota. Check your plan's limits.
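If one file per paragraph is too fine-grained, paragraphs can be packed into chunks that stay under the 5000-character guideline mentioned above. This is a sketch of one greedy approach, not something the script is known to do. `pack_paragraphs` is a hypothetical helper, and a single paragraph longer than the limit is kept whole rather than cut mid-sentence.

```python
def pack_paragraphs(paragraphs, max_chars=5000):
    """Greedily join paragraphs into chunks of at most max_chars characters."""
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Fits, or it is an oversized paragraph starting a fresh chunk.
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


chunks = pack_paragraphs(["a" * 3000, "b" * 3000, "c" * 1000])
print([len(chunk) for chunk in chunks])  # [3000, 4002]
```

Summing `len(chunk)` over the result also gives a quick estimate of how many characters a job will draw from the monthly quota.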
Env Vars
| Variable | Required | Description |
|---|---|---|
| ELEVENLABS_API_KEY | Yes | Your ElevenLabs API key (also accepts ELEVEN_API_KEY) |
Add to core/docker/.env (or to the .env file in the active OpenCode config directory during local materialization):
ELEVENLABS_API_KEY=your_key_here
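A lookup that honors both variable names might look like the sketch below. The precedence (ELEVENLABS_API_KEY first, then ELEVEN_API_KEY) and the helper name `resolve_api_key` are assumptions. The script's actual lookup order is not documented here.

```python
import os


def resolve_api_key(environ=None):
    """Return the first non-empty API key found, or raise a clear error."""
    env = os.environ if environ is None else environ
    # Assumed precedence: the primary name wins over the accepted alias.
    for name in ("ELEVENLABS_API_KEY", "ELEVEN_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set ELEVENLABS_API_KEY (or ELEVEN_API_KEY).")


# Example with an explicit mapping instead of the real environment.
print(resolve_api_key({"ELEVEN_API_KEY": "your_key_here"}))
```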