name: livekit-voice
description: >
  LiveKit voice agent setup: room management, token generation, voice pipelines with STT/TTS, and real-time audio streaming.
version: 0.1.0
author: Jero (LATTICE / MARPA Design Studios)
triggers:
  - "create a voice agent"
  - "livekit room"
  - "livekit token"
  - "voice pipeline"
  - "real-time voice"
  - "speech to text agent"
tools:
  - Bash
  - Read
  - Write
  - Edit
LiveKit Voice
USE WHEN the user wants to build real-time voice agents with LiveKit — creating rooms, generating access tokens, setting up STT/TTS voice pipelines, or building conversational AI agents with audio streaming.
What It Does
Creates LiveKit-based voice agent applications with real-time audio streaming, speech-to-text, text-to-speech, and LLM-powered conversational pipelines.
Core Concepts
- Room — Virtual space where participants connect for real-time communication
- Token — JWT access token granting room access with specific permissions
- Agent — Server-side participant that processes audio/video
- Voice Pipeline — STT → LLM → TTS chain for conversational agents
- Track — Individual audio or video stream within a room
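To make the Voice Pipeline concept concrete, here is a minimal, library-free sketch of one conversational turn. The three stage functions (`fake_stt`, `fake_llm`, `fake_tts`) are hypothetical stand-ins, not LiveKit APIs. A real agent streams audio frames through provider plugins rather than passing whole byte strings between stages.

```python
import asyncio


async def fake_stt(audio: bytes) -> str:
    """Hypothetical STT stage: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


async def fake_llm(transcript: str) -> str:
    """Hypothetical LLM stage: produce a canned reply to the transcript."""
    return f"You said: {transcript}"


async def fake_tts(reply: str) -> bytes:
    """Hypothetical TTS stage: pretend the encoded text is synthesized audio."""
    return reply.encode("utf-8")


async def run_turn(audio_in: bytes) -> bytes:
    """Run one turn through the STT -> LLM -> TTS chain."""
    transcript = await fake_stt(audio_in)
    reply = await fake_llm(transcript)
    return await fake_tts(reply)
```

Calling `asyncio.run(run_turn(b"hello"))` walks a single utterance through all three stages in order, which is the same shape of data flow that the pipeline agent manages continuously on live audio.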
Voice Agent (Python)
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # Connect to the room (audio tracks only), then wait for a participant
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Create the voice pipeline: VAD -> STT -> LLM -> TTS
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=openai.TTS(),
        chat_ctx=llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Be concise.",
        ),
    )

    # Start the agent for this participant and greet them
    agent.start(ctx.room, participant)
    await agent.say("Hello! How can I help you?")


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Token Generation
from livekit import api
import os


def create_token(room_name: str, participant_name: str) -> str:
    token = api.AccessToken(
        os.environ["LIVEKIT_API_KEY"],
        os.environ["LIVEKIT_API_SECRET"],
    )
    token.with_identity(participant_name)
    token.with_name(participant_name)
    token.with_grants(api.VideoGrants(
        room_join=True,
        room=room_name,
    ))
    return token.to_jwt()
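Because an access token is a JWT, its claims can be inspected with the standard library when debugging permissions. The sketch below decodes the payload without verifying the signature, so it is for debugging only. `make_demo_token` is a hypothetical helper that builds an HS256 token by hand so the example runs without the SDK. The claim layout it uses (`iss` for the API key, `sub` for the identity, and a `video` object with `roomJoin` and `room`) is an assumption about how LiveKit encodes grants, so confirm it against a token from your own server.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_demo_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Hypothetical helper: hand-build an HS256 JWT in the assumed LiveKit shape."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,
        "sub": identity,
        "exp": int(time.time()) + 3600,
        "video": {"roomJoin": True, "room": room},  # assumed grant layout
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_claims(jwt: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature (debug only)."""
    payload_b64 = jwt.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Passing the string returned by `create_token` to `decode_claims` is a quick way to check which room and identity a token carries before handing it to a client.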
Room Management
from livekit import api
import os


async def manage_rooms():
    lk = api.LiveKitAPI(
        os.environ["LIVEKIT_URL"],
        os.environ["LIVEKIT_API_KEY"],
        os.environ["LIVEKIT_API_SECRET"],
    )
    try:
        # List rooms
        rooms = await lk.room.list_rooms(api.ListRoomsRequest())

        # Create room
        room = await lk.room.create_room(api.CreateRoomRequest(
            name="my-room",
            empty_timeout=300,  # seconds (5 min)
            max_participants=10,
        ))

        # Delete room
        await lk.room.delete_room(api.DeleteRoomRequest(room="my-room"))
    finally:
        # Close the underlying HTTP session even if a call above fails
        await lk.aclose()
Environment Variables
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
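A missing key usually surfaces only when the first provider call fails, so it can help to check the environment at startup. The following is a minimal sketch using only the standard library. `REQUIRED_VARS`, `missing_env_vars`, and `require_env` are hypothetical names introduced for illustration; the variable list mirrors the one above.

```python
import os

# Variables the examples in this document read from the environment.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(environ=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def require_env(environ=os.environ) -> None:
    """Raise a single error listing every missing variable."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` at the top of the agent script reports all missing variables at once instead of failing on the first one that is used.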
Installation
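# The examples in this document use the 0.x API (VoicePipelineAgent), which is not available in 1.x, so pin below 1.0
uv pip install "livekit-agents[codecs]~=0.12" livekit-plugins-deepgram livekit-plugins-openai livekit-plugins-silero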
CLI
# Run the agent in dev mode
python agent.py dev
# Run in production
python agent.py start
# Start local LiveKit server for development
livekit-server --dev
Patterns
Custom STT/TTS Providers
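# Use Deepgram for STT, ElevenLabs for TTS
# (requires the livekit-plugins-elevenlabs package; the keyword used to select a
# voice has varied between plugin releases, so check the installed TTS signature)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, elevenlabs, openai, silero

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2"),
    llm=openai.LLM(model="gpt-4o"),
    tts=elevenlabs.TTS(voice_id="your_voice_id"),
)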
Function Calling in Voice Agents
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero


class AssistantFunctions(llm.FunctionContext):
    @llm.ai_callable(description="Get the weather for a location")
    async def get_weather(self, location: str) -> str:
        # Placeholder: replace with a real weather lookup
        return f"It's sunny in {location}"


agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o"),
    tts=openai.TTS(),
    fnc_ctx=AssistantFunctions(),  # expose the functions above to the LLM
)
Turn Detection
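agent = VoicePipelineAgent(
    vad=silero.VAD.load(
        min_speech_duration=0.1,   # seconds of speech before a turn is considered started
        min_silence_duration=0.5,  # seconds of silence before a turn is considered ended
        activation_threshold=0.5,  # speech probability at which audio counts as speech
    ),
    # ... rest of pipeline
)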
Rules
- Always use VAD (Voice Activity Detection) — Silero is the default
- Generate tokens server-side only — never expose API secrets to clients
- Use AutoSubscribe.AUDIO_ONLY for voice-only agents to save bandwidth
- Set empty_timeout on rooms to auto-cleanup idle rooms
- Test with livekit-server --dev before deploying to cloud
- Use environment variables for all API keys; never hardcode them