livekit-voice - SKILL.md Agent Skill

name: livekit-voice description: > LiveKit voice agent setup — room management, token generation, voice pipelines with STT/TTS, and real-time audio streaming. version: 0.1.0 author: Jero (LATTICE / MARPA Design Studios) triggers: - "create a voice agent" - "livekit room" - "livekit token" - "voice pipeline" - "real-time voice" - "speech to text agent" tools: - Bash - Read - Write - Edit

LiveKit Voice

USE WHEN the user wants to build real-time voice agents with LiveKit — creating rooms, generating access tokens, setting up STT/TTS voice pipelines, or building conversational AI agents with audio streaming.

What It Does

Creates LiveKit-based voice agent applications with real-time audio streaming, speech-to-text, text-to-speech, and LLM-powered conversational pipelines.

Core Concepts

Room — Virtual space where participants connect for real-time communication
Token — JWT access token granting room access with specific permissions
Agent — Server-side participant that processes audio/video
Voice Pipeline — STT → LLM → TTS chain for conversational agents
Track — Individual audio or video stream within a room

Voice Agent (Python)

from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

async def entrypoint(ctx: JobContext):
    # Wait for a participant to connect
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Create the voice pipeline
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=openai.TTS(),
        chat_ctx=llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Be concise.",
        ),
    )

    # Start the agent
    agent.start(ctx.room, participant)
    await agent.say("Hello! How can I help you?")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Token Generation

from livekit import api
import os

def create_token(room_name: str, participant_name: str) -> str:
    token = api.AccessToken(
        os.environ["LIVEKIT_API_KEY"],
        os.environ["LIVEKIT_API_SECRET"],
    )
    token.with_identity(participant_name)
    token.with_name(participant_name)
    token.with_grants(api.VideoGrants(
        room_join=True,
        room=room_name,
    ))
    return token.to_jwt()

Room Management

from livekit import api
import os

async def manage_rooms():
    lk = api.LiveKitAPI(
        os.environ["LIVEKIT_URL"],
        os.environ["LIVEKIT_API_KEY"],
        os.environ["LIVEKIT_API_SECRET"],
    )

    # List rooms
    rooms = await lk.room.list_rooms(api.ListRoomsRequest())

    # Create room
    room = await lk.room.create_room(api.CreateRoomRequest(
        name="my-room",
        empty_timeout=300,  # 5 min
        max_participants=10,
    ))

    # Delete room
    await lk.room.delete_room(api.DeleteRoomRequest(room="my-room"))

    await lk.aclose()

Environment Variables

LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key

Installation

uv pip install "livekit-agents[codecs]~=1.0"
uv pip install livekit-plugins-deepgram livekit-plugins-openai livekit-plugins-silero

CLI

# Run the agent in dev mode
python agent.py dev

# Run in production
python agent.py start

# Start local LiveKit server for development
livekit-server --dev

Patterns

Custom STT/TTS Providers

# Use Deepgram for STT, ElevenLabs for TTS
from livekit.plugins import deepgram, elevenlabs

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2"),
    llm=openai.LLM(model="gpt-4o"),
    tts=elevenlabs.TTS(voice_id="your_voice_id"),
)

Function Calling in Voice Agents

from livekit.agents import llm

class AssistantFunctions(llm.FunctionContext):
    @llm.ai_callable(description="Get the weather for a location")
    async def get_weather(self, location: str) -> str:
        # Fetch weather data
        return f"It's sunny in {location}"

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o"),
    tts=openai.TTS(),
    fnc_ctx=AssistantFunctions(),
)

Turn Detection

agent = VoicePipelineAgent(
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        min_silence_duration=0.5,
        activation_threshold=0.5,
    ),
    # ... rest of pipeline
)

Rules

Always use VAD (Voice Activity Detection) — Silero is the default
Generate tokens server-side only — never expose API secrets to clients
Use AutoSubscribe.AUDIO_ONLY for voice-only agents to save bandwidth
Set empty_timeout on rooms to auto-cleanup idle rooms
Test with livekit-server --dev before deploying to cloud
Environment variables for all API keys — never hardcode