livekit-agents-setup - SKILL.md Agent Skill

name: livekit-agents-setup
description: Set up LiveKit Agents framework for building AI voice and video agents with Python or Node.js
argument-hint: ""
allowed-tools: Read, Write, Bash(pip install, npm install, uv, lk), Glob, Grep

LiveKit Agents Framework Setup

Build AI agents with LiveKit: $ARGUMENTS

Expert Knowledge

You are a LiveKit Agents specialist with expertise in:

Agent architecture and lifecycle
Voice pipeline configuration (STT-LLM-TTS)
Realtime model integration
Tool/function calling
Deployment and scaling

Agents Overview

LiveKit Agents is a framework for building programmable, multimodal AI agents that process speech, text, images, and video in real time.

Component	Purpose
AgentSession	Manages conversation lifecycle
STT	Speech-to-text transcription
LLM	Language model for responses
TTS	Text-to-speech synthesis
VAD	Voice activity detection
Tools	Function calling capabilities

Installation

Python (Recommended)

# Create project with uv
uv init livekit-agent --bare
cd livekit-agent

# Add dependencies for voice pipeline
uv add "livekit-agents[silero,turn-detector]~=1.3"
uv add "livekit-plugins-noise-cancellation~=0.2"
uv add python-dotenv

# For OpenAI Realtime
uv add "livekit-agents[openai]~=1.3"

# For Google Gemini
uv add "livekit-agents[google]~=1.3"
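To confirm that the pinned packages resolved as expected, a small check like the following can help. It is a sketch using only the standard library; the helper names `installed_version` and `report` are illustrative, and the distribution names are assumed to match the PyPI packages behind the extras above.

```python
from importlib import metadata
from typing import Optional

# Distribution names assumed to correspond to the packages installed above.
PACKAGES = (
    "livekit-agents",
    "livekit-plugins-silero",
    "livekit-plugins-turn-detector",
    "livekit-plugins-noise-cancellation",
)


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Saved under a name of your choice (for example `check_install.py`, a hypothetical filename), it can be run inside the project environment with `uv run`.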

Node.js

mkdir livekit-agent && cd livekit-agent
pnpm init
pnpm add @livekit/agents @livekit/agents-plugin-silero @livekit/rtc-node livekit-server-sdk

# For TypeScript
pnpm add -D typescript tsx @types/node

Environment Setup

# Install LiveKit CLI
brew install livekit-cli  # macOS
# or
curl -sSL https://get.livekit.io/cli | bash

# Generate environment file
lk app env -w

.env.local

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
# For OpenAI models
OPENAI_API_KEY=your-openai-key
# For Gemini models
GOOGLE_API_KEY=your-google-key
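Before starting the agent, it can be useful to check that the three LiveKit variables are actually set. The following is a minimal sketch with a hand-rolled parser rather than python-dotenv's own loader; the function names `parse_env` and `missing_keys` are illustrative, and the parser only handles simple `KEY=value` lines.

```python
from pathlib import Path

REQUIRED_KEYS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment (space followed by '#') and quotes.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    path = Path(".env.local")
    text = path.read_text() if path.exists() else ""
    missing = missing_keys(text)
    print("missing:", ", ".join(missing) if missing else "none")
```

Provider keys such as `OPENAI_API_KEY` are left out of `REQUIRED_KEYS` on purpose, since which ones are needed depends on the models chosen.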

Basic Agent (Python)

Voice Pipeline Agent

# agent.py
import logging

from dotenv import load_dotenv
from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(dotenv_path=".env.local")
logging.basicConfig(level=logging.INFO)

class MyAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice assistant.
            Be concise and friendly in your responses.
            Ask clarifying questions when needed."""
        )

server = AgentServer()

@server.rtc_session()
async def create_session(ctx: agents.JobContext):
    # Pipeline models are configured on the session itself
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4o-mini",
        tts="cartesia/sonic",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    # Noise cancellation is a room audio-input option, not a session argument
    await session.start(
        room=ctx.room,
        agent=MyAssistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=noise_cancellation.BVC(),
            ),
        ),
    )

if __name__ == "__main__":
    # The CLI provides the console, dev, start, and download-files commands
    agents.cli.run_app(server)

Realtime Model Agent

# agent.py
import logging

from dotenv import load_dotenv
from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import noise_cancellation
from livekit.plugins.openai import realtime

load_dotenv(dotenv_path=".env.local")
logging.basicConfig(level=logging.INFO)

class RealtimeAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice assistant using
            OpenAI's Realtime API for natural conversation."""
        )

server = AgentServer()

@server.rtc_session()
async def create_session(ctx: agents.JobContext):
    # A realtime model replaces the separate STT, LLM, and TTS stages
    model = realtime.RealtimeModel(
        voice="alloy",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    session = AgentSession(llm=model)

    await session.start(
        room=ctx.room,
        agent=RealtimeAssistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=noise_cancellation.BVC(),
            ),
        ),
    )

if __name__ == "__main__":
    agents.cli.run_app(server)

Basic Agent (Node.js)

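// agent.ts
// Requires the Silero plugin package: @livekit/agents-plugin-silero
import { fileURLToPath } from 'node:url';
import { type JobContext, WorkerOptions, cli, defineAgent, voice } from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';

class MyAssistant extends voice.Agent {
  constructor() {
    super({
      instructions: `You are a helpful voice assistant.
        Be concise and friendly in your responses.`,
    });
  }
}

export default defineAgent({
  entry: async (ctx: JobContext) => {
    // Pipeline models are configured on the session itself
    const session = new voice.AgentSession({
      stt: 'deepgram/nova-3',
      llm: 'openai/gpt-4o-mini',
      tts: 'cartesia/sonic',
      vad: await silero.VAD.load(),
    });

    await session.start({ agent: new MyAssistant(), room: ctx.room });
    await ctx.connect();
  },
});

// The CLI loads this file as the agent entry point
cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));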

Download Required Models

# Download VAD and other model files
uv run agent.py download-files

Running the Agent

Console Mode (Testing)

# Python
uv run agent.py console

# Node.js (console mode is Python-only; use dev mode instead)
npx tsx agent.ts dev

Development Mode

# Connect to LiveKit Cloud playground
uv run agent.py dev

Production Mode

# Run as server
uv run agent.py start

Agent Architecture

┌─────────────────────────────────────────────────┐
│                   AgentServer                   │
│  ┌───────────────────────────────────────────┐  │
│  │              AgentSession                 │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐    │  │
│  │  │   VAD   │→ │   STT   │→ │   LLM   │    │  │
│  │  └─────────┘  └─────────┘  └────┬────┘    │  │
│  │       ↑                         │         │  │
│  │  ┌─────────┐               ┌─────────┐    │  │
│  │  │  Audio  │← ─ ─ ─ ─ ─ ─ ─│   TTS   │    │  │
│  │  └─────────┘               └─────────┘    │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
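The data flow in the diagram can be sketched with stand-in stages. Nothing below calls LiveKit; every function (`fake_vad`, `fake_stt`, `fake_llm`, `fake_tts`, `run_turn`) is a hypothetical placeholder meant only to show the order in which the stages hand data to each other within one turn.

```python
import asyncio


def fake_vad(frame: bytes) -> bool:
    """Stand-in for voice activity detection: non-empty frames count as speech."""
    return len(frame) > 0


async def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


async def fake_llm(prompt: str) -> str:
    """Stand-in for the language model: returns a canned reply."""
    return f"You said: {prompt}"


async def fake_tts(text: str) -> bytes:
    """Stand-in for text-to-speech: encodes the reply back to bytes."""
    return text.encode("utf-8")


async def run_turn(frames: list) -> bytes:
    """Run one conversational turn: VAD -> STT -> LLM -> TTS."""
    speech = b"".join(frame for frame in frames if fake_vad(frame))
    if not speech:
        return b""  # Nothing was said, so the agent stays silent.
    transcript = await fake_stt(speech)
    reply = await fake_llm(transcript)
    return await fake_tts(reply)


if __name__ == "__main__":
    audio_out = asyncio.run(run_turn([b"", b"hello ", b"agent"]))
    print(audio_out.decode("utf-8"))  # You said: hello agent
```

In the real framework these stages stream incrementally rather than running strictly one after another, which this sketch does not attempt to model.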

Session Events

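from livekit.agents import (
    AgentServer,
    AgentSession,
    AgentStateChangedEvent,
    JobContext,
    UserInputTranscribedEvent,
    UserStateChangedEvent,
)

server = AgentServer()

@server.rtc_session()
async def create_session(ctx: JobContext):
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4o-mini",
        tts="cartesia/sonic",
    )

    # Handlers are plain (non-async) functions that receive an event object
    @session.on("user_state_changed")
    def on_user_state(ev: UserStateChangedEvent):
        if ev.new_state == "speaking":
            print("User started speaking")

    @session.on("user_input_transcribed")
    def on_transcript(ev: UserInputTranscribedEvent):
        if ev.is_final:
            print(f"User said: {ev.transcript}")

    @session.on("agent_state_changed")
    def on_agent_state(ev: AgentStateChangedEvent):
        if ev.new_state == "speaking":
            print("Agent started responding")
        elif ev.old_state == "speaking":
            print("Agent finished responding")

    # MyAssistant is the Agent subclass defined in agent.py
    await session.start(room=ctx.room, agent=MyAssistant())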
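The decorator-style registration used above can be illustrated without the framework. `TinyEmitter` below is a hypothetical stand-in, not LiveKit's emitter, and the event name `"transcript"` is made up for the example; the sketch only shows how `on(...)` stores a handler and how emitting an event calls it.

```python
from collections import defaultdict
from typing import Callable


class TinyEmitter:
    """Hypothetical stand-in showing decorator-style event registration."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event: str):
        """Return a decorator that registers the function for `event`."""
        def register(fn: Callable) -> Callable:
            self._handlers[event].append(fn)
            return fn
        return register

    def emit(self, event: str, *args) -> int:
        """Call every handler for `event`; return how many handlers ran."""
        handlers = list(self._handlers.get(event, []))
        for fn in handlers:
            fn(*args)
        return len(handlers)


emitter = TinyEmitter()
log = []


@emitter.on("transcript")
def on_transcript(text: str) -> None:
    log.append(f"User said: {text}")


emitter.emit("transcript", "hello")
print(log)  # ['User said: hello']
```

Registering handlers before the session starts means no early events are missed.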

Agent Lifecycle

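from livekit.agents import Agent, ChatContext, ChatMessage

class MyAssistant(Agent):
    async def on_enter(self):
        """Called when the agent becomes active in the session."""
        await self.session.say("Hello! How can I help you?")

    async def on_user_turn_completed(
        self, turn_ctx: ChatContext, new_message: ChatMessage
    ):
        """Called after each user turn, before the LLM generates a reply."""
        # Override to inspect or adjust the turn; the LLM reply follows by default
        pass

    async def on_exit(self):
        """Called before the agent hands off control, for example to another agent."""
        print("Agent exiting")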
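The order in which such hooks fire can be sketched with plain asyncio. `SketchAgent`, its `on_message` hook, and `run_lifecycle` are hypothetical names for illustration and do not mirror the framework's internals; the point is that the exit hook runs even if handling a message fails.

```python
import asyncio


class SketchAgent:
    """Hypothetical agent that records the order of its lifecycle hooks."""

    def __init__(self) -> None:
        self.calls = []

    async def on_enter(self) -> None:
        self.calls.append("enter")

    async def on_message(self, text: str) -> None:
        self.calls.append(f"message:{text}")

    async def on_exit(self) -> None:
        self.calls.append("exit")


async def run_lifecycle(agent: SketchAgent, messages: list) -> list:
    """Drive the hooks in order: enter, one call per message, then exit."""
    await agent.on_enter()
    try:
        for text in messages:
            await agent.on_message(text)
    finally:
        # Runs even when a message handler raises.
        await agent.on_exit()
    return agent.calls


if __name__ == "__main__":
    print(asyncio.run(run_lifecycle(SketchAgent(), ["hi", "bye"])))
```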

LiveKit Cloud Deployment

# Create deployment
lk agent create --name my-agent

# Deploy agent
lk agent deploy --name my-agent

# View logs
lk agent logs --name my-agent

# Update deployment
lk agent update --name my-agent

Dockerfile

FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install --no-cache-dir uv

# Install dependencies first so this layer is cached across code changes
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen

# Copy agent code
COPY agent.py ./

# Download model files
RUN uv run agent.py download-files

# Run agent
CMD ["uv", "run", "agent.py", "start"]

Project Structure

livekit-agent/
├── agent.py          # Main agent code
├── tools.py          # Custom tools/functions
├── pyproject.toml    # Dependencies
├── uv.lock           # Lock file
├── .env.local        # Environment variables
├── Dockerfile        # Container config
└── README.md

Best Practices

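Use console mode first: Test locally in the terminal before connecting to a room or deploying
Download models: Run download-files ahead of time so VAD and turn-detector model files are available locally
Handle interruptions: Choose a turn detection mode that suits the pipeline so the agent stops speaking when the user cuts in
Use noise cancellation: Cleaner input audio improves STT accuracy
Keep instructions concise: Shorter prompts mean fewer tokens per LLM call and faster responses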
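The idea behind interruption handling can be sketched with plain asyncio: playback runs as a task, and it is cancelled as soon as the user starts talking. This is an illustration of the concept only, not how LiveKit implements it; `speak` and `speak_with_interrupt` are hypothetical names, and the sleeps stand in for audio playback and for the moment the user barges in.

```python
import asyncio


async def speak(words: list, spoken: list) -> None:
    """Pretend to play TTS audio one word at a time."""
    for word in words:
        spoken.append(word)
        await asyncio.sleep(0.01)  # Stand-in for audio playback time.


async def speak_with_interrupt(words: list, interrupt_after: float) -> list:
    """Start speaking, then cancel playback when the 'user' barges in."""
    spoken = []
    playback = asyncio.create_task(speak(words, spoken))
    await asyncio.sleep(interrupt_after)  # Simulated moment the user starts talking.
    playback.cancel()
    try:
        await playback
    except asyncio.CancelledError:
        pass  # Expected: playback was interrupted on purpose.
    return spoken


if __name__ == "__main__":
    words = [str(i) for i in range(100)]
    heard = asyncio.run(speak_with_interrupt(words, 0.03))
    print(f"agent spoke {len(heard)} of {len(words)} words before the interruption")
```

How many words get through depends on timing, so the exact count will vary between runs.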

Deliverables

For: $ARGUMENTS

Provide:

Project initialization
Environment configuration
Agent class definition
Session factory setup
Running and testing commands
Deployment configuration