name: livekit-agents-setup
description: Set up LiveKit Agents framework for building AI voice and video agents with Python or Node.js
argument-hint: ""
allowed-tools: Read, Write, Bash(pip install, npm install, uv, lk), Glob, Grep
LiveKit Agents Framework Setup
Build AI agents with LiveKit: $ARGUMENTS
Expert Knowledge
You are a LiveKit Agents specialist with expertise in:
- Agent architecture and lifecycle
- Voice pipeline configuration (STT-LLM-TTS)
- Realtime model integration
- Tool/function calling
- Deployment and scaling
Agents Overview
LiveKit Agents is a framework for building programmable, multimodal AI agents that process speech, text, images, and video in real time.
| Component | Purpose |
|---|---|
| AgentSession | Manages conversation lifecycle |
| STT | Speech-to-text transcription |
| LLM | Language model for responses |
| TTS | Text-to-speech synthesis |
| VAD | Voice activity detection |
| Tools | Function calling capabilities |
Installation
Python (Recommended)
# Create project with uv
uv init livekit-agent --bare
cd livekit-agent
# Add dependencies for voice pipeline
uv add "livekit-agents[silero,turn-detector]~=1.3"
uv add "livekit-plugins-noise-cancellation~=0.2"
uv add python-dotenv
# For OpenAI Realtime
uv add "livekit-agents[openai]~=1.3"
# For Google Gemini
uv add "livekit-agents[google]~=1.3"
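The `~=1.3` specifier used above is a PEP 440 compatible-release pin: it accepts any 1.x release at or above 1.3 and rejects 2.0. The sketch below illustrates that rule with the standard library only. It assumes plain numeric versions, and the helper name `satisfies_compatible_release` is hypothetical; real tooling would normally rely on the `packaging` library instead.

```python
def _parse(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '1.3.2' into integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(installed: str, spec: str) -> bool:
    """Approximate the PEP 440 '~=' operator for purely numeric versions.

    '~=1.3'   means >=1.3   and ==1.*
    '~=0.2.1' means >=0.2.1 and ==0.2.*
    """
    wanted = _parse(spec)
    if len(wanted) < 2:
        raise ValueError("'~=' needs at least two version components")
    have = _parse(installed)
    prefix = wanted[:-1]  # every component except the last must match exactly
    return have[: len(prefix)] == prefix and have >= wanted


if __name__ == "__main__":
    for candidate in ("1.2.9", "1.3.0", "1.10.1", "2.0.0"):
        print(candidate, satisfies_compatible_release(candidate, "1.3"))
```

Pinning with `~=` keeps minor and patch upgrades flowing while guarding against a breaking major release.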
Node.js
mkdir livekit-agent && cd livekit-agent
pnpm init
pnpm add @livekit/agents @livekit/agents-plugin-silero @livekit/rtc-node livekit-server-sdk
# For TypeScript
pnpm add -D typescript tsx @types/node
Environment Setup
# Install LiveKit CLI
brew install livekit/tap/livekit-cli # macOS
# or
curl -sSL https://get.livekit.io/cli | bash
# Link the CLI to your LiveKit Cloud project, then write its credentials to .env.local
lk cloud auth
lk app env -w
.env.local
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
OPENAI_API_KEY=your-openai-key # For OpenAI models
GOOGLE_API_KEY=your-google-key # For Gemini models
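Before starting the agent, it can help to confirm that the three LiveKit variables listed above are actually present. The following is a minimal stdlib-only sketch, not part of LiveKit: the function names `parse_env_text` and `missing_keys` are hypothetical, and the parser handles only simple `KEY=VALUE` lines with optional trailing `#` comments.

```python
from pathlib import Path

# Keys the agent needs to reach a LiveKit server (taken from the file above).
REQUIRED_KEYS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and '#' comment lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value # note", then quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env.local")
    text = env_file.read_text() if env_file.exists() else ""
    missing = missing_keys(parse_env_text(text))
    print("missing:", ", ".join(missing) if missing else "none")
```

Running this in the project directory prints the keys that still need values, which is quicker to read than a connection error at startup.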
Basic Agent (Python)
Voice Pipeline Agent
# agent.py
# Follows the livekit-agents 1.3 AgentServer pattern; confirm names against
# the version you have installed.
import logging

from dotenv import load_dotenv
from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(dotenv_path=".env.local")
logging.basicConfig(level=logging.INFO)


class MyAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are a helpful voice assistant. "
                "Be concise and friendly in your responses. "
                "Ask clarifying questions when needed."
            )
        )


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    # Pipeline models are configured on the session itself.
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4o-mini",
        tts="cartesia/sonic",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),  # from the turn-detector extra
    )
    # Room-level options such as noise cancellation are passed to start().
    await session.start(
        room=ctx.room,
        agent=MyAssistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=noise_cancellation.BVC(),
            ),
        ),
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
Realtime Model Agent
# agent.py
# Follows the livekit-agents 1.3 AgentServer pattern; confirm names against
# the version you have installed.
import logging

from dotenv import load_dotenv
from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import noise_cancellation
from livekit.plugins.openai import realtime

load_dotenv(dotenv_path=".env.local")
logging.basicConfig(level=logging.INFO)


class RealtimeAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are a helpful voice assistant using "
                "OpenAI's Realtime API for natural conversation."
            )
        )


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    # A realtime model replaces the separate STT, LLM, and TTS stages,
    # so it is passed as the session's llm.
    model = realtime.RealtimeModel(
        voice="alloy",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    session = AgentSession(llm=model)
    await session.start(
        room=ctx.room,
        agent=RealtimeAssistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=noise_cancellation.BVC(),
            ),
        ),
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
Basic Agent (Node.js)
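// agent.ts
// Plugins ship as separate packages: pnpm add @livekit/agents-plugin-silero
// API shape follows @livekit/agents 1.x; confirm against your installed version.
import { fileURLToPath } from 'node:url';
import { type JobContext, WorkerOptions, cli, defineAgent, voice } from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';

class MyAssistant extends voice.Agent {
  constructor() {
    super({
      instructions: `You are a helpful voice assistant.
Be concise and friendly in your responses.`,
    });
  }
}

export default defineAgent({
  entry: async (ctx: JobContext) => {
    // Pipeline models are configured on the session itself.
    const session = new voice.AgentSession({
      stt: 'deepgram/nova-3',
      llm: 'openai/gpt-4o-mini',
      tts: 'cartesia/sonic',
      vad: await silero.VAD.load(),
    });
    await session.start({ agent: new MyAssistant(), room: ctx.room });
    await ctx.connect();
  },
});

// The CLI loads this file as the agent module.
cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));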
Download Required Models
# Download VAD and other model files
uv run agent.py download-files
Running the Agent
Console Mode (Testing)
# Python
uv run agent.py console
# Node.js
npx tsx agent.ts console
Development Mode
# Connect to LiveKit Cloud playground
uv run agent.py dev
Production Mode
# Run as server
uv run agent.py start
Agent Architecture
┌─────────────────────────────────────────────────┐
│                   AgentServer                   │
│  ┌───────────────────────────────────────────┐  │
│  │               AgentSession                │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐    │  │
│  │  │   VAD   │→ │   STT   │→ │   LLM   │    │  │
│  │  └─────────┘  └─────────┘  └────┬────┘    │  │
│  │       ↑                         │         │  │
│  │  ┌─────────┐               ┌─────────┐    │  │
│  │  │  Audio  │← ─ ─ ─ ─ ─ ─ ─│   TTS   │    │  │
│  │  └─────────┘               └─────────┘    │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
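The diagram above shows one user turn flowing through VAD, STT, LLM, and TTS in order. The sketch below mirrors that flow with plain callables so the hand-offs between stages are explicit. It is a conceptual illustration only: `Pipeline` and `handle_turn` are hypothetical names, the stages are stubs, and the real framework streams audio incrementally rather than processing a whole turn at once.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical stand-in for the stages an AgentSession wires together."""

    vad: Callable[[bytes], bool]  # does this audio contain speech?
    stt: Callable[[bytes], str]   # audio -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio

    def handle_turn(self, audio: bytes) -> Optional[bytes]:
        """Run one user turn through the stages in diagram order."""
        if not self.vad(audio):
            return None  # no speech detected, so nothing to transcribe
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stub stages: "audio" is just UTF-8 text so the flow is easy to follow.
pipeline = Pipeline(
    vad=lambda audio: len(audio) > 0,
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)

if __name__ == "__main__":
    print(pipeline.handle_turn(b"hello"))  # b'You said: hello'
    print(pipeline.handle_turn(b""))       # None
```

Keeping each stage behind a simple interface is what lets the session swap one STT, LLM, or TTS provider for another without touching the rest of the pipeline.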
Session Events
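# Event names follow livekit-agents 1.x; confirm against your installed version.
from livekit import agents
from livekit.agents import AgentServer, AgentSession

server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4o-mini",
        tts="cartesia/sonic",
    )

    # Register handlers before start(). Handlers are plain (sync) functions
    # that receive a single event object.
    @session.on("user_state_changed")
    def on_user_state(ev):
        if ev.new_state == "speaking":
            print("User started speaking")

    @session.on("user_input_transcribed")
    def on_transcript(ev):
        if ev.is_final:
            print(f"User said: {ev.transcript}")

    @session.on("agent_state_changed")
    def on_agent_state(ev):
        if ev.new_state == "speaking":
            print("Agent started responding")
        elif ev.old_state == "speaking":
            print("Agent finished responding")

    await session.start(room=ctx.room, agent=MyAssistant())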
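The `@session.on(...)` decorator used above registers a callback under an event name, and the session later calls every callback registered for that name. The sketch below shows that registration pattern in isolation. `TinyEmitter` and the event name `transcript_ready` are hypothetical and exist only for illustration; this is not the LiveKit implementation.

```python
from collections import defaultdict
from typing import Any, Callable


class TinyEmitter:
    """Hypothetical, minimal event emitter illustrating the decorator pattern."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Callable[..., None]], Callable[..., None]]:
        """Return a decorator that registers a handler for `event`."""

        def register(handler: Callable[..., None]) -> Callable[..., None]:
            self._handlers[event].append(handler)
            return handler  # hand the function back so it stays usable

        return register

    def emit(self, event: str, *args: Any) -> None:
        """Call every handler registered for `event`, in registration order."""
        for handler in list(self._handlers[event]):
            handler(*args)


emitter = TinyEmitter()
seen: list[str] = []


@emitter.on("transcript_ready")  # hypothetical event name
def collect(text: str) -> None:
    seen.append(text)


emitter.emit("transcript_ready", "hello")
emitter.emit("unrelated_event")  # no handlers registered, so nothing happens
```

Because registration happens at decoration time, handlers need to be attached before the component starts emitting, which is why the session handlers are defined ahead of `session.start(...)`.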
Agent Lifecycle
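from livekit.agents import Agent, ChatContext, ChatMessage


class MyAssistant(Agent):
    # __init__ with instructions omitted here; see the Voice Pipeline Agent.

    async def on_enter(self) -> None:
        """Called when this agent becomes active in the session."""
        await self.session.say("Hello! How can I help you?")

    async def on_user_turn_completed(
        self, turn_ctx: ChatContext, new_message: ChatMessage
    ) -> None:
        """Called after each user turn, before the LLM generates a reply.

        Override to inspect or adjust the turn; the default does nothing.
        """

    async def on_exit(self) -> None:
        """Called before this agent hands off control or the session ends."""
        print("Agent exiting")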
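The hooks above bracket the conversation: the enter hook runs first, user turns follow, and the exit hook runs last. The sketch below demonstrates that ordering with a stub so it can run without LiveKit installed. `StubAgent` and `run_lifecycle` are hypothetical names, and the `try`/`finally` behaviour is a property of this sketch rather than a documented guarantee of the framework.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for an Agent subclass; records hook calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def on_enter(self) -> None:
        self.calls.append("on_enter")

    async def on_exit(self) -> None:
        self.calls.append("on_exit")


async def run_lifecycle(agent: StubAgent, turns: list[str]) -> list[str]:
    """Drive the hooks in order: enter, each user turn, then exit."""
    await agent.on_enter()
    try:
        for text in turns:
            agent.calls.append(f"turn:{text}")
    finally:
        await agent.on_exit()  # in this sketch, exit runs even if a turn raises
    return agent.calls


if __name__ == "__main__":
    print(asyncio.run(run_lifecycle(StubAgent(), ["hello"])))
    # ['on_enter', 'turn:hello', 'on_exit']
```

Recording calls in a list like this is also a convenient way to unit-test custom hook logic before wiring it into a live session.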
LiveKit Cloud Deployment
# Create deployment
lk agent create --name my-agent
# Deploy agent
lk agent deploy --name my-agent
# View logs
lk agent logs --name my-agent
# Update deployment
lk agent update --name my-agent
Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install uv
RUN pip install uv
# Copy project files
COPY pyproject.toml uv.lock ./
COPY agent.py ./
# Install dependencies
RUN uv sync --frozen
# Download model files
RUN uv run agent.py download-files
# Run agent
CMD ["uv", "run", "agent.py", "start"]
Project Structure
livekit-agent/
├── agent.py # Main agent code
├── tools.py # Custom tools/functions
├── pyproject.toml # Dependencies
├── uv.lock # Lock file
├── .env.local # Environment variables
├── Dockerfile # Container config
└── README.md
Best Practices
- Use console mode first: Test locally before deploying
- Download models: Run download-files for offline VAD
- Handle interruptions: Configure turn detection properly
- Use noise cancellation: Improves STT accuracy
- Keep instructions concise: Shorter = faster responses
Deliverables
For: $ARGUMENTS
Provide:
- Project initialization
- Environment configuration
- Agent class definition
- Session factory setup
- Running and testing commands
- Deployment configuration