name: deepgram-python-voice-agent
description: Use when writing or reviewing Python code in this repo that builds an interactive voice agent via agent.deepgram.com/v1/agent/converse. Covers client.agent.v1.connect(), AgentV1Settings, send_settings, send_media, event handling, and function/tool calling. Full-duplex STT + LLM + TTS with barge-in. Use deepgram-python-text-to-speech for one-way synthesis, deepgram-python-speech-to-text / deepgram-python-conversational-stt for transcription only. Triggers include "voice agent", "agent converse", "full duplex", "interactive assistant", "barge-in", "agent.v1", "function calling", "AgentV1Settings".
Using Deepgram Voice Agent (Python SDK)
Full-duplex voice agent runtime: STT + LLM (think) + TTS + function calling over a single WebSocket at agent.deepgram.com/v1/agent/converse.
When to use this product
- You want an interactive voice assistant: user speaks, agent thinks, agent speaks, interruptions allowed.
- You want function / tool calling triggered by the conversation.
- You want Deepgram to host the orchestration (vs wiring STT + LLM + TTS yourself).
Use a different skill when:
- One-way transcription →
deepgram-python-speech-to-textordeepgram-python-conversational-stt. - One-way synthesis →
deepgram-python-text-to-speech. - Analytics on finished audio →
deepgram-python-audio-intelligence. - Managing reusable agent configs (persisted on the server) →
deepgram-python-management-api.
Authentication
from dotenv import load_dotenv
load_dotenv()
from deepgram import DeepgramClient
client = DeepgramClient()
Header: Authorization: Token <api_key>. Base URL: wss://agent.deepgram.com/v1/agent/converse.
Quick start
import threading, time
from deepgram.core.events import EventType
from deepgram.agent.v1.types import (
AgentV1Settings,
AgentV1SettingsAgent,
AgentV1SettingsAgentListen,
AgentV1SettingsAgentListenProvider_V1,
AgentV1SettingsAudio,
AgentV1SettingsAudioInput,
)
from deepgram.types.speak_settings_v1 import SpeakSettingsV1
from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram
from deepgram.types.think_settings_v1 import ThinkSettingsV1
from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi
with client.agent.v1.connect() as agent:
settings = AgentV1Settings(
audio=AgentV1SettingsAudio(
input=AgentV1SettingsAudioInput(encoding="linear16", sample_rate=24000),
),
agent=AgentV1SettingsAgent(
listen=AgentV1SettingsAgentListen(
provider=AgentV1SettingsAgentListenProvider_V1(type="deepgram", model="nova-3"),
),
think=ThinkSettingsV1(
provider=ThinkSettingsV1Provider_OpenAi(
type="open_ai", model="gpt-4o-mini", temperature=0.7,
),
prompt="You are a helpful assistant. Keep replies brief.",
),
speak=SpeakSettingsV1(
provider=SpeakSettingsV1Provider_Deepgram(type="deepgram", model="aura-2-asteria-en"),
),
),
)
agent.send_settings(settings) # MUST be first message after connect
def on_message(m):
if isinstance(m, bytes):
# agent speech audio — play or append to output buffer
return
t = getattr(m, "type", "Unknown")
if t == "ConversationText":
print(f"[{getattr(m, 'role', '?')}] {getattr(m, 'content', '')}")
elif t == "UserStartedSpeaking": print(">> user speaking")
elif t == "AgentThinking": print(">> agent thinking")
elif t == "AgentStartedSpeaking": print(">> agent speaking")
elif t == "AgentAudioDone": print(">> agent done")
elif t == "FunctionCallRequest": handle_tool_call(m)
agent.on(EventType.OPEN, lambda _: print("open"))
agent.on(EventType.MESSAGE, on_message)
agent.on(EventType.CLOSE, lambda _: print("close"))
agent.on(EventType.ERROR, lambda e: print(f"err: {e}"))
def send_audio():
for chunk in mic_chunks():
agent.send_media(chunk)
threading.Thread(target=send_audio, daemon=True).start()
agent.start_listening() # blocks
Event types (server → client)
Welcome— connection acknowledgedSettingsApplied— yourSettingsacceptedConversationText— text of a turn (withrole:userorassistant)UserStartedSpeaking— VAD detected userAgentThinking— LLM is workingFunctionCallRequest— tool/function call initiated by the modelAgentStartedSpeaking— TTS starting- Binary frames — audio chunks
AgentAudioDone— TTS finished for this turnWarning,Error
Client messages
- Initial
Settings(send first) Media(binary audio frames in declared encoding/sample_rate)KeepAlive(on long sessions)- Prompt / think / speak update messages (change mid-session)
- User / assistant text injection
- Function call response (reply to
FunctionCallRequest)
Reusable agent configurations
You can persist the agent block of a Settings message server-side and reuse it by agent_id. client.voice_agent.configurations.create stores a JSON string representing the agent object only (listen / think / speak providers + prompt) — NOT the full AgentV1Settings payload. Do not send top-level Settings fields like audio to that API; those still go in the live Settings message at connect time. The returned agent_id replaces the inline agent object in future Settings messages. Managed via client.voice_agent.configurations.* — see deepgram-python-management-api.
Dynamic mid-session adjustment
You can change agent behavior without disconnecting by sending control messages on the live socket. Each method is available on the agent connection object (agent in the quick-start) for both sync and async clients.
from deepgram.agent.v1.types import (
AgentV1UpdatePrompt,
AgentV1UpdateSpeak,
AgentV1UpdateSpeakSpeak, # type alias accepting SpeakSettingsV1 or list
AgentV1UpdateThink,
AgentV1UpdateThinkThink, # type alias accepting ThinkSettingsV1 or list
AgentV1InjectAgentMessage,
AgentV1InjectUserMessage,
AgentV1KeepAlive,
)
from deepgram.types.speak_settings_v1 import SpeakSettingsV1
from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram
from deepgram.types.think_settings_v1 import ThinkSettingsV1
from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi
# 1. Swap the LLM system prompt mid-conversation (e.g. escalate to a different persona)
agent.send_update_prompt(
AgentV1UpdatePrompt(prompt="You are now in expert escalation mode. Be precise and concise.")
)
# Server replies with a `PromptUpdated` event when the new prompt is in effect.
# 2. Swap the TTS voice without reconnecting (e.g. switch language or persona)
agent.send_update_speak(
AgentV1UpdateSpeak(
speak=SpeakSettingsV1(
provider=SpeakSettingsV1Provider_Deepgram(
type="deepgram", model="aura-2-luna-en",
),
),
)
)
# Server replies with a `SpeakUpdated` event.
# 3. Swap the LLM provider/model (e.g. cheaper model for follow-ups)
agent.send_update_think(
AgentV1UpdateThink(
think=ThinkSettingsV1(
provider=ThinkSettingsV1Provider_OpenAi(
type="open_ai", model="gpt-4o-mini", temperature=0.3,
),
prompt="You are a helpful assistant. Keep replies brief.",
),
)
)
# Server replies with a `ThinkUpdated` event.
# 4. Force the agent to say something specific (without waiting for user audio)
agent.send_inject_agent_message(
AgentV1InjectAgentMessage(message="Quick reminder: your call is being recorded.")
)
# Useful for proactive prompts, status updates, or scripted segues.
# 5. Inject a user message (e.g. text input from a chat sidebar alongside voice)
agent.send_inject_user_message(
AgentV1InjectUserMessage(content="Schedule a follow-up for next Tuesday at 2pm.")
)
# Server may reply with `InjectionRefused` if the agent is mid-utterance — retry after `AgentAudioDone`.
# 6. Idle-period keep-alive (no payload required; the SDK fills in the type literal)
agent.send_keep_alive(AgentV1KeepAlive())
# Or simply: agent.send_keep_alive() — the message arg is optional.
Async client equivalents are identical but await-prefixed:
await agent.send_update_prompt(AgentV1UpdatePrompt(prompt="..."))
await agent.send_inject_agent_message(AgentV1InjectAgentMessage(message="..."))
Stream lifecycle & recovery
Continuous voice agents need explicit handling for idle periods, stream pauses, and reconnects.
Pause / idle (no audio for several seconds): stop calling send_media, but emit a KeepAlive every ~5 seconds. Without it, the server closes the socket at ~10 seconds of idle.
import threading, time
stop = threading.Event()
def keepalive_loop():
while not stop.is_set():
if stop.wait(5):
return
try:
agent.send_keep_alive()
except Exception:
return # socket closed; outer loop will reconnect
threading.Thread(target=keepalive_loop, daemon=True).start()
Resume after pause: just call send_media again. No control message is required — the agent picks up VAD on the next chunk.
Reconnect after disconnect (preserve conversation context): Settings cannot be re-sent on the same closed socket; open a new connection and resend the same Settings. To carry conversation history forward, include it in the new Settings.agent.context.messages so the LLM resumes with prior turns:
from deepgram.agent.v1.types import (
AgentV1SettingsAgentContext,
AgentV1SettingsAgentContextMessagesItem,
AgentV1SettingsAgentContextMessagesItemContent,
AgentV1SettingsAgentContextMessagesItemContentRole,
)
# Build the new Settings with the captured prior turns
context = AgentV1SettingsAgentContext(
messages=[
AgentV1SettingsAgentContextMessagesItem(
content=AgentV1SettingsAgentContextMessagesItemContent(
role=AgentV1SettingsAgentContextMessagesItemContentRole.USER,
content="Hi, I'd like to schedule a meeting.",
),
),
AgentV1SettingsAgentContextMessagesItem(
content=AgentV1SettingsAgentContextMessagesItemContent(
role=AgentV1SettingsAgentContextMessagesItemContentRole.ASSISTANT,
content="Sure — what day works best?",
),
),
],
)
new_settings = settings.model_copy(update={"agent": settings.agent.model_copy(update={"context": context})})
# Open a fresh connection and replay
with client.agent.v1.connect() as agent2:
agent2.send_settings(new_settings)
# ... same handlers + audio loop as before
The server emits a History message on connect when the SDK has captured prior turns; in Python you receive this as an AgentV1History object (wire type literal: "History"). Persist these turns in your application so a reconnect can rebuild context.messages.
Detect disconnects: the EventType.CLOSE handler fires before the with block exits. Catch it and trigger your reconnect logic from there. Check EventType.ERROR payloads for cause (network drop vs server-initiated close vs warning).
API reference (layered)
- In-repo reference:
reference.md— "Agent V1 Connect", "Voice Agent Configurations". - AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
- Context7: library ID
/llmstxt/developers_deepgram_llms_txt. - Product docs:
Gotchas
- Pick the right auth scheme for the credential type. API keys use
Authorization: Token <api_key>. Temporary / access tokens (created viaclient.auth.v1.tokens.grant()or an equivalent server) useAuthorization: Bearer <access_token>. The customDeepgramClientin this repo accepts anaccess_tokenparameter and installs a Bearer override for all HTTP + WebSocket calls — seesrc/deepgram/client.py. - Base URL is
agent.deepgram.com, notapi.deepgram.com. - Send
SettingsIMMEDIATELY after connect — no audio before settings are applied. - Listen/speak encoding + sample_rate must match both your input audio and your playback path.
- Keepalive on long idle sessions, otherwise the server closes.
- Function call responses are synchronous to the turn — reply promptly.
- Provider types are tagged unions (
ThinkSettingsV1Provider_OpenAi,SpeakSettingsV1Provider_Deepgram, ...). Pick the right union variant; don't pass raw dicts. socket_client.pyis temporarily frozen (see.fernignore→src/deepgram/agent/v1/socket_client.py) and currently carries_sanitize_numeric_typesplus theconstruct_type/ broad-catch fixes — needed for unknown WS message shapes. Expected to be unfrozen during a future Fern regen and re-compared.
Example files in this repo
examples/30-voice-agent.pytests/manual/agent/v1/connect/main.py— live connection test
Central product skills
For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
npx skills add deepgram/skills
This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).