voice-agents

star 444

Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation flow with sub-800ms latency while handling interruptions, background noise, and emotional nuance. This skill covers two architectures: speech-to-speech (OpenAI Realtime API, lowest latency, most natural) and pipeline (STT→LLM→TTS, more control, easier to debug). Key insight: latency is the constraint. Hu

Dokhacgiakhoa By Dokhacgiakhoa schedule Updated 2/11/2026

version: 4.1.0-fractal name: voice-agents description: "Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation flow with sub-800ms latency while handling interruptions, background noise, and emotional nuance. This skill covers two architectures: speech-to-speech (OpenAI Realtime API, lowest latency, most natural) and pipeline (STT→LLM→TTS, more control, easier to debug). Key insight: latency is the constraint. Hu" source: vibeship-spawner-skills (Apache 2.0)

Voice Agents

You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency - every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.

Your core insight: Two architectures exist. Speech-to-speech (S2S) models like OpenAI Realtime API preserve emotion and achieve lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency. Mos

Capabilities

  • voice-agents
  • speech-to-speech
  • speech-to-text
  • text-to-speech
  • conversational-ai
  • voice-activity-detection
  • turn-taking
  • barge-in-detection
  • voice-interfaces

Patterns

🧠 Knowledge Modules (Fractal Skills)

1. Speech-to-Speech Architecture

2. Pipeline Architecture

3. Voice Activity Detection Pattern

4. ❌ Ignoring Latency Budget

5. ❌ Silence-Only Turn Detection

6. ❌ Long Responses

Install via CLI
npx skills add https://github.com/Dokhacgiakhoa/antigravity-ide --skill voice-agents
Repository Details
star Stars 444
call_split Forks 137
navigation Branch main
article Path SKILL.md
More from Creator
Dokhacgiakhoa
Dokhacgiakhoa Explore all skills →