voice-mastery-sonic-identity - SKILL.md Agent Skill

name: Voice Mastery (Sonic Identity)
description: "The Voice God Protocol". Mastering Speech-to-Speech, Empathic AI, and Video-to-Audio to bring characters to life.

Voice Mastery: The Sonic Identity Engine

Core Philosophy

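We do not use "Text-to-Speech" (Robotic). We use "Speech-to-Speech" (Sentient): a recorded human performance is converted into the character's voice, so the original timing and emotion carry over. This system is the blueprint for Project Genesis: the eventual ability for App Owners to build their own brand characters.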

The Triple-Layer Protocol

1. The "Acting" Layer (Cinematic Assets)

For fixed, high-value assets (Intro Video, Welcome Message):

Tool: Respeecher or ElevenLabs STS.
Workflow:
1. Reference: Record the line yourself (or hire a voice actor) with the exact intended emotion (whisper, excitement, authority).
2. Conversion: Use the AI to "skin" this reference with the Angelina Model.
3. Result: The output keeps the emotional delivery of the reference take, without the robotic flattening typical of text-to-speech.
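The reference-then-conversion workflow above can be tracked with a small job record, sketched here under stated assumptions. It makes no call to Respeecher or ElevenLabs. The names `ConversionJob`, `build_conversion_job`, and the `target_voice` identifier are hypothetical, and hashing the reference take is an added suggestion for traceability rather than part of either tool.

```python
import hashlib
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ConversionJob:
    reference_sha256: str  # fingerprint of the recorded reference take
    emotion: str           # intended emotion, e.g. "whisper" or "authority"
    target_voice: str      # hypothetical identifier for the target voice model


def build_conversion_job(reference_audio: bytes, emotion: str, target_voice: str) -> dict:
    """Describe one speech-to-speech conversion without sending anything."""
    if not reference_audio:
        raise ValueError("reference take is empty")
    digest = hashlib.sha256(reference_audio).hexdigest()
    return asdict(ConversionJob(digest, emotion, target_voice))
```

Storing this record next to the converted file makes it possible to tell later which take and which intended emotion produced a given asset.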

2. The "Empathic" Layer (Real-Time Interaction)

For dynamic chat and user interaction:

Tool: Hume AI (EVI).
Workflow:
1. Listen: Analyze User Audio for prosody (tone/speed/pitch).
2. Respond: Generate audio that matches or balances the user's state.
3. Latency: Target a response time under 500 ms so the exchange feels "alive".
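The "match or balance" decision and the latency target above can be sketched as follows. This is a minimal illustration and does not use the Hume AI API. The `Prosody` fields, the 0.0 to 1.0 scale, the `CALM` baseline, and the 0.7 threshold are all assumptions made for the example.

```python
import time
from dataclasses import dataclass

LATENCY_BUDGET_MS = 500  # target taken from the workflow above


@dataclass(frozen=True)
class Prosody:
    energy: float  # hypothetical normalized scale: 0.0 (flat) to 1.0 (agitated)
    speed: float
    pitch: float


CALM = Prosody(energy=0.3, speed=0.5, pitch=0.5)  # assumed neutral baseline


def target_prosody(user: Prosody, high_energy: float = 0.7) -> Prosody:
    """Match the user's state, or balance it when the user is highly energized."""
    if user.energy < high_energy:
        return user  # match: mirror the user's delivery
    # balance: move halfway from the user's state toward the calm baseline
    return Prosody(
        energy=(user.energy + CALM.energy) / 2,
        speed=(user.speed + CALM.speed) / 2,
        pitch=(user.pitch + CALM.pitch) / 2,
    )


def timed(fn, *args):
    """Run fn and report its result, elapsed milliseconds, and budget status."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms, elapsed_ms < LATENCY_BUDGET_MS
```

In a real pipeline, `timed` would wrap the full listen-and-respond round trip rather than this policy function alone, since network and synthesis time dominate the budget.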

3. The "Soundscape" Layer (Video Immersion)

For generated video assets (Whisk output):

Tool: Google DeepMind V2A (Video-to-Audio).
Workflow:
1. Input: Feed the Whisk-generated video (e.g., Sizzling Steak, Laughing Angelina).
2. Generate: Produce audio synchronized to the on-screen action (sizzle, clinking, breath).
3. Mix: Blend with the Voice track for a cinema-grade result.
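The mix step above can be illustrated with a plain sample-wise blend. This sketch assumes both tracks are already decoded to float samples in the range -1.0 to 1.0 at the same sample rate. The function name `mix_tracks` and the default gain of 0.3 are illustrative choices, not values from any of the tools named here.

```python
def mix_tracks(voice, ambience, ambience_gain=0.3):
    """Sum voice and scaled ambience sample by sample, clipping to [-1.0, 1.0]."""
    length = max(len(voice), len(ambience))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0        # pad the shorter track with silence
        a = ambience[i] if i < len(ambience) else 0.0
        sample = v + ambience_gain * a
        mixed.append(max(-1.0, min(1.0, sample)))      # hard clip to avoid overflow
    return mixed
```

Keeping the ambience gain well below 1.0 leaves the voice intelligible. A production mix would more likely use a limiter than a hard clip.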

Future: The Character Builder

Everything done for Angelina must be:

Reproducible: Documented seeds/settings.
Templated: Ready to be applied to "John" and future user-created characters.
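One way to meet both requirements is to keep each character's seeds and settings in a serializable profile that can be copied for the next character. The sketch below is an assumption about how that could look. `CharacterProfile`, `derive_character`, the field names, and the example values are hypothetical placeholders, not documented settings of any tool above.

```python
import copy
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CharacterProfile:
    name: str
    voice_model: str  # hypothetical identifier for the voice model
    seed: int         # placeholder for whatever seed the generation tool exposes
    settings: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize with sorted keys so saved profiles diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def derive_character(template: CharacterProfile, name: str, voice_model: str) -> CharacterProfile:
    """Apply an existing character's seed and settings to a new character."""
    return CharacterProfile(
        name=name,
        voice_model=voice_model,
        seed=template.seed,
        settings=copy.deepcopy(template.settings),  # avoid sharing mutable state
    )
```

For example, `derive_character(angelina, "John", "john-voice")` would give "John" a profile that starts from Angelina's documented seed and settings and can then be tuned independently.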