voice-local

name: voice-local
description: Activate local voice mode (Whisper + Kokoro) - all responses spoken aloud, all inputs via microphone
allowed-tools: mcp__voicemode__converse, mcp__voicemode__service, Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch

Voice Mode Activation

When this skill is invoked, switch to voice-based interaction for the rest of the session (until /end-voice-local is called).

Startup Sequence

Check services — Call mcp__voicemode__service for both whisper and kokoro with action status
Start if needed — If either service is not running, start it with action start
Greet — Use mcp__voicemode__converse to greet the user: "Voice mode is on. What can I help with?"
Listen — Set wait_for_response=true to listen for the user's first request
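
The startup steps above can be sketched in Python as follows. This is only an illustration of the control flow: in practice the assistant calls the MCP tools directly, not through Python. The `service` and `converse` callables are hypothetical stand-ins for mcp__voicemode__service and mcp__voicemode__converse, and the assumption that a healthy service reports exactly the string "running" is made up for this sketch.

```python
from typing import Callable

# Hypothetical stand-ins for the two MCP tools. Their real argument and
# return shapes are not specified in this skill and are assumed here.
ServiceFn = Callable[[str, str], str]    # (service_name, action) -> status text
ConverseFn = Callable[[str, bool], str]  # (message, wait_for_response) -> reply

GREETING = "Voice mode is on. What can I help with?"


def start_voice_mode(service: ServiceFn, converse: ConverseFn) -> str:
    """Run the startup sequence and return the user's first spoken request."""
    for name in ("whisper", "kokoro"):
        # Assumption: the status call returns exactly "running" when healthy.
        status = service(name, "status").strip().lower()
        if status != "running":
            service(name, "start")
    # Greet and listen in a single call by waiting for the response.
    return converse(GREETING, True)
```

Passing the tools in as callables keeps the sketch testable with simple fakes and avoids guessing at how the real tools are bound.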

Voice Mode Behavior Rules

Once active, follow these rules for ALL subsequent interactions:

ALWAYS use mcp__voicemode__converse to communicate with the user instead of text output
Keep spoken responses concise — 1-3 sentences max. Nobody wants to listen to a wall of text.
Use wait_for_response=true when you need the user's input
Use wait_for_response=false for status updates while working (parallel pattern)
Use parallel operations — speak status updates while running tools simultaneously
Still use all other tools normally — Read, Write, Grep, Task, etc. Just communicate results via voice
For long outputs (code, file contents, lists), write to a file and tell the user where it is rather than reading it all aloud
Match the user's energy — if they're casual, be casual. If they're focused, be direct.
ALWAYS print a terminal summary (MANDATORY) when the spoken response contains any of:
- URLs, file paths, app names, phone numbers, addresses, code snippets
- Lists of 3+ items, or ANY set of options/choices the user must pick from
- Structured data (tables, pricing, metrics, search results)
- Numbered decisions: if you ask "option A or option B", ALSO print 1. Option A / 2. Option B to the terminal

Format: clean markdown (headers, bullets, tables, hyperlinks) optimized for copy-paste.
Speak first via converse, then print the summary in the same turn, BEFORE the next wait_for_response.
Skip only for pure conversational replies (yes/no, acknowledgments, clarifying questions with no data).
Rationale: spoken audio is ephemeral; Nick needs to copy-paste from the terminal without asking you to repeat.
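
As a rough illustration of the terminal-summary rule, the check below decides whether a spoken reply also needs a printed summary. It is a hedged sketch, not part of the skill: the function names, the flags, and the regular expressions are all hypothetical. It only detects URLs and file paths automatically; phone numbers, addresses, app names, and code snippets would still need the assistant's own judgment.

```python
import re
from typing import Sequence

# Hypothetical patterns: a URL, or a slash-led path such as /tmp/x or ~/notes.
URL_RE = re.compile(r"https?://\S+")
PATH_RE = re.compile(r"(?<![\w/])(?:~|\.{1,2})?/[\w.\-]+")


def needs_terminal_summary(
    spoken: str,
    items: Sequence[str] = (),
    is_choice: bool = False,
    is_structured: bool = False,
) -> bool:
    """Return True when the spoken reply should also be printed."""
    if is_choice or is_structured:  # any set of options, or tabular data
        return True
    if len(items) >= 3:  # lists of three or more items
        return True
    return bool(URL_RE.search(spoken) or PATH_RE.search(spoken))


def format_choices(options: Sequence[str]) -> str:
    """Render options as a numbered list for the terminal."""
    return "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))
```

For a spoken "option A or option B" question, `format_choices(["Option A", "Option B"])` yields the two numbered lines to print before the next wait_for_response.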

Voice Settings

Default voice: auto (let the system choose)
Default speed: 1.0
Metrics level: minimal (save tokens)
VAD aggressiveness: 2 (normal)
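
One way to keep these defaults in a single place is a small settings object, sketched below. The field names (`voice`, `speed`, `metrics_level`, `vad_aggressiveness`) are assumptions chosen to mirror the labels above. They may not match the converse tool's actual parameter names, so check the tool's schema before relying on them.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Defaults from the Voice Settings list; field names are hypothetical."""

    voice: str = "auto"            # "auto" lets the system choose
    speed: float = 1.0
    metrics_level: str = "minimal"  # keeps token usage down
    vad_aggressiveness: int = 2     # normal

    def as_kwargs(self) -> dict:
        """Return the settings as keyword arguments for a converse-style call."""
        kwargs = asdict(self)
        if kwargs["voice"] == "auto":
            del kwargs["voice"]  # omit so the system picks the voice
        return kwargs
```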

Skill Routing in Voice Mode

Voice input arrives as tool results, not user messages, so skill triggers may not fire automatically. After receiving voice input:

Check if any available skills match the user's request
If a skill is relevant, invoke it before taking other action
Route to team orchestrators per CLAUDE.md rules (wharfside, max, altium, etc.)
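
The routing check can be pictured as a lookup from the transcribed request to a skill name. The sketch below uses plain keyword matching purely for illustration. The real decision is made by the assistant reading the skill descriptions, and the registry contents here are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def match_skill(
    transcript: str, skills: Mapping[str, Sequence[str]]
) -> Optional[str]:
    """Return the first skill whose keywords appear in the transcript."""
    text = transcript.lower()
    for name, keywords in skills.items():
        if any(keyword.lower() in text for keyword in keywords):
            return name
    return None  # no skill matched; handle the request directly


# Hypothetical registry: skill name -> trigger keywords.
SKILLS = {
    "wharfside": ["wharfside"],
    "altium": ["altium", "pcb"],
}
```

With this registry, the spoken request "Check my wharfside email for anything from Thomas" maps to the wharfside skill, which would then be invoked before any other action.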

Example Flow

User: /voice-local
→ Check whisper status, check kokoro status
→ Start services if needed
→ converse("Voice mode is on. What can I help with?", wait_for_response=true)

User (spoken): "Check my wharfside email for anything from Thomas"
→ converse("Checking your Wharfside email now.", wait_for_response=false)
→ [invoke wharfside skill or search emails]
→ converse("Found 3 emails from Thomas. The most recent is from yesterday about the insurance draft. Want me to read it?", wait_for_response=true)

$ARGUMENTS
