---
name: voice-local
description: Activate local voice mode (Whisper + Kokoro) - all responses spoken aloud, all inputs via microphone
allowed-tools: mcp__voicemode__converse, mcp__voicemode__service, Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch
---
# Voice Mode Activation
When this skill is invoked, switch to voice-based interaction for the rest of the session (until /end-voice-local is called).
## Startup Sequence
- Check services: call `mcp__voicemode__service` for both `whisper` and `kokoro` with action `status`.
- Start if needed: if either service is not running, start it with action `start`.
- Greet: use `mcp__voicemode__converse` to greet the user: "Voice mode is on. What can I help with?"
- Listen: set `wait_for_response=true` on that greeting call so it also captures the user's first request.
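The startup sequence can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `call_tool` is a hypothetical stand-in for however the host invokes MCP tools, the keyword names `service`, `action`, `message`, and `wait_for_response` are assumed rather than taken from the VoiceMode API, and the status result is assumed to be a dict with a boolean `running` key.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, **arguments) -> result.
# This is NOT the real MCP client API; it stands in for the host's tool invocation.
ToolCaller = Callable[..., Any]

SERVICES = ("whisper", "kokoro")
GREETING = "Voice mode is on. What can I help with?"


def start_voice_mode(call_tool: ToolCaller) -> Any:
    """Check both services, start any that are down, then greet and listen."""
    for service in SERVICES:
        # Assumption: the status result is a dict with a boolean "running" key.
        status = call_tool("mcp__voicemode__service", service=service, action="status")
        if not status.get("running", False):
            call_tool("mcp__voicemode__service", service=service, action="start")
    # A single converse call both speaks the greeting and listens for the reply.
    return call_tool("mcp__voicemode__converse", message=GREETING, wait_for_response=True)
```

Only services that report as stopped get a `start` call, so invoking the skill twice in a session is harmless.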
## Voice Mode Behavior Rules
Once active, follow these rules for ALL subsequent interactions:
- ALWAYS use `mcp__voicemode__converse` to communicate with the user instead of text output.
- Keep spoken responses concise: 1-3 sentences max. Nobody wants to listen to a wall of text.
- Use `wait_for_response=true` when you need the user's input.
- Use `wait_for_response=false` for status updates while working (parallel pattern).
- Use parallel operations: speak status updates while running tools simultaneously.
- Still use all other tools normally (Read, Write, Grep, Task, etc.); just communicate the results via voice.
- For long outputs (code, file contents, lists), write them to a file and tell the user where it is rather than reading it all aloud.
- Match the user's energy: if they're casual, be casual; if they're focused, be direct.
- ALWAYS print a terminal summary (MANDATORY) when the spoken response contains any of:
  - URLs, file paths, app names, phone numbers, addresses, or code snippets
  - Lists of 3+ items, or ANY set of options/choices the user must pick from
  - Structured data (tables, pricing, metrics, search results)
  - Numbered decisions: if you ask "option A or option B", ALSO print `1. Option A` / `2. Option B` to the terminal

  Format: clean markdown (headers, bullets, tables, hyperlinks) optimized for copy-paste. Speak first via `converse`, then print the summary in the same turn, BEFORE the next call that sets `wait_for_response=true`. Skip the summary only for purely conversational replies (yes/no, acknowledgments, clarifying questions with no data). Rationale: spoken audio is ephemeral, and Nick needs to copy-paste from the terminal without asking you to repeat yourself.
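One way to make the terminal-summary rule concrete is a small decide-and-render helper. `SpokenReply` and `terminal_summary` are hypothetical names, and the reply is assumed to arrive already split into references, items, options, and table rows; detecting those from free-form speech is out of scope for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpokenReply:
    """Hypothetical model of one spoken reply and the data it mentions."""
    text: str
    references: list[str] = field(default_factory=list)  # URLs, paths, app names, phones, addresses, code
    items: list[str] = field(default_factory=list)       # list items read aloud
    options: list[str] = field(default_factory=list)     # choices the user must pick from
    table: list[dict[str, str]] = field(default_factory=list)  # structured data rows


def terminal_summary(reply: SpokenReply) -> Optional[str]:
    """Return markdown to print after speaking, or None for a purely conversational reply."""
    needed = bool(reply.references or len(reply.items) >= 3 or reply.options or reply.table)
    if not needed:
        return None
    lines: list[str] = []
    if reply.references:
        lines += ["**References**"] + [f"- {ref}" for ref in reply.references]
    if reply.items:
        lines += ["**Items**"] + [f"- {item}" for item in reply.items]
    if reply.options:
        # Numbered so the user can answer "option 2" by voice or copy it from the terminal.
        lines += ["**Options**"] + [f"{i}. {opt}" for i, opt in enumerate(reply.options, start=1)]
    if reply.table:
        headers = list(reply.table[0])
        lines.append("| " + " | ".join(headers) + " |")
        lines.append("|" + "---|" * len(headers))
        for row in reply.table:
            lines.append("| " + " | ".join(row.get(h, "") for h in headers) + " |")
    return "\n".join(lines)
```

The 3-item threshold only decides whether a summary is needed; once one is printed, every item is included so nothing spoken is missing from the terminal.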
## Voice Settings
- Default voice: auto (let the system choose)
- Default speed: 1.0
- Metrics level: minimal (save tokens)
- VAD aggressiveness: 2 (normal)
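The defaults above could be kept in one place and validated before each `converse` call. This sketch is hypothetical: the argument names returned by `converse_kwargs` are assumptions rather than documented VoiceMode parameters, treating `"auto"` as "omit the voice argument" is an assumed convention, and the 0 to 3 range mirrors the WebRTC VAD aggressiveness modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    voice: str = "auto"            # "auto" lets the system choose a voice
    speed: float = 1.0
    metrics_level: str = "minimal"  # keep metrics output small to save tokens
    vad_aggressiveness: int = 2     # assumed 0-3 scale, as in WebRTC VAD

    def __post_init__(self) -> None:
        if self.speed <= 0:
            raise ValueError("speed must be positive")
        if not 0 <= self.vad_aggressiveness <= 3:
            raise ValueError("vad_aggressiveness must be between 0 and 3")

    def converse_kwargs(self) -> dict:
        """Hypothetical mapping onto converse arguments; the names are assumptions."""
        kwargs = {
            "speed": self.speed,
            "metrics_level": self.metrics_level,
            "vad_aggressiveness": self.vad_aggressiveness,
        }
        # Assumption: leaving the voice out lets the system pick one.
        if self.voice != "auto":
            kwargs["voice"] = self.voice
        return kwargs
```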
## Skill Routing in Voice Mode
Voice input arrives as tool results, not user messages, so skill triggers may not fire automatically. After receiving voice input:
- Check if any available skills match the user's request
- If a skill is relevant, invoke it before taking other action
- Route to team orchestrators per CLAUDE.md rules (wharfside, max, altium, etc.)
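A keyword lookup is the simplest way to sketch this routing check. The trigger table and `route_voice_input` are hypothetical; in practice the match comes from reading the available skill descriptions and the `CLAUDE.md` routing rules, not from a hard-coded list.

```python
import re
from typing import Optional

# Hypothetical trigger table; real triggers live in skill descriptions and CLAUDE.md.
SKILL_TRIGGERS: dict[str, tuple[str, ...]] = {
    "wharfside": ("wharfside",),
    "altium": ("altium", "schematic", "pcb"),
}


def route_voice_input(
    transcript: str, triggers: dict[str, tuple[str, ...]] = SKILL_TRIGGERS
) -> Optional[str]:
    """Return the first skill whose trigger words appear in the transcript, else None."""
    # Tokenize the transcribed speech into lowercase words so matching ignores punctuation.
    words = set(re.findall(r"[a-z0-9]+", transcript.lower()))
    for skill, keywords in triggers.items():
        if any(keyword in words for keyword in keywords):
            return skill
    return None
```

Matching whole words rather than substrings avoids false hits from short skill names buried inside longer words.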
## Example Flow
User: /voice-local
→ Check whisper status, check kokoro status
→ Start services if needed
→ converse("Voice mode is on. What can I help with?", wait_for_response=true)
User (spoken): "Check my wharfside email for anything from Thomas"
→ converse("Checking your Wharfside email now.", wait_for_response=false)
→ [invoke wharfside skill or search emails]
→ converse("Found 3 emails from Thomas. The most recent is from yesterday about the insurance draft. Want me to read it?", wait_for_response=true)
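The parallel pattern in this flow (a non-blocking status line while the lookup runs, then a blocking question) can be sketched with `asyncio`. `speak` and `work` are hypothetical async stand-ins for `converse` and the skill or tool call; the sketch only illustrates the ordering: start the status update, do the work, let the status line finish, then ask the follow-up and listen.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for converse: speak(message, wait_for_response) -> transcript ("" if not listening).
Speak = Callable[[str, bool], Awaitable[str]]


async def speak_while_working(
    speak: Speak,
    status: str,
    work: Callable[[], Awaitable[T]],
    summarize: Callable[[T], str],
) -> str:
    """Speak a status update concurrently with the work, then report and listen."""
    # Schedule the status update without blocking on it (wait_for_response=False).
    status_task = asyncio.create_task(speak(status, False))
    result = await work()
    # Make sure the follow-up does not talk over the status line.
    await status_task
    # Report the result and listen for the user's answer (wait_for_response=True).
    return await speak(summarize(result), True)
```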
$ARGUMENTS