name: voice-setup description: Complete voice configuration in chat - PTT key, microphone permissions, ElevenLabs TTS, and troubleshooting compatibility: "Designed for Vellum personal assistants" metadata: icon: assets/icon.svg emoji: "🎙️" vellum: category: "voice" display-name: "Voice Setup" includes: ["elevenlabs-voice"] activation-hints: - "Guided setup or troubleshooting (walkthrough, PTT not working, mic issues, ElevenLabs/TTS)" - "Simple voice setting changes (PTT key, wake word) -> use voice_config_update directly" avoid-when: - 'If "voice" is in a Twilio/phone context, load phone-calls instead'
You are helping the user set up and troubleshoot voice features (push-to-talk, text-to-speech) entirely within this conversation. Do NOT direct the user to the Settings page for initial setup - handle everything in-chat using the tools below.
Available Tools
voice_config_update- Change any voice setting (PTT key, conversation timeout, TTS voice ID)open_system_settings- Open macOS System Settings to a specific privacy panenavigate_settings_tab- Open the Vellum settings panel to the Voice tabassistant credentials prompt- Collect API keys securely (for ElevenLabs TTS)
Setup Flow
Walk the user through each section in order. Skip sections they don't need. Ask before proceeding to the next section.
1. Microphone Permission
Check <channel_capabilities> for microphone_permission_granted.
If false or missing:
- Explain that macOS requires microphone permission for voice features.
- Use
open_system_settingswithpane: "microphone"to open the right System Settings pane. - Tell the user: "I've opened System Settings to the Microphone section. Please toggle Vellum Assistant on, then come back here."
- After they confirm, verify by checking capabilities on the next turn.
If true: Tell them microphone is already granted and move on.
2. Push-to-Talk Activation Key
Present common PTT key options:
- Right Option - Default, good general choice
- Fn - Dedicated key on most Mac keyboards
- Right Command - Easy to reach
- Right Control - Familiar from gaming
Ask which key they prefer, then use voice_config_update with setting: "activation_key" and the chosen value.
Common issues to mention:
- If they pick a key that conflicts with their emoji picker (Fn or Globe on newer Macs), warn them and suggest an alternative.
- If they use a terminal app heavily, warn that some keys may be captured by the terminal.
3. Text-to-Speech / ElevenLabs (Optional)
Ask if they want high-quality text-to-speech voices via ElevenLabs (optional - standard TTS works without it).
If yes, the included ElevenLabs Voice skill (automatically appended below via includes) provides the full setup flow: curated voice list, API key collection, advanced voice selection, and tuning parameters. Follow the instructions there.
Note: The config key services.tts.providers.elevenlabs.voiceId controls the voice for both in-app TTS and phone calls. If the user sets up phone calls later, they will automatically use the same voice for a consistent experience.
4. Verification
After setup is complete:
- Summarize what was configured.
- Suggest they test by pressing their PTT key and speaking.
- Offer to open the Voice settings tab if they want to review: use
navigate_settings_tabwithtab: "Voice".
Troubleshooting Decision Trees
When the user reports a problem, follow the appropriate decision tree:
"PTT isn't working" / "Can't record"
- Microphone permission - Check
microphone_permission_grantedin capabilities. If false, guide through granting it. - Key check - Ask what key they're using. Confirm it matches their configured PTT key.
- Emoji picker conflict - On newer Macs, Fn/Globe opens the emoji picker. If they're using Fn, suggest switching to Right Option or Right Command.
- Speech Recognition permission - Some voice features need this. Use
open_system_settingswithpane: "speech_recognition". - App focus - PTT may not work when Vellum is not the frontmost app or if another app has captured the key.
"Recording but no text" / "Transcription not working"
- Speech Recognition permission - Must be granted for transcription.
- Microphone input - Ask if they see the recording indicator. If yes, the mic works but transcription is failing.
- Locale/language - Speech recognition works best with the system language. Ask if they're speaking in a different language.
- Background noise - Excessive noise can prevent transcription. Suggest a quieter environment or a closer microphone.
"Changed a setting but it didn't work"
- Event broadcast - The setting should take effect immediately. If it didn't, suggest restarting the assistant.
- Verify - Open the Voice settings tab with
navigate_settings_tabto confirm the setting was persisted.
Deep Debugging
For persistent issues, suggest checking system logs:
log stream --predicate 'subsystem == "com.vellum.assistant"' --level debug
Key log categories:
voice- PTT activation, recording statespeech- Speech recognition results
Rules
- Always handle setup conversationally in-chat. Do NOT tell the user to go to Settings for initial configuration.
- Use
navigate_settings_tabonly for review/verification after in-chat setup, not as the primary setup method. - Be concise. Don't explain every option exhaustively - present the most common choices and let the user ask for more.
- If a permission is denied, acknowledge it gracefully and explain what features won't work without it.