---
name: hypura-voice-io
description: Use Hypura Harness voice input and output tools for local mic, WAV transcription, VOICEVOX playback, and Desktop Companion voice turns.
category: automation, voice, audio, companion
version: 1.0.0
user-invocable: false
---
# Hypura Harness Voice I/O
Use this skill when an agent needs local voice input, local voice output, or a
Desktop Companion transcript loop through the bundled hypura-harness plugin.
The harness daemon owns audio devices, VOICEVOX synthesis, whisper.cpp
transcription, and the local OpenClaw agent command bridge.
Default VOICEVOX speech should use Kasukabe Tsumugi (speaker id `8`) unless the
user selects a different voice.
## Before you begin

- Start the harness with `scripts/launchers/Start-Hypura-Harness.ps1`, or from `extensions/hypura-harness/scripts` with `uv run harness_daemon.py`.
- Check `hypura_harness_status` before using voice tools.
- Run `hypura_harness_voice_devices` before selecting `input_device`, `output_device`, or `output_devices`.
- Use explicit device ids when routing to speakers, VB-Cable, or mixed monitor-plus-virtual-cable output.
- Do not start microphone recording unless the user asked for live voice input or an operator has already started the voice session.
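The device rules above can be sketched as a helper that resolves one explicit id per role and refuses to guess. The device listing shape is hypothetical: the dicts with `id`, `name`, `max_input_channels`, and `max_output_channels` keys and the `find_device_id` name are illustrations, not the documented `/voice/devices` response. Adapt the keys to what `hypura_harness_voice_devices` actually returns.

```python
from __future__ import annotations

from typing import Any


def find_device_id(devices: list[dict[str, Any]], name_fragment: str, *, output: bool) -> int:
    """Return the id of the one device whose name contains ``name_fragment``.

    Hypothetical device shape: {"id": int, "name": str,
    "max_input_channels": int, "max_output_channels": int}.
    Raises instead of guessing when zero or several devices match.
    """
    channel_key = "max_output_channels" if output else "max_input_channels"
    needle = name_fragment.lower()
    matches = [
        d for d in devices
        if needle in d["name"].lower() and d.get(channel_key, 0) > 0
    ]
    if len(matches) != 1:
        found = [d["name"] for d in matches]
        raise LookupError(f"expected one device matching {name_fragment!r}, found {found}")
    return int(matches[0]["id"])


# Illustrative listing only; real ids come from hypura_harness_voice_devices.
devices = [
    {"id": 1, "name": "Microphone (USB Audio)", "max_input_channels": 1, "max_output_channels": 0},
    {"id": 4, "name": "Speakers (Realtek Audio)", "max_input_channels": 0, "max_output_channels": 2},
    {"id": 5, "name": "CABLE Input (VB-Audio Virtual Cable)", "max_input_channels": 0, "max_output_channels": 2},
]

input_device = find_device_id(devices, "microphone", output=False)
# Virtual cable first, then monitor speakers, matching the [5, 4] routing in the examples.
output_devices = [
    find_device_id(devices, "cable input", output=True),
    find_device_id(devices, "speakers", output=True),
]
```

Raising on ambiguity keeps the explicit-id rule intact. If a name fragment matches two devices, the operator picks one rather than the agent routing audio somewhere unexpected.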
## Tool map

| Need | Tool | Harness endpoint |
|---|---|---|
| Inspect local audio devices | `hypura_harness_voice_devices` | `GET /voice/devices` |
| Test TTS output routing | `hypura_harness_voice_test_say` | `POST /voice/test-say` |
| Transcribe a WAV file | `hypura_harness_voice_transcribe` | `POST /voice/transcribe` |
| Record mic, run OpenClaw, speak reply | `hypura_harness_voice_turn` | `POST /voice/turn` |
| Enable or disable Companion mic capture | `hypura_harness_companion_mic` | `POST /voice/companion-mic` |
| Handle Companion transcript as an agent turn | `hypura_harness_companion_voice_turn` | `POST /voice/companion-turn` |
| Inspect or update Companion state/permission | `hypura_harness_companion` | `POST /companion/control` |
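The tool map can be turned into a small request builder for calling the daemon directly. This is a sketch under assumptions. `BASE_URL` is a hypothetical host and port. The claim that POST endpoints accept a JSON body using the same keys as the tool arguments is also unconfirmed. `build_request` is an illustrative name.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Hypothetical address: read the real host and port from the harness launcher or config.
BASE_URL = "http://127.0.0.1:8000"

# Tool name -> (HTTP method, endpoint path), copied from the tool map.
TOOL_ENDPOINTS = {
    "hypura_harness_voice_devices": ("GET", "/voice/devices"),
    "hypura_harness_voice_test_say": ("POST", "/voice/test-say"),
    "hypura_harness_voice_transcribe": ("POST", "/voice/transcribe"),
    "hypura_harness_voice_turn": ("POST", "/voice/turn"),
    "hypura_harness_companion_mic": ("POST", "/voice/companion-mic"),
    "hypura_harness_companion_voice_turn": ("POST", "/voice/companion-turn"),
    "hypura_harness_companion": ("POST", "/companion/control"),
}


def build_request(tool: str, args: dict[str, Any] | None = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for a harness tool call.

    Assumes POST endpoints take a JSON body with the same keys as the tool arguments.
    """
    method, path = TOOL_ENDPOINTS[tool]
    url = base_url.rstrip("/") + path
    if method == "GET":
        return urllib.request.Request(url, method="GET")
    body = json.dumps(args or {}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

To send a request, pass it to `urllib.request.urlopen(req, timeout=...)`, but only after `hypura_harness_status` reports a healthy daemon.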
## Recommended workflow

- Confirm daemon health.

  ```
  hypura_harness_status({})
  ```

- List devices and choose explicit ids.

  ```
  hypura_harness_voice_devices({})
  ```

- Test output before recording input.

  ```
  hypura_harness_voice_test_say({
    "text": "Voice output test.",
    "speaker": 8,
    "output_devices": [5, 4]
  })
  ```

- For file input, transcribe the WAV first.

  ```
  hypura_harness_voice_transcribe({
    "wav_path": "path/to/input.wav"
  })
  ```

- For a live mic turn, keep the capture short and route the reply explicitly.

  ```
  hypura_harness_voice_turn({
    "record_seconds": 5,
    "input_device": 1,
    "output_devices": [5, 4],
    "speaker": 8,
    "openclaw_timeout": 240
  })
  ```
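The live-turn guidance (a short capture with explicit reply routing) can be enforced on the caller side before calling `hypura_harness_voice_turn`. The 15-second `MAX_RECORD_SECONDS` cap and the `voice_turn_args` name are hypothetical choices for this sketch. The harness may accept longer captures.

```python
from __future__ import annotations

# Hypothetical client-side cap that keeps live captures short; the harness may allow more.
MAX_RECORD_SECONDS = 15
DEFAULT_SPEAKER = 8  # VOICEVOX Kasukabe Tsumugi


def voice_turn_args(record_seconds: float, input_device: int, output_devices: list[int],
                    speaker: int = DEFAULT_SPEAKER, openclaw_timeout: int = 240) -> dict:
    """Return hypura_harness_voice_turn arguments, rejecting long or implicit routing."""
    if not 0 < record_seconds <= MAX_RECORD_SECONDS:
        raise ValueError(f"record_seconds must be in (0, {MAX_RECORD_SECONDS}]")
    if not output_devices:
        raise ValueError("route the reply explicitly: pass at least one output device id")
    if len(set(output_devices)) != len(output_devices):
        raise ValueError("output_devices contains duplicate ids")
    return {
        "record_seconds": record_seconds,
        "input_device": input_device,
        "output_devices": list(output_devices),
        "speaker": speaker,
        "openclaw_timeout": openclaw_timeout,
    }
```

`voice_turn_args(5, 1, [5, 4])` yields the same arguments as the live mic turn example.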
## Desktop Companion transcript loop
Use the Companion path when the Desktop Companion already captures the transcript and the harness only needs to hand that text to OpenClaw.
After the user has explicitly consented, grant mic permission and then enable mic capture:

```
hypura_harness_companion({
  "action": "permission",
  "capability": "mic",
  "decision": "granted"
})
hypura_harness_companion_mic({ "enabled": true })
```
Process an explicit transcript:

```
hypura_harness_companion_voice_turn({
  "transcript": "What changed in the workspace?",
  "transcript_timestamp": 1778918400000,
  "last_seen_timestamp": 1778918399000,
  "speak": true,
  "animate": true
})
```
If `transcript` is omitted, the harness reads the latest transcript from the
Desktop Companion state file. Pass `last_seen_timestamp` when polling so the
same transcript is not handled twice.
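The dedup rule that `last_seen_timestamp` enables can be mirrored on the caller side. This sketch assumes the caller already has the latest transcript text and its millisecond timestamp. It also assumes a strict "newer than last seen" comparison. How the harness itself compares timestamps is not documented here. `Transcript`, `TranscriptPoller`, and `next_turn` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    text: str
    timestamp: int  # epoch milliseconds, like transcript_timestamp


class TranscriptPoller:
    """Tracks last_seen_timestamp so each transcript becomes one agent turn."""

    def __init__(self, last_seen_timestamp: int = 0) -> None:
        self.last_seen_timestamp = last_seen_timestamp

    def next_turn(self, latest: Transcript | None) -> dict | None:
        """Return companion_voice_turn arguments, or None if nothing new to handle."""
        if latest is None or not latest.text.strip():
            return None  # no transcript yet, or only whitespace
        if latest.timestamp <= self.last_seen_timestamp:
            return None  # already handled this transcript
        args = {
            "transcript": latest.text,
            "transcript_timestamp": latest.timestamp,
            "last_seen_timestamp": self.last_seen_timestamp,
            "speak": True,
            "animate": True,
        }
        self.last_seen_timestamp = latest.timestamp
        return args
```

Polling twice with the same transcript produces one set of turn arguments and then `None`.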
When `animate` is `true`, the harness forwards the inferred emotion to the
Desktop Companion before speech. The companion maps common emotions to procedural
VRM/FBX motion even when the loaded model has no animation clips.
For direct Desktop Companion control outside the Hypura loop, use
`control_companion` or `hypura_harness_companion` with the `status`, `permission`,
`mic`, `input_snapshot`, and `window_capture` actions. Both paths read the same
local transcript state and should follow the same explicit-consent rule before
enabling microphone capture.
After companion speech or animation, call the `status` action and check that
`state.avatar` reports the renderer's `lastAction`, `lastEmotion`, `lastMotion`,
`lastExpression`, and `lastSpeechAt` fields before treating the local animation
path as verified.
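The verification step can be sketched as a predicate over the `status` result. Several details are assumptions. The sketch assumes the result has a top-level `state` key. It assumes `lastSpeechAt` is an epoch-millisecond number, so adjust it if the renderer reports ISO strings. It also assumes a speak command counts as verified only when `lastSpeechAt` is no earlier than the time the command was sent. `avatar_verified` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any

AVATAR_FIELDS = ("lastAction", "lastEmotion", "lastMotion", "lastExpression", "lastSpeechAt")


def avatar_verified(status: dict[str, Any], expected_emotion: str, sent_at_ms: int) -> bool:
    """Return True only if state.avatar shows the renderer handled the speak command.

    Assumes lastSpeechAt is epoch milliseconds; adjust if the renderer reports strings.
    """
    avatar = (status.get("state") or {}).get("avatar") or {}
    if any(avatar.get(field) in (None, "") for field in AVATAR_FIELDS):
        return False  # renderer has not reported every field yet
    if avatar["lastEmotion"] != expected_emotion:
        return False  # renderer applied a different emotion than requested
    return int(avatar["lastSpeechAt"]) >= sent_at_ms  # speech is newer than the command
```

A `False` result means the local animation path is not yet verified. It does not prove the renderer failed, because it may simply not have reported yet.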
Use `window_capture` when the operator needs a local image proof of the
renderer window after a speech or motion command.
The Hypura SDK bridge does not grant microphone permission implicitly. If `mic`
returns `success=false` or a nested `micResult.ok=false`, stop and surface the
local permission or device failure instead of retrying silently.
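The consent rule and the stop-on-failure rule can be combined in one gate. The following pieces are hypothetical:

- `call_tool(name, args) -> dict`, a stand-in for however the agent invokes harness tools.
- The names `MicUnavailable`, `check_mic_result`, and `enable_mic`.
- The assumption that the `hypura_harness_companion_mic` result uses the same `success` / `micResult.ok` shape as the SDK `mic` action.

```python
from __future__ import annotations

from typing import Any, Callable


class MicUnavailable(RuntimeError):
    """Raised so the caller surfaces the failure instead of retrying silently."""


def check_mic_result(result: dict[str, Any]) -> None:
    """Raise if the mic call reports success=false or a nested micResult.ok=false."""
    if result.get("success") is False:
        raise MicUnavailable(f"mic call failed: {result!r}")
    nested = result.get("micResult")
    if isinstance(nested, dict) and nested.get("ok") is False:
        raise MicUnavailable(f"mic permission or device failure: {nested!r}")


def enable_mic(user_consented: bool, call_tool: Callable[[str, dict], dict]) -> dict:
    """Grant permission and enable capture only after explicit consent; never retry."""
    if not user_consented:
        raise PermissionError("no explicit consent; microphone capture stays off")
    call_tool("hypura_harness_companion",
              {"action": "permission", "capability": "mic", "decision": "granted"})
    result = call_tool("hypura_harness_companion_mic", {"enabled": True})
    check_mic_result(result)  # raises MicUnavailable; no silent retry loop
    return result
```

Without consent, no tool is called at all. A reported failure propagates to the caller as an exception rather than being retried.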
For speech output, pass `emotion` on `control_companion(action="speak")` or
`hypura_harness_companion(action="speak")` so the Desktop Companion animates and
speaks through one SDK request. Prefer VOICEVOX Kasukabe Tsumugi (`speaker=8`)
for local speech. Use `tts_provider="web-speech"` only as a local fallback when
VOICEVOX is not available.

```
hypura_harness_companion({
  "action": "speak",
  "value": "Local companion speech check.",
  "emotion": "happy",
  "tts_provider": "voicevox"
})
```
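The provider preference can be written down as a small argument builder. The `voicevox_available` flag is assumed to come from the caller's reading of `hypura_harness_status`; the exact status field is not documented here. `speak_args` is a hypothetical name. The speaker id is left to the harness default because the speak example does not show a `speaker` argument.

```python
from __future__ import annotations


def speak_args(text: str, emotion: str, voicevox_available: bool) -> dict:
    """Build hypura_harness_companion speak arguments with the VOICEVOX-first fallback.

    voicevox_available is assumed to be derived from hypura_harness_status.
    """
    if not text.strip():
        raise ValueError("nothing to speak")
    return {
        "action": "speak",
        "value": text,
        "emotion": emotion,  # one request drives both animation and speech
        # VOICEVOX is preferred; web-speech is only a local fallback.
        "tts_provider": "voicevox" if voicevox_available else "web-speech",
    }
```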
## Troubleshooting

- Empty or stale input: call `hypura_harness_voice_devices`, verify the selected input id, then retry with a short `record_seconds` value.
- No audio output: run `hypura_harness_voice_test_say` with one output device at a time, then add the second output device after the route is confirmed.
- Whisper path errors: pass `whisper_exe` and `whisper_model`, or install the expected whisper.cpp runtime before using `hypura_harness_voice_turn`.
- VOICEVOX errors: start VOICEVOX and confirm `hypura_harness_status` reports a healthy VOICEVOX connection.
- OpenClaw command errors: check `voice.openclaw_command` in `extensions/hypura-harness/config/harness.config.json`. The command template must contain `{message}`.
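The `{message}` requirement can be checked before starting the daemon. This sketch assumes that `voice.openclaw_command` means a `voice` object holding an `openclaw_command` key. It also assumes the value is either one command string or a list of argv strings. `render_command` uses plain string replacement as an illustration; the harness may substitute or quote differently. `load_openclaw_command` and `render_command` are hypothetical names.

```python
from __future__ import annotations

import json
from typing import Union

Template = Union[str, list]


def load_openclaw_command(config_text: str) -> Template:
    """Return voice.openclaw_command from harness.config.json text, validating {message}.

    Assumes the value is a single command string or a list of argv strings.
    """
    config = json.loads(config_text)
    try:
        template = config["voice"]["openclaw_command"]
    except (KeyError, TypeError) as exc:
        raise ValueError("voice.openclaw_command is missing") from exc
    parts = [template] if isinstance(template, str) else template
    if not isinstance(parts, list) or not all(isinstance(p, str) for p in parts):
        raise ValueError("voice.openclaw_command must be a string or a list of strings")
    if not any("{message}" in p for p in parts):
        raise ValueError("voice.openclaw_command must contain {message}")
    return template


def render_command(template: Template, message: str) -> Template:
    """Substitute the transcript; str.replace leaves any other braces untouched."""
    if isinstance(template, str):
        return template.replace("{message}", message)
    return [p.replace("{message}", message) for p in template]
```

To check the real file, pass it in with `load_openclaw_command(pathlib.Path("extensions/hypura-harness/config/harness.config.json").read_text(encoding="utf-8"))`. A `ValueError` names the problem directly.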