---
name: voice-asr
description: Voice-to-text transcription using DashScope ASR. Adds real-time speech recognition for audio messages in OpenClaw. Supports Chinese, English, and other languages.
---
# Voice ASR (Speech-to-Text)

Real-time speech-to-text transcription using the Alibaba DashScope ASR API.
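## Quick Start

### 1. Install Dependencies

```bash
pip install dashscope
```

Also make sure `ffmpeg` is installed and on your `PATH`; it is needed to convert OGG voice messages.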
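### 2. Set the API Key

```powershell
# Windows PowerShell (current session only)
$env:DASHSCOPE_API_KEY="sk-your-api-key"

# Or persist it for future sessions (takes effect in newly opened terminals)
setx DASHSCOPE_API_KEY "sk-your-api-key"
```

On macOS/Linux, use `export DASHSCOPE_API_KEY="sk-your-api-key"` (add it to your shell profile to make it permanent). The variable must be visible to the process that runs the OpenClaw gateway.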
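To confirm the key is actually visible to Python before wiring things into OpenClaw, a small check like the one below can help. This is a hypothetical helper, not part of the skill: `require_api_key` and `mask_key` are illustrative names, and the check only confirms that `DASHSCOPE_API_KEY` is non-empty, not that DashScope accepts it.

```python
import os
from typing import Mapping, Optional


def mask_key(key: str) -> str:
    """Show only the first 3 and last 4 characters so the key is not leaked in logs."""
    if len(key) <= 8:
        return "*" * len(key)
    return f"{key[:3]}...{key[-4:]}"


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DASHSCOPE_API_KEY (stripped) or raise with a clear message.

    Hypothetical helper: checks presence only, not validity.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; set it in the shell that starts the OpenClaw gateway"
        )
    return key
```

For example, `print("Using key", mask_key(require_api_key()))` prints something like `sk-...abcd` without exposing the full key.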
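### 3. Configure OpenClaw

Add the following to `~/.openclaw/openclaw.json`, replacing `C:\\path\\to\\voice-asr` with the actual location of this skill. Backslashes in Windows paths must be doubled (`\\`) inside JSON strings. Keep `{{MediaPath}}` as-is; OpenClaw replaces it with the path of the incoming audio file.

```json
{
  "tools": {
    "media": {
      "models": [
        {
          "type": "cli",
          "command": "python",
          "args": [
            "C:\\path\\to\\voice-asr\\scripts\\dashscope-realtime-asr.py",
            "{{MediaPath}}"
          ],
          "timeoutSeconds": 30
        }
      ],
      "audio": {
        "enabled": true,
        "maxChars": 4000,
        "attachments": {
          "mode": "first",
          "maxAttachments": 1
        }
      }
    }
  }
}
```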
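Hand-editing escaped Windows paths is error-prone. As a hypothetical alternative, the sketch below builds the same `tools.media` block as a Python dict and lets `json.dumps` do the backslash escaping. `build_media_config` is an illustrative name; the keys simply mirror the example above, and merging the result into an existing `openclaw.json` (which may contain other settings) is left to you.

```python
import json


def build_media_config(script_path: str, python_cmd: str = "python", timeout_seconds: int = 30) -> dict:
    """Return the tools.media section shown above with the script path filled in."""
    return {
        "tools": {
            "media": {
                "models": [
                    {
                        "type": "cli",
                        "command": python_cmd,
                        # {{MediaPath}} stays literal; OpenClaw fills in the audio file path.
                        "args": [script_path, "{{MediaPath}}"],
                        "timeoutSeconds": timeout_seconds,
                    }
                ],
                "audio": {
                    "enabled": True,
                    "maxChars": 4000,
                    "attachments": {"mode": "first", "maxAttachments": 1},
                },
            }
        }
    }


if __name__ == "__main__":
    # A raw string keeps single backslashes; json.dumps writes them out as \\.
    config = build_media_config(r"C:\path\to\voice-asr\scripts\dashscope-realtime-asr.py")
    print(json.dumps(config, indent=2))
```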
### 4. Restart the Gateway

```bash
openclaw gateway restart
```
## Usage

Automatic: once configured, voice messages from Discord, Telegram, and other connected channels are transcribed automatically.

Manual: run the script directly:

```bash
python scripts/dashscope-realtime-asr.py path/to/audio.ogg
```
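The `cli` entry in the OpenClaw config boils down to running that command with a timeout. The sketch below reproduces the invocation from Python, which can be handy for testing outside OpenClaw. It assumes the script prints the transcript to stdout and exits non-zero on failure; `transcribe_via_cli` is a hypothetical helper, not part of the skill or of OpenClaw, and it uses the current interpreter (`sys.executable`) rather than whatever `python` is on `PATH`.

```python
import os
import subprocess
import sys


def transcribe_via_cli(script_path: str, audio_path: str, timeout_seconds: float = 30.0) -> str:
    """Run the ASR script on one audio file and return its stdout as the transcript.

    Assumes the script prints the transcript to stdout and exits non-zero on failure.
    """
    result = subprocess.run(
        [sys.executable, script_path, audio_path],  # mirrors "command" + "args"
        capture_output=True,
        encoding="utf-8",  # decode Chinese output reliably, including on Windows
        env={**os.environ, "PYTHONIOENCODING": "utf-8"},  # make the child print UTF-8 too
        timeout=timeout_seconds,  # mirrors "timeoutSeconds"; raises subprocess.TimeoutExpired
    )
    if result.returncode != 0:
        raise RuntimeError(f"ASR script exited with {result.returncode}: {result.stderr.strip()}")
    return result.stdout.strip()
```

Example: `transcribe_via_cli("scripts/dashscope-realtime-asr.py", "path/to/audio.ogg")`.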
## Supported Formats

- OGG/Opus (Discord voice messages): auto-converted to PCM
- PCM (16 kHz, mono, 16-bit)
- WAV: converted if needed
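For reference, converting a voice message to the PCM layout above is a single ffmpeg call, and that layout fixes the byte rate at 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second. The sketch below shows one way to do it and is not a copy of the skill's script: `ogg_to_pcm_cmd`, `ogg_to_pcm`, `pcm_duration_seconds`, and `iter_chunks` are hypothetical helpers, little-endian samples (`s16le`) are assumed, and the 100 ms chunk size is an illustrative choice for feeding a streaming API.

```python
import subprocess
from typing import Iterator, List

SAMPLE_RATE = 16_000  # Hz, as listed under Supported Formats
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono
BYTES_PER_SECOND = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS  # 32,000


def ogg_to_pcm_cmd(src_path: str) -> List[str]:
    """ffmpeg arguments that decode src_path to raw 16 kHz mono s16le PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", src_path,
        "-f", "s16le",           # raw PCM output, little-endian
        "-acodec", "pcm_s16le",  # 16-bit signed samples
        "-ar", str(SAMPLE_RATE),
        "-ac", str(CHANNELS),
        "pipe:1",                # write to stdout instead of a file
    ]


def ogg_to_pcm(src_path: str) -> bytes:
    """Run ffmpeg and return the PCM bytes (requires ffmpeg on PATH)."""
    return subprocess.run(ogg_to_pcm_cmd(src_path), capture_output=True, check=True).stdout


def pcm_duration_seconds(pcm: bytes) -> float:
    """Audio length implied by the byte count at 32,000 bytes/s."""
    return len(pcm) / BYTES_PER_SECOND


def iter_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-size slices (100 ms = 3,200 bytes by default); the last may be shorter."""
    size = BYTES_PER_SECOND * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Because ffmpeg detects the input format itself, the same command also works for WAV files.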
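## Language Support

The default language is Chinese (`zh`). To change it, edit `transcription_params` in `scripts/dashscope-realtime-asr.py`:

```python
transcription_params=TranscriptionParams(
    language='en',  # or 'zh', 'ja', 'ko', etc.
    sample_rate=16000,
    input_audio_format='pcm'
)
```

Normally only `language` needs to change; `sample_rate` and `input_audio_format` describe the 16 kHz PCM audio the script sends after conversion.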
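If you switch languages often, editing the script each time gets tedious. One hypothetical alternative is to read the language from the command line, falling back to an environment variable and then to `zh`. Neither the `--language` flag nor the `VOICE_ASR_LANGUAGE` variable exists in the current script; both are assumptions for this sketch.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_args(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> argparse.Namespace:
    """Parse the media path plus an optional --language override (hypothetical flag)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="DashScope ASR (sketch)")
    parser.add_argument("media_path", help="audio file to transcribe")
    parser.add_argument(
        "--language",
        # Hypothetical env var first, then the script's current default.
        default=env.get("VOICE_ASR_LANGUAGE", "zh"),
        help="language code for TranscriptionParams, e.g. zh, en, ja, ko",
    )
    return parser.parse_args(argv)
```

The parsed value would then be passed as `language=args.language` when building `TranscriptionParams`, and in `openclaw.json` you could append `"--language", "en"` after `"{{MediaPath}}"` in `args`.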
## Troubleshooting

See `references/TROUBLESHOOTING.md`.
## Requirements

- Python 3.8+
- `dashscope` Python package
- `ffmpeg` (for OGG conversion)
- DashScope API key (free tier available)
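A quick preflight check for these requirements can save a round of debugging through the gateway. The sketch below is hypothetical and not shipped with the skill: `missing_requirements` checks the Python version, whether the `dashscope` package is importable, and whether `ffmpeg` is on `PATH`. It does not contact DashScope.

```python
import importlib.util
import shutil
import sys
from typing import List


def missing_requirements() -> List[str]:
    """Return human-readable problems; an empty list means the basics are in place."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    if importlib.util.find_spec("dashscope") is None:
        problems.append("dashscope package not installed (pip install dashscope)")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (needed for OGG conversion)")
    return problems
```

For example, `for issue in missing_requirements(): print("MISSING:", issue)` lists anything that still needs installing.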