voice-asr - SKILL.md Agent Skill

name: voice-asr
description: Voice-to-text transcription using DashScope ASR. Adds real-time speech recognition for audio messages in OpenClaw. Supports Chinese, English, and other languages.

Voice ASR (Speech-to-Text)

Real-time speech-to-text transcription using the Alibaba DashScope ASR API.

Quick Start

1. Install Dependencies

pip install dashscope

2. Set API Key

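# Windows PowerShell (current session only)
$env:DASHSCOPE_API_KEY="sk-your-api-key"

# Or persist it for new sessions (open a new terminal and restart the gateway afterwards)
setx DASHSCOPE_API_KEY "sk-your-api-key"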

3. Configure OpenClaw

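Add the following to ~/.openclaw/openclaw.json, replacing the script path in args with the actual location of dashscope-realtime-asr.py on your machine: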

{
  "tools": {
    "media": {
      "models": [
        {
          "type": "cli",
          "command": "python",
          "args": [
            "C:\\path\\to\\voice-asr\\scripts\\dashscope-realtime-asr.py",
            "{{MediaPath}}"
          ],
          "timeoutSeconds": 30
        }
      ],
      "audio": {
        "enabled": true,
        "maxChars": 4000,
        "attachments": {
          "mode": "first",
          "maxAttachments": 1
        }
      }
    }
  }
}
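If you prefer to script this step, the sketch below merges the same settings into an existing config using Python's standard json module. It assumes openclaw.json is plain JSON (no comments or trailing commas). SCRIPT_PATH is a placeholder you must change, and add_voice_asr/update_config_file are hypothetical helpers, not part of OpenClaw or this skill. Note that the sketch replaces any existing tools.media.audio block.

```python
import copy
import json
from pathlib import Path

# Placeholder: point this at your actual copy of the script.
SCRIPT_PATH = r"C:\path\to\voice-asr\scripts\dashscope-realtime-asr.py"

# Same values as the JSON snippet above.
ASR_MODEL = {
    "type": "cli",
    "command": "python",
    "args": [SCRIPT_PATH, "{{MediaPath}}"],
    "timeoutSeconds": 30,
}
AUDIO_SETTINGS = {
    "enabled": True,
    "maxChars": 4000,
    "attachments": {"mode": "first", "maxAttachments": 1},
}


def add_voice_asr(config: dict) -> dict:
    """Merge the voice-asr media settings into an OpenClaw config dict (in place)."""
    media = config.setdefault("tools", {}).setdefault("media", {})
    models = media.setdefault("models", [])
    # Only append the CLI model once, so running this twice is harmless.
    if not any(SCRIPT_PATH in m.get("args", []) for m in models):
        models.append(copy.deepcopy(ASR_MODEL))
    # Overwrites any existing audio block with the settings above.
    media["audio"] = copy.deepcopy(AUDIO_SETTINGS)
    return config


def update_config_file(path: Path) -> None:
    """Read openclaw.json (if present), merge the settings, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    add_voice_asr(config)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


# Example (modifies your real config, so back it up first):
# update_config_file(Path.home() / ".openclaw" / "openclaw.json")
```

Other top-level keys in the file are preserved; only tools.media.models gains one entry and tools.media.audio is set.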

4. Restart Gateway

openclaw gateway restart

Usage

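Automatic: once configured, voice messages on connected channels (Discord, Telegram, etc.) are transcribed automatically. With the attachments settings above, only the first audio attachment of each message is processed.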

Manual: Run the script directly:

python scripts/dashscope-realtime-asr.py path/to/audio.ogg
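To test the script roughly the way the gateway would call it, the sketch below substitutes the file path into the args and runs the command with a timeout. The {{MediaPath}} substitution mirrors the config above, but how OpenClaw actually spawns the process is an assumption here, as is the script printing the transcript to stdout. build_argv and transcribe are hypothetical names.

```python
import os
import subprocess
import sys
from typing import List

PLACEHOLDER = "{{MediaPath}}"


def build_argv(command: str, args: List[str], media_path: str) -> List[str]:
    """Substitute the audio file path into the configured args (assumed OpenClaw behavior)."""
    return [command] + [a.replace(PLACEHOLDER, media_path) for a in args]


def transcribe(media_path: str,
               script: str = "scripts/dashscope-realtime-asr.py",
               timeout: float = 30) -> str:
    """Run the ASR script on one file and return its stdout (assumed to be the transcript)."""
    argv = build_argv(sys.executable, [script, PLACEHOLDER], media_path)
    # Ask the child interpreter for UTF-8 output so Chinese text survives on Windows.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    result = subprocess.run(
        argv,
        capture_output=True,
        encoding="utf-8",
        timeout=timeout,  # mirrors "timeoutSeconds": 30
        check=True,       # raise CalledProcessError on a non-zero exit code
        env=env,
    )
    return result.stdout.strip()


# Example:
# print(transcribe("path/to/audio.ogg"))
```

Using sys.executable instead of a bare "python" runs the script with the same interpreter (and therefore the same installed dashscope package) as the caller.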

Supported Formats

OGG/Opus (e.g. Discord voice messages): converted to PCM automatically with ffmpeg
PCM (16 kHz, mono, 16-bit): the format sent to the ASR API
WAV: converted to PCM if needed
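For reference, the conversion and the PCM arithmetic can be sketched as follows. The ffmpeg flags are standard options for producing raw 16 kHz mono 16-bit little-endian PCM, but the bundled script may invoke ffmpeg differently. The helper names and the 100 ms chunk size (a typical granularity for streaming audio to a real-time recognizer) are illustrative assumptions, not values taken from the script.

```python
import shutil
import subprocess
from typing import Iterator, List

SAMPLE_RATE = 16_000  # Hz
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH  # 32,000


def ffmpeg_to_pcm_cmd(src: str, dst: str) -> List[str]:
    """ffmpeg arguments that decode OGG/Opus, WAV, etc. into raw 16 kHz mono s16le PCM."""
    return [
        "ffmpeg", "-y",              # overwrite dst without prompting
        "-i", src,
        "-ac", str(CHANNELS),        # downmix to mono
        "-ar", str(SAMPLE_RATE),     # resample to 16 kHz
        "-acodec", "pcm_s16le",      # 16-bit little-endian samples
        "-f", "s16le",               # raw output, no WAV header
        dst,
    ]


def convert_to_pcm(src: str, dst: str) -> None:
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_to_pcm_cmd(src, dst), check=True, capture_output=True)


def pcm_duration_seconds(num_bytes: int) -> float:
    """Audio length implied by a raw PCM byte count."""
    return num_bytes / BYTES_PER_SECOND


def iter_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split PCM into fixed-duration slices (the last one may be shorter)."""
    step = BYTES_PER_SECOND * chunk_ms // 1000  # 3,200 bytes for 100 ms
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]
```

At 32,000 bytes per second, a 10-second voice message comes to about 320 KB of PCM.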

Language Support

Default: Chinese (zh)

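To change the language, edit transcription_params in scripts/dashscope-realtime-asr.py. Leave sample_rate and input_audio_format unchanged, since they describe the 16 kHz PCM audio the script sends: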

transcription_params=TranscriptionParams(
    language='en',  # or 'zh', 'ja', 'ko', etc.
    sample_rate=16000,
    input_audio_format='pcm'
)

Troubleshooting

See references/TROUBLESHOOTING.md

Requirements

Python 3.8+
dashscope Python package
ffmpeg (for OGG conversion)
DashScope API key (free tier available)
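One quick way to confirm these requirements before wiring up the gateway is a small preflight check like the one below. It only tests that each piece is present. It does not validate the API key against DashScope. check_requirements is a hypothetical helper, not part of this skill.

```python
import importlib.util
import os
import shutil
import sys
from typing import List


def check_requirements() -> List[str]:
    """Return a list of problems; an empty list means nothing obvious is missing."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # find_spec checks importability without actually importing the package.
    if importlib.util.find_spec("dashscope") is None:
        problems.append("dashscope is not installed (pip install dashscope)")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH (needed for OGG conversion)")
    if not os.environ.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    return problems


# Example:
# for msg in check_requirements():
#     print("MISSING:", msg)
```

Run it with the same interpreter and environment the gateway uses, since that is where the script will look for dashscope, ffmpeg, and the API key.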