qwen-voice - SKILL.md Agent Skill

name: qwen-voice
description: "Use Qwen (DashScope/百炼) for speech tasks: (1) ASR speech-to-text transcription of user audio/voice messages (Telegram .ogg opus, wav, mp3) using qwen3-asr-flash, optionally with coarse timestamps via chunking; (2) TTS text-to-speech voice reply using qwen3-tts-flash with selectable voice (default Cherry) and output as .ogg voice note for Telegram."

Use the bundled scripts. Set the API key in the DASHSCOPE_API_KEY environment variable; if it is not set, the scripts attempt to read it from ~/.bashrc.
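The key lookup order described above (environment first, then ~/.bashrc) can be sketched as follows. This is a minimal illustration, not the bundled scripts' actual code: the helper names `parse_key_from_bashrc` and `resolve_api_key` are hypothetical, and the parser assumes a plain `export DASHSCOPE_API_KEY=...` or `DASHSCOPE_API_KEY=...` line with no trailing comment.

```python
import os
import re
from pathlib import Path

# Hypothetical helpers; the bundled scripts may resolve the key differently.
# Matches "export DASHSCOPE_API_KEY=..." or "DASHSCOPE_API_KEY=...".
# Commented-out lines (starting with "#") do not match.
_ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?DASHSCOPE_API_KEY=(.*)$")


def parse_key_from_bashrc(text):
    """Return the last DASHSCOPE_API_KEY value assigned in the given text, or None."""
    key = None
    for line in text.splitlines():
        match = _ASSIGN_RE.match(line)
        if not match:
            continue
        value = match.group(1).strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        key = value or None
    return key


def resolve_api_key(env=None, bashrc_path=None):
    """Prefer the environment variable; fall back to parsing ~/.bashrc."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = Path(bashrc_path) if bashrc_path else Path.home() / ".bashrc"
    try:
        return parse_key_from_bashrc(path.read_text(encoding="utf-8", errors="ignore"))
    except OSError:
        # File missing or unreadable: no key available.
        return None
```

Exporting the variable in the shell that launches the agent remains the simpler path; the file fallback only helps when the process was started without the user's interactive environment.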

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg --timestamps --chunk-sec 3
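To illustrate why chunking yields only coarse timestamps: if the audio is cut into fixed windows of `--chunk-sec` seconds and each window is transcribed separately, every piece of text can be labeled with its window's start and end, so the resolution equals the chunk length. The sketch below shows that arithmetic only. The function names and the `[mm:ss-mm:ss]` output format are assumptions for illustration; the source does not specify how qwen_asr.py formats its output.

```python
import math


def chunk_windows(duration_sec, chunk_sec=3.0):
    """Split [0, duration_sec] into consecutive windows of at most chunk_sec seconds."""
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    count = math.ceil(duration_sec / chunk_sec) if duration_sec > 0 else 0
    # Compute each boundary from the index to avoid accumulating float error.
    return [
        (i * chunk_sec, min((i + 1) * chunk_sec, duration_sec))
        for i in range(count)
    ]


def format_ts(seconds):
    """Format a time offset as mm:ss, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def stamp_transcripts(windows, texts):
    """Pair each chunk's transcript with its window (hypothetical output format)."""
    return [
        f"[{format_ts(start)}-{format_ts(end)}] {text}"
        for (start, end), text in zip(windows, texts)
    ]
```

A smaller chunk length gives finer timestamps at the cost of more transcription requests, and very short chunks may cut words in half.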


python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 Pi。' --voice Cherry --out /tmp/out.ogg
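The skill's output target is a Telegram voice note, which is Opus audio in an OGG container. If synthesized audio arrives in another format such as WAV, one common way to produce the .ogg is to re-encode with ffmpeg. The sketch below assumes ffmpeg is installed with libopus support; whether qwen_tts.py actually uses ffmpeg is not stated in the source, and the function names and the 32k bitrate are illustrative choices.

```python
import subprocess


def opus_voice_note_cmd(src_path, dst_path, bitrate="32k"):
    """Build an ffmpeg command that re-encodes audio to mono Opus in an OGG container."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", str(src_path),
        "-c:a", "libopus",  # Opus codec, as used for voice notes
        "-b:a", bitrate,    # assumed bitrate; adjust to taste
        "-ac", "1",         # mono
        str(dst_path),
    ]


def convert_to_voice_note(src_path, dst_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(opus_voice_note_cmd(src_path, dst_path), check=True)
    return dst_path
```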

python3 skills/qwen-voice/scripts/qwen_voice_clone.py --in ./voice_sample.ogg --name george --out work/qwen-voice/george.voice.json

python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 George。' --voice-profile work/qwen-voice/george.voice.json --out /tmp/out.ogg

Notes:

When the user sends a voice message or audio file, run ASR and reply with the transcribed text.
When the user explicitly asks for a voice reply, run TTS and send the generated .ogg as a voice note.
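The two rules above can be wired to the bundled scripts roughly as follows. This is a sketch of command construction only: the script paths and flags are the ones shown in this document, while the function names and the `plan_actions` dispatcher are hypothetical.

```python
SCRIPTS_DIR = "skills/qwen-voice/scripts"


def asr_command(audio_path, timestamps=False, chunk_sec=3):
    """Command line for transcribing an audio file with qwen_asr.py."""
    cmd = ["python3", f"{SCRIPTS_DIR}/qwen_asr.py", "--in", audio_path]
    if timestamps:
        cmd += ["--timestamps", "--chunk-sec", str(chunk_sec)]
    return cmd


def tts_command(text, out_path, voice="Cherry", voice_profile=None):
    """Command line for synthesizing speech with qwen_tts.py."""
    cmd = ["python3", f"{SCRIPTS_DIR}/qwen_tts.py", "--text", text]
    if voice_profile:
        # A cloned-voice profile takes the place of a preset voice name.
        cmd += ["--voice-profile", voice_profile]
    else:
        cmd += ["--voice", voice]
    return cmd + ["--out", out_path]


def plan_actions(has_audio, wants_voice_reply):
    """Hypothetical dispatcher: ASR for incoming audio, TTS only on explicit request."""
    actions = []
    if has_audio:
        actions.append("asr")
    if wants_voice_reply:
        actions.append("tts")
    return actions
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; building the list rather than a shell string avoids quoting problems with user-supplied text.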