whisper-transcribe

star 55

Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}.

bidewio By bidewio schedule Updated 2/7/2026

name: whisper-transcribe description: "Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}." metadata: openclaw: emoji: "🎙️"


Whisper Transcribe Skill

Whisper speech-to-text service is available at http://{{WHISPER_HOST}}:{{WHISPER_PORT}} within the Docker network.

Basic Transcription

Transcribe an audio file to text:

curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json"

Transcription with Options

Specify language, task, and output format:

# Transcribe with language hint
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "language=en" \
  -F "output=json"

# Translate non-English audio to English
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/french_audio.mp3" \
  -F "task=translate" \
  -F "output=json"

Output Formats

Choose between different output formats:

# Plain text output
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=txt"

# JSON with timestamps
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json"

# SRT subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=srt"

# VTT subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=vtt"

# Verbose JSON with word-level timestamps
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=verbose_json" \
  -F "word_timestamps=true"

Preparing Audio with FFmpeg

For best results, convert audio to the optimal format before sending to Whisper:

# Convert any media to 16kHz mono WAV (optimal for Whisper)
ffmpeg -i /data/input/video.mp4 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  /data/input/audio_for_whisper.wav

# Extract and convert specific time range
ffmpeg -i /data/input/long_recording.mp3 \
  -ss 00:05:00 -to 00:10:00 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  /data/input/segment.wav

Saving Transcription Output

# Save transcription to file
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json" \
  -o /data/output/transcription.json

# Save subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=srt" \
  -o /data/output/subtitles.srt

Response Structure (JSON)

{
  "text": "The full transcription text goes here.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 4.5,
      "text": "The full transcription",
      "avg_logprob": -0.25,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 4.5,
      "end": 7.2,
      "text": " text goes here.",
      "avg_logprob": -0.18,
      "no_speech_prob": 0.02
    }
  ],
  "language": "en"
}

Supported Audio Formats

  • WAV (recommended for best quality)
  • MP3
  • FLAC
  • OGG
  • M4A
  • WebM

Tips for AI Agents

  • Always convert input audio to 16kHz mono WAV for the best accuracy and fastest processing.
  • Use language parameter when you know the source language to improve accuracy and speed.
  • Use task=translate to translate any language directly to English text.
  • For long audio (>30 minutes), split into segments with FFmpeg first, then transcribe each segment.
  • Check no_speech_prob in segments to filter out silence or noise sections.
  • Use output=verbose_json with word_timestamps=true when you need precise timing for subtitle sync.
  • SRT/VTT output formats are ready to use as video subtitles without additional processing.
  • Check Whisper service health at http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/ or /health.
Install via CLI
npx skills add https://github.com/bidewio/better-openclaw --skill whisper-transcribe
Repository Details
star Stars 55
call_split Forks 8
navigation Branch main
article Path SKILL.md
More from Creator