whisper-realtime-stt - SKILL.md Agent Skill

name: whisper-realtime-stt
description: Deploy OpenAI Whisper on NVIDIA Jetson Orin for real-time speech-to-text. Clones the deployment repo, installs dependencies including ffmpeg, tests the environment, and runs real-time STT from a USB microphone (e.g. reSpeaker). Includes Riva vs Whisper comparison context.

Real-Time Speech-to-Text with Whisper on Jetson Orin

Deploy Whisper on Jetson Orin to run real-time speech-to-text directly on the device. Transcription then needs no network connection, and audio stays on the device, which improves privacy. Audio input comes from a USB microphone.

Execution model

Run one phase at a time. After each phase:

Relay all output to the user.
If output contains [STOP] → stop immediately, consult the failure decision tree.
If output ends with [OK] → tell the user "Phase N complete" and proceed to the next phase.
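
The marker rules above can be sketched as a small classifier. This is only an illustration and not part of the deployment repo: the function name classify_phase_output and the returned labels are hypothetical, and the "unclear" case (neither marker present) is an assumption, since the rules above do not cover it.

```python
def classify_phase_output(output: str) -> str:
    """Map the relayed output of one phase to the next action.

    Hypothetical helper: returns "stop", "proceed", or "unclear".
    """
    # [STOP] anywhere in the output takes priority over everything else.
    if "[STOP]" in output:
        return "stop"
    # [OK] only counts when it is the last non-whitespace text.
    if output.rstrip().endswith("[OK]"):
        return "proceed"
    # Assumption: with neither marker, do not advance to the next phase.
    return "unclear"
```

Checking [STOP] first means an output that contains both markers is treated as a failure, which matches the "stop immediately" rule.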

Prerequisites

Requirement	Detail
Jetson device	reComputer or other Jetson Orin-based device
Microphone	reSpeaker Mic Array v2.0 or other USB microphone
JetPack	With CUDA support
Network	Internet access for cloning repo and installing packages

Phase 1 — Install dependencies (~5 min)

git clone https://github.com/LJ-Hao/Deploy-Whisper-on-NVIDIA-Jetson-Orin-for-Real-time-Speech-to-Text.git
cd Deploy-Whisper-on-NVIDIA-Jetson-Orin-for-Real-time-Speech-to-Text
pip install -r requirements.txt
sudo apt update && sudo apt install -y ffmpeg

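Check which sample rates the microphone supports. Replace hw:2,0 with the card and device numbers that arecord -l reports for your microphone: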

arecord -D hw:2,0 --dump-hw-params

[OK] when all packages install and ffmpeg is available. [STOP] if pip or apt install fails.
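
The hw:X,0 identifier passed to arecord depends on how the system enumerates the microphone. As a rough sketch, the listing printed by arecord -l can be parsed to find it. The function name list_capture_devices is hypothetical, and the sample line in the comment is illustrative rather than captured from a real device.

```python
import re

# Matches listing lines of the form (illustrative example):
#   card 2: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):")


def list_capture_devices(arecord_output: str) -> list:
    """Return one dict per capture device found in `arecord -l` output."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, short_id, long_name, device = match.groups()
            devices.append(
                {"alsa": f"hw:{card},{device}", "id": short_id, "name": long_name}
            )
    return devices
```

The listing itself could be obtained with subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout. An empty result corresponds to the "no soundcard found" case.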

Phase 2 — Test environment (~1 min)

python test.py

Verify ffmpeg is installed:

ffmpeg -version

[OK] when test.py prints successful library import messages and ffmpeg -version shows version info. [STOP] if imports fail or ffmpeg is not found.
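
The two checks in this phase (importable libraries, ffmpeg on the PATH) can be approximated in a few lines. This is a sketch under assumptions, not the contents of test.py: the function name check_environment is hypothetical, and the module names passed to it should come from the repo's requirements.txt.

```python
import importlib.util
import shutil


def check_environment(modules, binaries) -> dict:
    """Report which Python modules and executables are available.

    `modules` must be top-level import names (no dots).
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        report[name] = shutil.which(name) is not None
    return report
```

For example, check_environment(["whisper"], ["ffmpeg"]) reports whether the openai-whisper package (imported as whisper) and the ffmpeg binary are present. Any False value corresponds to a [STOP] condition.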

Phase 3 — Run real-time speech-to-text

python main.py

Speak into the microphone and observe real-time transcription output.

[OK] when transcription appears in the terminal as you speak. [STOP] if an audio device error occurs or model loading fails.
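
When model loading fails for lack of memory, a smaller model variant is the usual remedy. The sketch below picks a variant from a memory budget. It is an illustration only: which model main.py loads is not stated here, the names pick_model_size and headroom_gb are hypothetical, and the memory figures are the approximate values listed in the openai/whisper README, which are only rough guides on a Jetson because CPU and GPU share memory.

```python
# Approximate memory needs in GB, from the openai/whisper README (rough guides).
APPROX_MEMORY_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model_size(free_gb: float, headroom_gb: float = 1.0) -> str:
    """Return the largest model whose approximate need fits the budget."""
    budget = free_gb - headroom_gb  # keep some memory for everything else
    for name, needed_gb in APPROX_MEMORY_GB:
        if needed_gb <= budget:
            return name
    return "tiny"  # smallest variant as a last resort


def load_model(free_gb: float):
    """Load the chosen variant; needs the openai-whisper package installed."""
    import whisper  # imported lazily so pick_model_size works without it

    return whisper.load_model(pick_model_size(free_gb))
```

With 8 GB free and the default headroom, pick_model_size selects "medium".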

Failure decision tree

Symptom	Action
`pip install -r requirements.txt` fails	Check Python version ≥ 3.8. Try `pip install --upgrade pip` first.
`ffmpeg` not found after install	Run `sudo apt install ffmpeg` again. Verify with `which ffmpeg`.
`arecord` — no soundcard found	Check USB microphone connection. Run `arecord -l` to list devices. Adjust device ID (`hw:X,0`).
`test.py` import errors	Re-run `pip install -r requirements.txt`. Check for missing system libraries.
`main.py` — CUDA out of memory	Close other GPU processes. Use a smaller Whisper model variant.
`main.py` — no audio input	Verify microphone with `arecord -D hw:2,0 -f S16_LE -r 16000 -d 5 test.wav`.
Poor transcription accuracy	Ensure the microphone records at 16000 Hz, the sample rate Whisper models expect. Reduce background noise.
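
The test.wav file recorded with arecord in the table above can be inspected to confirm the sample rate and to see whether any signal was captured. This is a minimal sketch using only the Python standard library. The function name inspect_wav is hypothetical, and it assumes 16-bit samples, as produced by the -f S16_LE option.

```python
import array
import sys
import wave


def inspect_wav(path: str) -> dict:
    """Summarize a 16-bit PCM WAV file: rate, channels, duration, peak level."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    if width != 2:
        raise ValueError("expected 16-bit samples (S16_LE)")
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {
        "sample_rate": rate,
        "channels": channels,
        "seconds": len(samples) / channels / rate,
        "peak": peak,  # 0 to 32768
        "silent": peak == 0,  # True only for pure digital silence
    }
```

A sample_rate other than 16000 points to the accuracy row, and a peak of 0 or close to it points to the "no audio input" row.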

Reference files

references/source.body.md — Full Seeed Wiki tutorial with hardware setup photos, environment test screenshots, Riva vs Whisper comparison video, and project outlook (reference only)