whisper-realtime-stt


Deploy OpenAI Whisper on NVIDIA Jetson Orin for real-time speech-to-text. Clones the deployment repo, installs dependencies including ffmpeg, tests the environment, and runs real-time STT from a USB microphone (e.g. reSpeaker). Includes Riva vs Whisper comparison context.

By Seeed-Projects · Updated 3/11/2026


Real-Time Speech-to-Text with Whisper on Jetson Orin

Deploy Whisper on Jetson Orin for real-time speech-to-text that runs entirely on-device. After setup, transcription needs no network connection and audio never leaves the device, which improves privacy. A USB microphone provides the audio input.


Execution model

Run one phase at a time. After each phase:

  • Relay all output to the user.
  • If output contains [STOP] → stop immediately, consult the failure decision tree.
  • If output ends with [OK] → tell the user "Phase N complete" and proceed to the next phase.
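The execution model reduces to a small decision rule. The hypothetical helper below applies that rule to one phase's output. It assumes the [OK] and [STOP] markers appear literally in the output, and it gives [STOP] priority because [STOP] means "halt immediately". The tutorial does not say what to do when neither marker is present, so the "unknown" branch is an assumed fallback.

```python
def classify_phase_output(output):
    """Return "stop", "ok", or "unknown" for one phase's output."""
    if "[STOP]" in output:  # anywhere in the output: halt immediately
        return "stop"
    if output.rstrip().endswith("[OK]"):  # must be the final token
        return "ok"
    return "unknown"


def next_action(phase, output):
    """Map a phase result to the message the execution model calls for."""
    status = classify_phase_output(output)
    if status == "stop":
        return "Stop and consult the failure decision tree"
    if status == "ok":
        return f"Phase {phase} complete"
    # Not covered by the tutorial; asking the user is an assumed fallback.
    return "No marker found: relay the output and ask the user how to proceed"


if __name__ == "__main__":
    print(next_action(1, "Successfully installed ...\n[OK]\n"))
```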

Prerequisites

| Requirement | Detail |
|---|---|
| Jetson device | reComputer or another Jetson Orin-based device |
| Microphone | reSpeaker Mic Array v2.0 or another USB microphone |
| JetPack | A JetPack release with CUDA support |
| Network | Internet access for cloning the repo and installing packages |
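Some of these prerequisites can be checked before Phase 1 with a short script. The Python 3.8 minimum comes from the failure decision tree. The two paths below are assumptions about a typical JetPack image (/etc/nv_tegra_release as the L4T release marker, /usr/local/cuda as the CUDA install location), not requirements stated by the tutorial. The microphone and network are checked in later phases.

```python
import os
import sys

# Assumed markers of a JetPack install; adjust if your image differs.
JETSON_RELEASE_FILE = "/etc/nv_tegra_release"
CUDA_DIR = "/usr/local/cuda"


def check_prerequisites(min_python=(3, 8)):
    """Return a dict of simple prerequisite checks for this machine."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "is_jetson": os.path.exists(JETSON_RELEASE_FILE),
        "cuda_dir": os.path.isdir(CUDA_DIR),
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```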

Phase 1 — Install dependencies (~5 min)

git clone https://github.com/LJ-Hao/Deploy-Whisper-on-NVIDIA-Jetson-Orin-for-Real-time-Speech-to-Text.git
cd Deploy-Whisper-on-NVIDIA-Jetson-Orin-for-Real-time-Speech-to-Text
pip install -r requirements.txt
sudo apt update && sudo apt install -y ffmpeg
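When Phase 1 is scripted rather than typed by hand, the commands above can run in order and stop at the first failure. That matches the phase's [STOP] condition. The sketch below is a hypothetical runner, not part of the repo. It uses cwd= in place of cd, because a subprocess cannot change the caller's directory. It also calls pip through sys.executable -m pip so packages go into the interpreter running the script, which is an assumed design choice.

```python
import subprocess
import sys

REPO_URL = "https://github.com/LJ-Hao/Deploy-Whisper-on-NVIDIA-Jetson-Orin-for-Real-time-Speech-to-Text.git"
REPO_DIR = "Deploy-Whisper-on-NVIDIA-Jetson-Orin-for-Real-time-Speech-to-Text"

# (argv, working directory) pairs; cwd stands in for the "cd" step.
PHASE_1 = [
    (["git", "clone", REPO_URL], None),
    ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], REPO_DIR),
    (["sudo", "apt", "update"], None),
    (["sudo", "apt", "install", "-y", "ffmpeg"], None),
]


def run_phase(steps):
    """Run steps in order and stop at the first failure.

    Returns ("[OK]" or "[STOP]", combined log text).
    """
    log = []
    for argv, cwd in steps:
        log.append("$ " + " ".join(argv))
        try:
            result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        except OSError as exc:  # e.g. command not installed or cwd missing
            log.append(str(exc))
            return "[STOP]", "\n".join(log)
        log.append(result.stdout + result.stderr)
        if result.returncode != 0:
            return "[STOP]", "\n".join(log)
    return "[OK]", "\n".join(log)


if __name__ == "__main__":
    status, output = run_phase(PHASE_1)
    print(output)
    print(status)
```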

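Identify the microphone and check which sample rates it supports. arecord -l lists the capture devices. The second command dumps the hardware parameters of card 2, device 0; change hw:2,0 to match the card number that arecord -l reports. Whisper expects 16000 Hz audio, so confirm that the RATE line in the dump includes 16000.

arecord -l
arecord -D hw:2,0 --dump-hw-params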
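Picking the hw:X,0 value by hand is error-prone when several USB audio devices are attached. The sketch below parses arecord -l output to find the microphone's card and device numbers. It also checks a --dump-hw-params RATE line for 16000 Hz. The sample device line and the "respeaker" keyword are illustrative assumptions, since your device may report a different name. The RATE parser treats ranges as inclusive, which is a simplification.

```python
import re
import subprocess

# One capture device per line, e.g. (illustrative; names vary by device):
# card 2: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):", re.MULTILINE)
# RATE appears as a single value ("RATE: 16000") or a range ("RATE: [8000 48000]").
RATE_LINE = re.compile(r"RATE:\s*[\[(]?\s*(\d+)(?:\s+(\d+))?")


def list_capture_devices(arecord_l_output):
    """Return (card, card_id, name, device) tuples from `arecord -l` output."""
    return [
        (int(card), card_id, name, int(device))
        for card, card_id, name, device in CARD_LINE.findall(arecord_l_output)
    ]


def pick_device(devices, keyword="respeaker"):
    """Return an ALSA 'hw:card,device' string for the first match, else None."""
    for card, card_id, name, device in devices:
        if keyword.lower() in f"{card_id} {name}".lower():
            return f"hw:{card},{device}"
    return None


def supports_rate(dump_hw_params_output, rate=16000):
    """True if the RATE value or (inclusive) range in the dump covers `rate`."""
    match = RATE_LINE.search(dump_hw_params_output)
    if match is None:
        return False
    low = int(match.group(1))
    high = int(match.group(2) or low)
    return low <= rate <= high


if __name__ == "__main__":
    listing = subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout
    print(pick_device(list_capture_devices(listing)))
```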

[OK] when all packages install and ffmpeg is available. [STOP] if pip or apt install fails.


Phase 2 — Test environment (~1 min)

python test.py

Verify ffmpeg is installed:

ffmpeg -version

[OK] when test.py prints successful library import messages and ffmpeg -version shows version info. [STOP] if imports fail or ffmpeg is not found.
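The contents of test.py are not reproduced here, so the following is only a comparable check, not the repo's script. It assumes requirements.txt provides the whisper, torch, and numpy modules; compare against the actual file. For each module, it reports whether the module can be found without importing it. It also runs ffmpeg -version if ffmpeg is on PATH. cuda_available() imports torch only when called.

```python
import importlib.util
import shutil
import subprocess

# Assumed module names; compare against the repo's requirements.txt.
EXPECTED_MODULES = ("whisper", "torch", "numpy")


def check_environment(modules=EXPECTED_MODULES):
    """Report which modules are importable and whether ffmpeg is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    ffmpeg = shutil.which("ffmpeg")
    report["ffmpeg"] = ffmpeg is not None
    if ffmpeg:
        result = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
        # The first line names the ffmpeg version.
        lines = result.stdout.splitlines()
        report["ffmpeg_version"] = lines[0] if lines else ""
    return report


def cuda_available():
    """True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print(check_environment())
    print("CUDA available:", cuda_available())
```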


Phase 3 — Run real-time speech-to-text

python main.py

Speak into the microphone and observe real-time transcription output.
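This page does not describe how main.py captures and buffers audio. One common pattern for near-real-time Whisper transcription is to collect microphone audio into fixed-length windows and transcribe each window as it fills. The sketch below shows that pattern under these assumptions: raw 16-bit little-endian mono PCM at 16 kHz (matching the arecord test command in the failure decision tree), a hypothetical 5-second window, and the openai-whisper package's load_model and transcribe calls. It is not the repo's implementation.

```python
import subprocess
import sys
from array import array

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 5    # hypothetical window length; main.py may differ


class PcmChunker:
    """Accumulate raw S16_LE mono PCM bytes and emit fixed-length float chunks."""

    def __init__(self, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
        self.chunk_samples = int(sample_rate * chunk_seconds)
        self._pending = array("h")
        self._leftover = b""  # odd trailing byte from the previous read

    def feed(self, data):
        """Add bytes; return complete chunks as lists of floats in [-1, 1)."""
        data = self._leftover + data
        usable = len(data) - (len(data) % 2)  # keep whole 16-bit samples only
        self._leftover = data[usable:]
        samples = array("h")
        samples.frombytes(data[:usable])
        if sys.byteorder == "big":
            samples.byteswap()  # S16_LE -> native byte order
        self._pending.extend(samples)
        chunks = []
        while len(self._pending) >= self.chunk_samples:
            head = self._pending[: self.chunk_samples]
            del self._pending[: self.chunk_samples]
            chunks.append([s / 32768.0 for s in head])
        return chunks


def transcribe_chunk(model, chunk):
    """Transcribe one chunk with an openai-whisper model (needs numpy)."""
    import numpy as np

    return model.transcribe(np.asarray(chunk, dtype=np.float32))["text"]


if __name__ == "__main__":
    import whisper  # openai-whisper

    model = whisper.load_model("base")  # "tiny" uses less GPU memory
    chunker = PcmChunker()
    # Raw PCM from the microphone; change hw:2,0 to match `arecord -l`.
    mic = subprocess.Popen(
        ["arecord", "-D", "hw:2,0", "-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "raw"],
        stdout=subprocess.PIPE,
    )
    while True:
        data = mic.stdout.read(3200)  # 0.1 s of 16 kHz 16-bit mono audio
        if not data:
            break
        for chunk in chunker.feed(data):
            print(transcribe_chunk(model, chunk))
```

Shorter windows lower latency but give Whisper less context per call, which can reduce accuracy. Overlapping windows are a common refinement that this sketch omits.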

[OK] when transcription appears in the terminal as you speak. [STOP] on audio device errors or if the Whisper model fails to load.


Failure decision tree

| Symptom | Action |
|---|---|
| pip install -r requirements.txt fails | Check that Python is 3.8 or newer. Upgrade pip with pip install --upgrade pip, then retry. |
| ffmpeg not found after install | Run sudo apt install -y ffmpeg again, then confirm with which ffmpeg. |
| arecord reports no soundcard found | Check the USB microphone connection. List capture devices with arecord -l and adjust the device ID (hw:X,0) to match the card number. |
| test.py import errors | Re-run pip install -r requirements.txt and check for missing system libraries. |
| main.py: CUDA out of memory | Close other GPU processes, or switch to a smaller Whisper model (e.g. tiny or base). |
| main.py: no audio input | Record a 5-second test clip with arecord -D hw:2,0 -f S16_LE -r 16000 -d 5 test.wav, then play it back with aplay test.wav. |
| Poor transcription accuracy | Make sure the microphone records at 16000 Hz and reduce background noise. |
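The "no audio input" and "poor transcription accuracy" rows both depend on what the microphone actually recorded. This sketch uses Python's standard wave module to inspect the test.wav produced by the arecord test command, then lists any mismatches against the 16000 Hz target. Note that openai-whisper resamples input files to 16 kHz mono through ffmpeg when loading them. A mismatch here therefore points at the capture settings rather than making the file unusable. The 0.5-second minimum duration is an arbitrary threshold chosen for illustration.

```python
import wave

TARGET_RATE = 16000  # Hz; the rate the failure decision tree recommends


def describe_wav(path):
    """Read basic format info from a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "rate": rate,
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),  # bytes per sample; 2 for S16_LE
            "seconds": wf.getnframes() / rate,
        }


def capture_problems(info, rate=TARGET_RATE, min_seconds=0.5):
    """Return human-readable problems; an empty list means the clip looks fine."""
    problems = []
    if info["rate"] != rate:
        problems.append(f"sample rate is {info['rate']} Hz, expected {rate} Hz")
    if info["sample_width"] != 2:
        problems.append("expected 16-bit samples (S16_LE)")
    if info["seconds"] < min_seconds:
        problems.append(f"clip is only {info['seconds']:.2f} s long")
    return problems


if __name__ == "__main__":
    info = describe_wav("test.wav")
    print(info)
    print(capture_problems(info) or "test.wav looks OK")
```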

Reference files

  • references/source.body.md — Full Seeed Wiki tutorial with hardware setup photos, environment test screenshots, Riva vs Whisper comparison video, and project outlook (reference only)
Install via CLI
npx skills add https://github.com/Seeed-Projects/Seeed-Jetson-DevelopTool --skill whisper-realtime-stt