name: whisper-voice
description: Native macOS menu bar app for live voice-to-text with auto-type using WhisperKit on Apple Silicon
Whisper Voice — Live Speech-to-Text Mac App
Goal
Build and run a native macOS menu bar app that captures live microphone audio, transcribes it offline using WhisperKit (on Apple Silicon), and auto-types the text wherever the cursor is.
Inputs
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model_size | string | No | Whisper model: tiny, base (default), small |
| language | string | No | "en" (default) or "hi" for Hindi mode |
| chunk_duration | float | No | Seconds per audio chunk (default: 3.0) |
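
As a rough illustration of what `chunk_duration` means in terms of audio samples, the sketch below splits a buffer into fixed-length chunks. It assumes 16 kHz mono `Float` samples (the rate Whisper models expect); the names `whisperSampleRate`, `samplesPerChunk`, and `splitIntoChunks` are hypothetical and not taken from the app's source.

```swift
import Foundation

// Assumption: audio is 16 kHz mono, the sample rate Whisper models expect.
let whisperSampleRate = 16_000.0

/// Number of samples in one chunk lasting `chunkDuration` seconds.
func samplesPerChunk(chunkDuration: Double) -> Int {
    Int((chunkDuration * whisperSampleRate).rounded())
}

/// Splits a sample buffer into consecutive chunks.
/// The final chunk may be shorter than the others.
func splitIntoChunks(_ samples: [Float], chunkDuration: Double = 3.0) -> [[Float]] {
    let size = samplesPerChunk(chunkDuration: chunkDuration)
    guard size > 0 else { return [] }
    return stride(from: 0, to: samples.count, by: size).map { start in
        Array(samples[start..<min(start + size, samples.count)])
    }
}
```

With the default of 3.0 seconds, each chunk holds 48,000 samples under this assumption. Shorter chunks reduce latency but give the model less context per pass.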
Process
1. Build the app
cd AiwithDhruv_Voice/WhisperAiwithDhruv
swift build
2. Run the app
swift run WhisperAiwithDhruv
# Or open in Xcode: open Package.swift → Cmd+R
3. First launch setup
- Grant microphone permission when prompted
- Grant Accessibility permission in System Settings → Privacy & Security → Accessibility
- Wait for model download (~140MB for base model)
4. Usage
- Cmd+Shift+Space — Toggle recording on/off
- Click mic icon in menu bar for controls
- Speak — text auto-types at cursor position
- Toggle Hindi mode for Hindi/Hinglish input
Outputs
| Name | Type | Description |
| --- | --- | --- |
| transcribed_text | string | Live transcribed text typed at cursor |
| history | array | Last 50 transcription entries in menu bar |
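
One way to keep only the last 50 entries is a small bounded buffer that drops the oldest items once the cap is exceeded. This is a minimal sketch; the `TranscriptionHistory` type is hypothetical and may differ from how the app stores its history.

```swift
/// Keeps the most recent `capacity` transcriptions, oldest first.
/// Hypothetical type for illustration only.
struct TranscriptionHistory {
    let capacity: Int
    private(set) var entries: [String] = []

    init(capacity: Int = 50) {
        self.capacity = max(1, capacity)
    }

    /// Appends a new entry and discards the oldest ones beyond the cap.
    mutating func append(_ text: String) {
        entries.append(text)
        if entries.count > capacity {
            entries.removeFirst(entries.count - capacity)
        }
    }
}
```

For a cap this small, `removeFirst` on an array is cheap enough; a ring buffer would only matter for much larger histories.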
Edge Cases
- No mic: Shows error in menu bar dropdown
- Accessibility denied: Auto-type disabled, manual copy from history
- Silence: VAD skips silent chunks (energy-based threshold)
- Hallucinations: Filters common Whisper artifacts ("Thank you.", "...")
- Model not downloaded: Shows download progress bar
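
The silence and hallucination cases above can be sketched as two small checks. This assumes samples are normalized to the range -1...1 and uses RMS energy as the measure; the 0.01 default threshold and the function names are assumptions for illustration, and the artifact list contains only the two examples named above.

```swift
import Foundation

/// Root-mean-square energy of a chunk of samples in the range -1...1.
func rmsEnergy(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Energy-based voice activity check: a chunk counts as silent when its
/// RMS energy is below the threshold. The 0.01 default is an assumption.
func isSilent(_ samples: [Float], threshold: Float = 0.01) -> Bool {
    rmsEnergy(samples) < threshold
}

// Artifacts named in this document, stored lowercased for comparison.
let knownArtifacts: Set<String> = ["thank you.", "..."]

/// Returns true when the whole transcription is empty or matches a known
/// artifact. Longer sentences that merely contain the phrase are kept.
func isLikelyHallucination(_ text: String) -> Bool {
    let normalized = text
        .trimmingCharacters(in: .whitespacesAndNewlines)
        .lowercased()
    return normalized.isEmpty || knownArtifacts.contains(normalized)
}
```

Running the silence check before transcription avoids spending a model pass on empty audio, which is also where artifacts like "Thank you." tend to come from.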
Environment
- macOS 14+ (Sonoma)
- Apple Silicon (M1/M2/M3/M4)
- Xcode 15+ (for building)
- No API keys needed (fully offline)
Schema
Inputs
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model_size | string | No | tiny / base / small |
| language | string | No | en / hi |
| chunk_duration | float | No | 2.0 - 8.0 seconds |
| silence_threshold | float | No | 0.002 - 0.05 |
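
The input schema could be modeled as a settings type that falls back to defaults and clamps numeric values to the documented ranges. This is a sketch under assumptions: the `VoiceSettings` type is hypothetical, the 0.01 default for `silence_threshold` is not stated in this document, and the app may reject out-of-range values instead of clamping them.

```swift
enum ModelSize: String { case tiny, base, small }
enum Language: String { case en, hi }

/// Limits `value` to the closed range.
func clamp(_ value: Double, to range: ClosedRange<Double>) -> Double {
    min(max(value, range.lowerBound), range.upperBound)
}

/// Hypothetical settings type mirroring the input schema.
struct VoiceSettings {
    static let chunkDurationRange = 2.0...8.0
    static let silenceThresholdRange = 0.002...0.05

    let modelSize: ModelSize
    let language: Language
    let chunkDuration: Double
    let silenceThreshold: Double

    init(modelSize: String? = nil, language: String? = nil,
         chunkDuration: Double? = nil, silenceThreshold: Double? = nil) {
        // Unknown or missing strings fall back to the documented defaults.
        self.modelSize = modelSize.flatMap(ModelSize.init(rawValue:)) ?? .base
        self.language = language.flatMap(Language.init(rawValue:)) ?? .en
        // Numeric inputs are clamped to the ranges in the schema.
        self.chunkDuration = clamp(chunkDuration ?? 3.0, to: Self.chunkDurationRange)
        // Assumption: 0.01 is an illustrative default, not a documented one.
        self.silenceThreshold = clamp(silenceThreshold ?? 0.01, to: Self.silenceThresholdRange)
    }
}
```

For example, `VoiceSettings(language: "hi", chunkDuration: 12)` would select Hindi mode and cap the chunk length at 8.0 seconds.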
Outputs
| Name | Type | Description |
| --- | --- | --- |
| transcription | string | Live text output |
| auto_typed | boolean | Whether text was injected at cursor |
Credentials
| Name | Source |
| --- | --- |
| None | Fully offline, no API keys |
Composable With
video-edit (add transcription captions), send-telegram (send transcriptions to phone)
Cost
Free — runs entirely on-device. Model download is one-time (~140MB for base).