name: transcribe description: Transcribe audio and video files using the configured speech-to-text provider compatibility: "Designed for Vellum personal assistants" metadata: emoji: "🎙️" vellum: display-name: "Transcribe" category: "voice" activation-hints: - "User has an audio or video file on disk they want converted to text" - "User wants speech-to-text on a recording, voice memo, podcast, or meeting capture" - "User asks for a transcript of a media file (mp3, wav, m4a, mp4, mov, etc.)"
Transcribe audio and video files using the configured speech-to-text provider. Supports multiple STT providers including OpenAI Whisper, Deepgram, and Google Gemini — the active provider is selected in Settings under Speech-to-Text (services.stt).
Usage Notes
- The tool accepts a
file_path(absolute path to a local audio or video file) to transcribe. - Supported formats: any video (mp4, mov, etc.) or audio (mp3, wav, m4a, etc.) file.
- For video files, audio is automatically extracted via ffmpeg before transcription.
- Large files are automatically split into chunks for processing.
- If no STT provider credentials are configured, the tool will return an error with setup instructions.
- The STT provider (
services.stt) is shared between transcription and telephony call paths.
Maintenance
When adding or modifying an STT provider, follow the onboarding checklist at assistant/docs/stt-provider-onboarding.md. That document covers the daemon catalog, config schema, adapter wiring, client catalog parity, and required tests.