transcribe - SKILL.md Agent Skill

name: transcribe
description: Transcribe audio and video files using the configured speech-to-text provider
compatibility: "Designed for Vellum personal assistants"
metadata:
  emoji: "🎙️"
  vellum:
    display-name: "Transcribe"
    category: "voice"
    activation-hints:
      - "User has an audio or video file on disk they want converted to text"
      - "User wants speech-to-text on a recording, voice memo, podcast, or meeting capture"
      - "User asks for a transcript of a media file (mp3, wav, m4a, mp4, mov, etc.)"

Transcribe audio and video files using the configured speech-to-text (STT) provider. Multiple STT providers are supported, including OpenAI Whisper, Deepgram, and Google Gemini. The active provider is selected in Settings under Speech-to-Text (services.stt).

Usage Notes

The tool accepts a file_path: the absolute path to the local audio or video file to transcribe.
Supported formats: video files (mp4, mov, etc.) and audio files (mp3, wav, m4a, etc.).
For video files, audio is automatically extracted via ffmpeg before transcription.
Large files are automatically split into chunks for processing.
If no STT provider credentials are configured, the tool will return an error with setup instructions.
The STT provider (services.stt) is shared between transcription and telephony call paths.
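
The preprocessing steps in the notes above (absolute file_path, audio extraction for video, chunking of large files) can be pictured with a minimal Python sketch. It is illustrative only and not the tool's actual implementation: the function names, the extension sets, the mono 16 kHz output, and the 600-second chunk size are all assumptions, since this document does not specify them. The ffmpeg flags used (-y, -i, -vn, -ac, -ar) are standard ffmpeg options.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Assumed extension sets; the skill lists these only as examples ("etc.").
VIDEO_EXTS = {".mp4", ".mov"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}


def classify_media(file_path: str) -> str:
    """Return "video" or "audio" for an absolute POSIX-style path, else raise ValueError."""
    path = PurePosixPath(file_path)
    if not path.is_absolute():
        raise ValueError(f"file_path must be absolute: {file_path}")
    ext = path.suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"unrecognized media extension: {ext or '(none)'}")


def ffmpeg_extract_args(video_path: str, audio_out: str) -> List[str]:
    """Build an ffmpeg argv that drops the video stream and writes mono 16 kHz audio.

    The channel count and sample rate are assumptions, not values taken from the skill.
    """
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_out]


def plan_chunks(duration_s: float, max_chunk_s: float = 600.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most max_chunk_s."""
    if duration_s <= 0 or max_chunk_s <= 0:
        raise ValueError("duration_s and max_chunk_s must be positive")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

For example, under these assumptions a 25-minute recording (1500 seconds) would be planned as three windows: 0 to 600, 600 to 1200, and 1200 to 1500 seconds. The argv from ffmpeg_extract_args could be passed to subprocess.run on a machine where ffmpeg is installed.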

Maintenance

When adding or modifying an STT provider, follow the onboarding checklist at assistant/docs/stt-provider-onboarding.md. That document covers the daemon catalog, config schema, adapter wiring, client catalog parity, and required tests.