transcriber

name: transcriber description: AI-powered audio and video transcription using OpenAI Whisper or AssemblyAI. Use when converting recordings to text, generating subtitles, or creating searchable transcripts. version: 1.0.0

Note: Review PROFILE.md for user-specific transcription preferences, provider settings, and formatting options.

Master Briefing: Global brand voice at ~/.superskills/master-briefing.yaml applies automatically. Skill profile overrides when conflicts exist.

Convert audio and video to accurate text transcripts with timestamps, perfect for creating course materials, show notes, and searchable content.

Tools

Transcriber.py (in src/):

Multi-provider support (OpenAI Whisper, AssemblyAI)
Word-level timestamps for precise timing
Multiple output formats (TXT, JSON, SRT, VTT)
Batch processing for multiple files
Key quote extraction for marketing
Automatic language detection
Confidence scoring

Core Workflow

1. File Preparation

Receive audio/video file(s)
Validate file format and size
Determine output requirements (format, timestamps)
Select transcription provider

2. Transcription

Upload and process file through API
Capture word-level timestamps if needed
Detect language automatically
Extract metadata (duration, word count)
Generate confidence scores

3. Delivery

Export in requested format (TXT, JSON, SRT, VTT)
Extract key quotes if needed
Package with metadata
Handoff to downstream workflows (narrator, coursepackager, author)

Usage

Basic Transcription:

from superskills.transcriber.src import transcribe_file

result = transcribe_file("recording.mp3")
print(result.transcript)
print(f"Duration: {result.duration_seconds}s")
print(f"Words: {result.word_count}")

With Timestamps:

from superskills.transcriber.src import Transcriber

transcriber = Transcriber(provider="openai")
result = transcriber.transcribe(
    "session.mp4",
    include_timestamps=True,
    output_format="srt"
)
print(f"Saved to: {result.output_file}")

Batch Processing:

files = ["session1.mp3", "session2.mp3", "session3.mp3"]
results = transcriber.transcribe_batch(files, output_format="json")

for result in results:
    print(f"{result.source_file}: {result.word_count} words")

Extract Marketing Quotes:

result = transcriber.transcribe("podcast.mp3", include_timestamps=True)
quotes = transcriber.extract_key_quotes(result, min_words=15, max_quotes=5)

for quote in quotes:
    print(quote)

Output Formats

TXT: Plain text transcript
JSON: Full metadata including timestamps and confidence
SRT: Standard subtitle format for video players
VTT: WebVTT format for web video

Environment Variables

# OpenAI Whisper (recommended)
OPENAI_API_KEY=your_openai_api_key

# Or AssemblyAI (alternative)
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

Global .env (repository root):

echo "OPENAI_API_KEY=sk-your-key" >> .env

Or skill-specific .env:

echo "OPENAI_API_KEY=sk-your-key" >> superskills/transcriber/.env

Quality Checklist

Audio quality sufficient (clear speech, minimal background noise)
File size within limits (OpenAI: <25MB)
Language correctly detected or specified
Timestamps accurate if requested
Confidence scores reviewed
Output format correct for use case
Metadata captured (duration, word count)

Avoid

Poor Audio Quality: Accept any file → Pre-check audio quality and clarity
Missing Context: Generic transcription → Specify language/domain for better accuracy
Wrong Format: Text only → Use SRT/VTT for video subtitles
Ignoring Timestamps: Plain text → Capture timestamps for editing/navigation
Large Files: Single upload → Split files >25MB or use AssemblyAI

Escalate When

Audio quality too poor for accurate transcription
Multiple speakers need identification (diarization)
Technical jargon requires custom vocabulary
File size exceeds API limits
Budget constraints require cost optimization

Integration Examples

With Narrator (Podcast Transcripts):

from superskills.narrator.src import PodcastGenerator
from superskills.transcriber.src import Transcriber

podcast = PodcastGenerator()
result = podcast.generate_podcast(segments, "episode.mp3")

transcriber = Transcriber()
transcript = transcriber.transcribe("episode.mp3", output_format="txt")
print(transcript.transcript)

With Marketer (Social Snippets):

from superskills.transcriber.src import Transcriber
from superskills.marketer.src import SocialMediaPublisher

transcriber = Transcriber()
result = transcriber.transcribe("training.mp4", include_timestamps=True)

quotes = transcriber.extract_key_quotes(result, min_words=15, max_quotes=3)

publisher = SocialMediaPublisher()
for quote in quotes:
    publisher.schedule_post(quote, platforms=["TWITTER", "LINKEDIN"])

With CoursePackager (Searchable Transcripts):

from superskills.transcriber.src import Transcriber

transcriber = Transcriber()
results = transcriber.transcribe_batch(
    ["lesson1.mp4", "lesson2.mp4", "lesson3.mp4"],
    output_format="json"
)