transcriber

AI-powered audio and video transcription using OpenAI Whisper or AssemblyAI. Use when converting recordings to text, generating subtitles, or creating searchable transcripts.

By CoachSteff, updated 2/2/2026

name: transcriber
description: AI-powered audio and video transcription using OpenAI Whisper or AssemblyAI. Use when converting recordings to text, generating subtitles, or creating searchable transcripts.
version: 1.0.0

Transcriber

Note: Review PROFILE.md for user-specific transcription preferences, provider settings, and formatting options.

Master Briefing: The global brand voice at ~/.superskills/master-briefing.yaml applies automatically. The skill profile takes precedence when the two conflict.

Convert audio and video to accurate text transcripts with timestamps, perfect for creating course materials, show notes, and searchable content.

Tools

Transcriber.py (in src/):

  • Multi-provider support (OpenAI Whisper, AssemblyAI)
  • Word-level timestamps for precise timing
  • Multiple output formats (TXT, JSON, SRT, VTT)
  • Batch processing for multiple files
  • Key quote extraction for marketing
  • Automatic language detection
  • Confidence scoring

Core Workflow

1. File Preparation

  • Receive audio/video file(s)
  • Validate file format and size
  • Determine output requirements (format, timestamps)
  • Select transcription provider
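
The format and size validation step could look roughly like the sketch below. It is not the skill's actual implementation: `validate_media_file`, the extension set, and the way the limit is applied are assumptions for illustration, with the 25MB figure taken from the OpenAI limit quoted in this document.

```python
from pathlib import Path

# Assumed values for illustration; check your provider's documentation.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25MB, the OpenAI limit quoted in this document


def validate_media_file(path, max_bytes=MAX_BYTES):
    """Return a list of problems found; an empty list means the file passed."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path} does not exist or is not a regular file"]

    problems = []
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")

    size = file_path.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size >= max_bytes:
        problems.append(f"file is {size} bytes; limit is {max_bytes}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a batch run report every issue with a file in a single pass.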

2. Transcription

  • Upload the file and process it through the provider API
  • Capture word-level timestamps if needed
  • Detect language automatically
  • Extract metadata (duration, word count)
  • Generate confidence scores
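
As a rough illustration of the metadata step, the sketch below derives duration, word count, and an average confidence from word-level entries. The entry shape (`word`, `start`, `end`, `confidence`) and the `summarize_words` helper are assumptions, not the skill's actual data model.

```python
from statistics import mean


def summarize_words(words):
    """Compute duration, word count and mean confidence from word-level entries."""
    if not words:
        return {"duration_seconds": 0.0, "word_count": 0, "mean_confidence": None}

    # Some providers omit confidence; skip missing values instead of failing.
    scores = [w["confidence"] for w in words if w.get("confidence") is not None]
    return {
        # End of the last spoken word: an approximation of the audio length.
        "duration_seconds": max(w["end"] for w in words),
        "word_count": len(words),
        "mean_confidence": round(mean(scores), 3) if scores else None,
    }


sample = [
    {"word": "Welcome", "start": 0.0, "end": 0.5, "confidence": 1.0},
    {"word": "back", "start": 0.5, "end": 1.0, "confidence": 0.75},
]
print(summarize_words(sample))
```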

3. Delivery

  • Export in requested format (TXT, JSON, SRT, VTT)
  • Extract key quotes if needed
  • Package with metadata
  • Hand off to downstream workflows (narrator, coursepackager, author)

Usage

Basic Transcription:

from superskills.transcriber.src import transcribe_file

result = transcribe_file("recording.mp3")
print(result.transcript)
print(f"Duration: {result.duration_seconds}s")
print(f"Words: {result.word_count}")

With Timestamps:

from superskills.transcriber.src import Transcriber

transcriber = Transcriber(provider="openai")
result = transcriber.transcribe(
    "session.mp4",
    include_timestamps=True,
    output_format="srt"
)
print(f"Saved to: {result.output_file}")

Batch Processing:

from superskills.transcriber.src import Transcriber

transcriber = Transcriber(provider="openai")
files = ["session1.mp3", "session2.mp3", "session3.mp3"]
results = transcriber.transcribe_batch(files, output_format="json")

for result in results:
    print(f"{result.source_file}: {result.word_count} words")

Extract Marketing Quotes:

from superskills.transcriber.src import Transcriber

transcriber = Transcriber(provider="openai")
result = transcriber.transcribe("podcast.mp3", include_timestamps=True)
quotes = transcriber.extract_key_quotes(result, min_words=15, max_quotes=5)

for quote in quotes:
    print(quote)

Output Formats

TXT: Plain text transcript
JSON: Full metadata including timestamps and confidence
SRT: Standard subtitle format for video players
VTT: WebVTT format for web video
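
The two subtitle formats differ mainly in the header and the millisecond separator. The sketch below shows that difference, assuming segments are dicts with `start`, `end` (seconds), and `text`. These helper names are illustrative and not part of Transcriber.py.

```python
def format_timestamp(seconds, separator):
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments):
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """Render WebVTT: a WEBVTT header line, then cues without numbering."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


segments = [
    {"start": 0.0, "end": 2.5, "text": "Hello"},
    {"start": 2.5, "end": 5.0, "text": "and welcome"},
]
print(to_srt(segments))
print(to_vtt(segments))
```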

Environment Variables

# OpenAI Whisper (recommended)
OPENAI_API_KEY=your_openai_api_key

# Or AssemblyAI (alternative)
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

Global .env (repository root):

echo "OPENAI_API_KEY=sk-your-key" >> .env

Or skill-specific .env:

echo "OPENAI_API_KEY=sk-your-key" >> superskills/transcriber/.env
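
One way to choose a provider from these variables is sketched below. The `select_provider` helper is hypothetical, and while `"openai"` matches the provider name used in the usage examples, the `"assemblyai"` string is an assumption about how the second provider is named.

```python
import os


def select_provider(env=None):
    """Pick a provider name based on which API key is set (OpenAI first)."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ASSEMBLYAI_API_KEY"):
        return "assemblyai"  # assumed provider name
    raise RuntimeError("Set OPENAI_API_KEY or ASSEMBLYAI_API_KEY before transcribing")
```

Passing a plain dict as `env` makes the selection easy to check without touching the real environment.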

Quality Checklist

  • Audio quality sufficient (clear speech, minimal background noise)
  • File size within limits (OpenAI: <25MB)
  • Language correctly detected or specified
  • Timestamps accurate if requested
  • Confidence scores reviewed
  • Output format correct for use case
  • Metadata captured (duration, word count)
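
Reviewing confidence scores can be partly automated by listing the words that fall below a threshold. This is a minimal sketch: the word entry shape and the 0.6 default are assumptions, and a sensible threshold depends on the provider's scoring.

```python
def flag_low_confidence(words, threshold=0.6):
    """Return (start_time, word) pairs whose confidence falls below threshold."""
    return [
        (w["start"], w["word"])
        for w in words
        if w.get("confidence") is not None and w["confidence"] < threshold
    ]


words = [
    {"word": "quarterly", "start": 12.0, "confidence": 0.95},
    {"word": "EBITDA", "start": 12.5, "confidence": 0.25},
]
for start, word in flag_low_confidence(words):
    print(f"Check {word!r} at {start}s")
```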

Avoid

  • Poor Audio Quality: Accepting any file as-is → Pre-check audio quality and clarity
  • Missing Context: Generic transcription with no hints → Specify the language and domain for better accuracy
  • Wrong Format: Text-only output for video → Use SRT/VTT for video subtitles
  • Ignoring Timestamps: Plain text when timing matters → Capture timestamps for editing and navigation
  • Large Files: Uploading an oversized file in one piece → Split files over 25MB or use AssemblyAI
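
For the large-file case, one possible approach is to estimate a chunk length from the file size and duration, then let ffmpeg cut the file with its segment muxer. The helpers below are hypothetical, and the estimate assumes a roughly constant bitrate, so verify the resulting chunk sizes before uploading.

```python
import math

MAX_BYTES = 25 * 1024 * 1024  # 25MB limit quoted in this document


def plan_segment_seconds(file_bytes, duration_seconds, max_bytes=MAX_BYTES, safety=0.9):
    """Estimate a chunk length in seconds that keeps each piece under max_bytes.

    Returns None when the file is already small enough to upload whole.
    """
    if file_bytes < max_bytes:
        return None
    # Aim below the limit (safety factor) because bitrate is rarely constant.
    chunks = math.ceil(file_bytes / (max_bytes * safety))
    return max(1, math.floor(duration_seconds / chunks))


def ffmpeg_split_command(source, segment_seconds, pattern="chunk_%03d.mp3"):
    """Build an ffmpeg command that cuts the file without re-encoding."""
    return [
        "ffmpeg", "-i", source,
        "-f", "segment", "-segment_time", str(segment_seconds),
        "-c", "copy", pattern,
    ]


seconds = plan_segment_seconds(60 * 1024 * 1024, 3600)
print(ffmpeg_split_command("workshop.mp3", seconds))
```

The output pattern's extension should match the source container. With stream copy, cuts land on frame or keyframe boundaries, so chunk lengths are approximate, and timestamps from later chunks need the chunk's start offset added back when transcripts are merged.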

Escalate When

  • Audio quality too poor for accurate transcription
  • Multiple speakers need identification (diarization)
  • Technical jargon requires custom vocabulary
  • File size exceeds API limits
  • Budget constraints require cost optimization

Integration Examples

With Narrator (Podcast Transcripts):

from superskills.narrator.src import PodcastGenerator
from superskills.transcriber.src import Transcriber

# `segments` is assumed to be prepared earlier; its structure is defined by the narrator skill
podcast = PodcastGenerator()
result = podcast.generate_podcast(segments, "episode.mp3")

transcriber = Transcriber()
transcript = transcriber.transcribe("episode.mp3", output_format="txt")
print(transcript.transcript)

With Marketer (Social Snippets):

from superskills.transcriber.src import Transcriber
from superskills.marketer.src import SocialMediaPublisher

transcriber = Transcriber()
result = transcriber.transcribe("training.mp4", include_timestamps=True)

quotes = transcriber.extract_key_quotes(result, min_words=15, max_quotes=3)

publisher = SocialMediaPublisher()
for quote in quotes:
    publisher.schedule_post(quote, platforms=["TWITTER", "LINKEDIN"])

With CoursePackager (Searchable Transcripts):

from superskills.transcriber.src import Transcriber

transcriber = Transcriber()
results = transcriber.transcribe_batch(
    ["lesson1.mp4", "lesson2.mp4", "lesson3.mp4"],
    output_format="json"
)

Install via CLI

npx skills add https://github.com/CoachSteff/superskills --skill transcriber