rubyllm-audio-transcription

star 12

Convert speech to text with RubyLLM. Use this skill for transcribing meetings, podcasts, lectures, interviews with support for speaker diarization, timestamps, and multiple languages.

MadBomber By MadBomber schedule Updated 6/13/2026

name: rubyllm-audio-transcription description: | Convert speech to text with RubyLLM. Use this skill for transcribing meetings, podcasts, lectures, interviews with support for speaker diarization, timestamps, and multiple languages.

RubyLLM Audio Transcription

Convert speech to text with support for multiple languages and speaker diarization.

v1.9.0+

Basic Usage

# Basic transcription
transcription = RubyLLM.transcribe("meeting.wav")
puts transcription.text

# With specific model
transcription = RubyLLM.transcribe("audio.mp3", model: 'whisper-1')

# Access metadata
puts "Model: #{transcription.model}"
puts "Duration: #{transcription.duration}s"
puts "Language: #{transcription.language}"

Models

Model Provider Price/min Best For
whisper-1 OpenAI $0.006 General use
gpt-4o-transcribe OpenAI Varies Technical content
gpt-4o-mini-transcribe OpenAI Varies Fast, cheap
gemini-2.5-flash Google Varies Long audio
# Whisper-1 (default)
RubyLLM.transcribe("audio.mp3", model: 'whisper-1')

# GPT-4o Transcribe (technical)
RubyLLM.transcribe("audio.mp3", model: 'gpt-4o-transcribe')

# GPT-4o Mini (fastest)
RubyLLM.transcribe("audio.mp3", model: 'gpt-4o-mini-transcribe')

# Diarization (speaker identification)
RubyLLM.transcribe("meeting.wav", model: 'gpt-4o-transcribe-diarize')

Speaker Diarization

Identify different speakers:

transcription = RubyLLM.transcribe(
  "meeting.wav",
  model: 'gpt-4o-transcribe-diarize'
)

transcription.speakers.each do |speaker|
  puts "Speaker #{speaker.id}:"
  puts speaker.text
  puts
end

Timestamps & Segments

transcription = RubyLLM.transcribe("meeting.wav")

transcription.segments.each do |segment|
  puts "#{format_time(segment.start)} - #{format_time(segment.end)}"
  puts "  #{segment.text}"
  puts "  Speaker: #{segment.speaker_id}" if segment.speaker_id
end

def format_time(seconds)
  mins = (seconds / 60).to_i
  secs = (seconds % 60).to_i
  format("%02d:%02d", mins, secs)
end

Word-Level Timing (v1.16+)

transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words
# => [{ word: "Hello", start: 0.0, end: 0.42 }, { word: "world", start: 0.5, end: 0.78 }, ...]

Language Options

# Language hint (improves accuracy)
RubyLLM.transcribe("audio.mp3", language: 'en')

# Multiple languages
RubyLLM.transcribe("interview.mp3", language: 'es')

# Auto-detect (default)
RubyLLM.transcribe("audio.mp3")

Prompts

Provide context for better accuracy:

RubyLLM.transcribe(
  "technical-meeting.wav",
  prompt: 'This is a technical meeting about Ruby on Rails development'
)

RubyLLM.transcribe(
  "medical-lecture.wav",
  prompt: 'Medical terminology, cardiology lecture'
)

Supported Formats

MP3, M4A, WAV, WebM, OGG, FLAC

Chat with Audio

chat = RubyLLM.chat(model: 'gpt-4o-audio-preview')

# Transcribe
chat.ask "Transcribe this", with: 'meeting.mp3'

# Ask follow-up questions
chat.ask "What action items were discussed?"
chat.ask "Who said they would complete the feature by Friday?"

Rails Integration

class Meeting < ApplicationRecord
  has_one_attached :recording
  has_one_attached :transcript
  
  after_commit :transcribe, if: :recording_attached?
  
  private
  
  def transcribe
    TranscriptionJob.perform_later(self.id)
  end
end

class TranscriptionJob < ApplicationJob
  def perform(meeting_id)
    meeting = Meeting.find(meeting_id)
    
    # Download attachment
    file = meeting.recording.download
    Tempfile.create(['recording', '.mp3']) do |temp|
      temp.write(file)
      temp.rewind
      
      # Transcribe
      transcription = RubyLLM.transcribe(temp.path)
      
      # Save transcript
      meeting.transcript.attach(
        io: StringIO.new(transcription.text),
        filename: "transcript.txt",
        content_type: 'text/plain'
      )
      
      # Save segments as JSON
      meeting.update!(
        transcription_data: {
          segments: transcription.segments.map(&:to_h),
          speakers: transcription.speakers&.map(&:to_h),
          duration: transcription.duration,
          language: transcription.language
        }
      )
    end
  end
end

See Also

  • Main RubyLLM: rubyllm
  • Multi-Modal: Using audio in chat conversations
Install via CLI
npx skills add https://github.com/MadBomber/experiments --skill rubyllm-audio-transcription
Repository Details
star Stars 12
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator