rubyllm-audio-transcription

name: rubyllm-audio-transcription description: | Convert speech to text with RubyLLM. Use this skill for transcribing meetings, podcasts, lectures, interviews with support for speaker diarization, timestamps, and multiple languages.

RubyLLM Audio Transcription

Convert speech to text with support for multiple languages and speaker diarization.

v1.9.0+

Basic Usage

# Basic transcription
transcription = RubyLLM.transcribe("meeting.wav")
puts transcription.text

# With specific model
transcription = RubyLLM.transcribe("audio.mp3", model: 'whisper-1')

# Access metadata
puts "Model: #{transcription.model}"
puts "Duration: #{transcription.duration}s"
puts "Language: #{transcription.language}"

Models

Model	Provider	Price/min	Best For
whisper-1	OpenAI	$0.006	General use
gpt-4o-transcribe	OpenAI	Varies	Technical content
gpt-4o-mini-transcribe	OpenAI	Varies	Fast, cheap
gemini-2.5-flash	Google	Varies	Long audio

# Whisper-1 (default)
RubyLLM.transcribe("audio.mp3", model: 'whisper-1')

# GPT-4o Transcribe (technical)
RubyLLM.transcribe("audio.mp3", model: 'gpt-4o-transcribe')

# GPT-4o Mini (fastest)
RubyLLM.transcribe("audio.mp3", model: 'gpt-4o-mini-transcribe')

# Diarization (speaker identification)
RubyLLM.transcribe("meeting.wav", model: 'gpt-4o-transcribe-diarize')

Speaker Diarization

Identify different speakers:

transcription = RubyLLM.transcribe(
  "meeting.wav",
  model: 'gpt-4o-transcribe-diarize'
)

transcription.speakers.each do |speaker|
  puts "Speaker #{speaker.id}:"
  puts speaker.text
  puts
end

Timestamps & Segments

transcription = RubyLLM.transcribe("meeting.wav")

transcription.segments.each do |segment|
  puts "#{format_time(segment.start)} - #{format_time(segment.end)}"
  puts "  #{segment.text}"
  puts "  Speaker: #{segment.speaker_id}" if segment.speaker_id
end

def format_time(seconds)
  mins = (seconds / 60).to_i
  secs = (seconds % 60).to_i
  format("%02d:%02d", mins, secs)
end

Word-Level Timing (v1.16+)

transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words
# => [{ word: "Hello", start: 0.0, end: 0.42 }, { word: "world", start: 0.5, end: 0.78 }, ...]

Language Options

# Language hint (improves accuracy)
RubyLLM.transcribe("audio.mp3", language: 'en')

# Multiple languages
RubyLLM.transcribe("interview.mp3", language: 'es')

# Auto-detect (default)
RubyLLM.transcribe("audio.mp3")

Prompts

Provide context for better accuracy:

RubyLLM.transcribe(
  "technical-meeting.wav",
  prompt: 'This is a technical meeting about Ruby on Rails development'
)

RubyLLM.transcribe(
  "medical-lecture.wav",
  prompt: 'Medical terminology, cardiology lecture'
)

Supported Formats

MP3, M4A, WAV, WebM, OGG, FLAC

Chat with Audio

chat = RubyLLM.chat(model: 'gpt-4o-audio-preview')

# Transcribe
chat.ask "Transcribe this", with: 'meeting.mp3'

# Ask follow-up questions
chat.ask "What action items were discussed?"
chat.ask "Who said they would complete the feature by Friday?"

Rails Integration

class Meeting < ApplicationRecord
  has_one_attached :recording
  has_one_attached :transcript
  
  after_commit :transcribe, if: :recording_attached?
  
  private
  
  def transcribe
    TranscriptionJob.perform_later(self.id)
  end
end

class TranscriptionJob < ApplicationJob
  def perform(meeting_id)
    meeting = Meeting.find(meeting_id)
    
    # Download attachment
    file = meeting.recording.download
    Tempfile.create(['recording', '.mp3']) do |temp|
      temp.write(file)
      temp.rewind
      
      # Transcribe
      transcription = RubyLLM.transcribe(temp.path)
      
      # Save transcript
      meeting.transcript.attach(
        io: StringIO.new(transcription.text),
        filename: "transcript.txt",
        content_type: 'text/plain'
      )
      
      # Save segments as JSON
      meeting.update!(
        transcription_data: {
          segments: transcription.segments.map(&:to_h),
          speakers: transcription.speakers&.map(&:to_h),
          duration: transcription.duration,
          language: transcription.language
        }
      )
    end
  end
end

name: rubyllm-audio-transcription description: | Convert speech to text with RubyLLM. Use this skill for transcribing meetings, podcasts, lectures, interviews with support for speaker diarization, timestamps, and multiple languages.

RubyLLM Audio Transcription

Basic Usage

Models

Speaker Diarization

Timestamps & Segments

Word-Level Timing (v1.16+)

Language Options

Prompts

Supported Formats

Chat with Audio

Rails Integration

See Also