rubyllm


One beautiful Ruby API for GPT, Claude, Gemini, and more. Use this skill when building AI-powered applications with RubyLLM - chatbots, AI agents, RAG applications, content generators, vision/audio analysis, embeddings, image generation, and Rails integration. Supports 15+ providers with a unified interface. v1.16 adds concurrent tool execution (threads or fibers), built-in instrumentation, and per-provider API base URL overrides.

By MadBomber. Updated 6/13/2026

name: rubyllm
version: 1.16.0
description: |
  One beautiful Ruby API for GPT, Claude, Gemini, and more. Use this skill when building AI-powered applications with RubyLLM - chatbots, AI agents, RAG applications, content generators, vision/audio analysis, embeddings, image generation, and Rails integration. Supports 15+ providers with a unified interface. v1.16 adds concurrent tool execution (threads or fibers), built-in instrumentation, and per-provider API base URL overrides.
allowed-tools:
  - Bash(bundle *)
  - Bash(bin/rails *)

RubyLLM v1.16.0

One beautiful Ruby API for GPT, Claude, Gemini, and more.

RubyLLM provides a unified interface for working with AI models across 15+ providers. Build chatbots, AI agents, RAG applications, and content generators with the same simple API.

Gem Version: 1.16.0

Installation

Ruby Project

bundle add ruby_llm
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
end

Rails Integration

bundle add ruby_llm
bin/rails generate ruby_llm:install
bin/rails db:migrate
bin/rails ruby_llm:load_models

# Optional: Add chat UI
bin/rails generate ruby_llm:chat_ui
# Visit http://localhost:3000/chats

Quick Start

# Basic chat
chat = RubyLLM.chat
response = chat.ask "What is Ruby on Rails?"

# Stream responses
chat.ask "Write a story" do |chunk|
  print chunk.content
end

# With tools
class Weather < RubyLLM::Tool
  description "Get weather"
  param :city, desc: "City name"
  def execute(city:)
    "Sunny in #{city}"
  end
end

chat.with_tool(Weather).ask "Weather in Paris?"

Core Features

| Feature | Skill Reference |
|---|---|
| Chat | This skill (below) |
| Tools | tools |
| Agents | agents |
| Streaming | This skill (Streaming section) |
| Embeddings | embeddings |
| Image Generation | image-generation |
| Image Editing | image-generation (v1.15+) |
| Audio Transcription | audio-transcription |
| Moderation | moderation |
| Extended Thinking | This skill (Extended Thinking section) |
| Rails Integration | rails |

Ecosystem Gems

| Gem | Version | Skill Reference |
|---|---|---|
| ruby_llm-schema | 0.3.0 | schema |
| ruby_llm-mcp | 1.0.0 | mcp |
| ruby_llm-instrumentation | 0.3.1 | instrumentation |
| ruby_llm-monitoring | 0.3.2 | monitoring |
| ruby_llm-red_candle | 0.2.0 | red_candle |
| ruby_llm-tribunal | 0.1.1 | tribunal |
| opentelemetry-instrumentation-ruby_llm | 0.4.0 | opentelemetry |

Chat API

# Create chat
chat = RubyLLM.chat(model: 'gpt-5.4')

# Ask question
response = chat.ask "What is Ruby?"

# Continue conversation (remembers context)
chat.ask "Show me an example"

# Access messages
chat.messages.each do |msg|
  puts "[#{msg.role}] #{msg.content}"
end

# System instructions
chat.with_instructions "You are a Ruby expert. Be concise."

Providers

RubyLLM supports 15+ providers through a unified API:

| Provider | Models |
|---|---|
| OpenAI | GPT-4, GPT-4o, o1, o3 |
| Anthropic | Claude 3/4 |
| Google | Gemini 1.5/2.0/2.5/3.0 |
| xAI | Grok-1/2/3 |
| AWS Bedrock | Claude, Llama, Titan |
| Ollama | Local models |
| OpenRouter | 300+ models |
| Perplexity | Search models |
| Mistral | Mistral/Mixtral |
| DeepSeek | DeepSeek-V3 |
| VertexAI | Google Cloud |
| GPUStack | Self-hosted |
| Azure OpenAI | Enterprise OpenAI |

Provider Setup

RubyLLM.configure do |config|
  # API keys
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
  config.xai_api_key = ENV['XAI_API_KEY']
  config.perplexity_api_key = ENV['PERPLEXITY_API_KEY']
  config.mistral_api_key = ENV['MISTRAL_API_KEY']
  config.deepseek_api_key = ENV['DEEPSEEK_API_KEY']

  # Per-provider API base URL overrides (v1.16+)
  config.bedrock_api_base    = ENV['BEDROCK_API_BASE']
  config.mistral_api_base    = ENV['MISTRAL_API_BASE']
  config.perplexity_api_base = ENV['PERPLEXITY_API_BASE']
  config.vertexai_api_base   = ENV['VERTEXAI_API_BASE']
  config.xai_api_base        = ENV['XAI_API_BASE']
  # OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Azure, Ollama, GPUStack also supported

  # Concurrent tool execution (v1.16+): true/:threads, :fibers, or false
  config.tool_concurrency = true

  # HTTP adapter (v1.16+): :async_http, :typhoeus, :net_http, :httpx, etc.
  config.faraday_adapter = :async_http

  # Deprecation warnings (v1.16+): :warn, :silence, or :raise
  config.deprecation_behavior = :warn

  # Built-in instrumentation (v1.16+, non-Rails)
  # config.instrumenter = MyInstrumenter.new  # must respond to instrument(name, payload) { }
end
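
Because every key above is read from the environment, a missing variable silently becomes nil. A small startup check can surface that early. The following is a minimal sketch: `REQUIRED_KEYS` and `missing_provider_keys` are hypothetical names, not part of RubyLLM, and the list of required variables is an assumption to adapt to the providers you actually use.

```ruby
# Hypothetical helper (not part of RubyLLM): report which expected
# environment variables are unset or blank before RubyLLM.configure runs.
REQUIRED_KEYS = %w[OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY].freeze

def missing_provider_keys(env = ENV, required = REQUIRED_KEYS)
  required.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_provider_keys
warn "Missing provider keys: #{missing.join(', ')}" unless missing.empty?
```

Passing the environment as an argument keeps the check easy to test with a plain Hash.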

Model Selection

# By model ID
chat = RubyLLM.chat(model: 'claude-sonnet-4-6')

# By provider routing
chat = RubyLLM.chat(model: 'claude-sonnet-4-6', provider: 'bedrock')

# Model registry
RubyLLM.models.supporting(:vision)
RubyLLM.models.find('gpt-5.4')

Prompt Caching (Anthropic)

raw_block = RubyLLM::Content::Raw.new([
  { 
    type: 'text', 
    text: File.read('large_document.txt'),
    cache_control: { type: 'ephemeral' }
  }
])

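# Send the cached block once as system context, then ask a short question against it
chat.add_message(role: :system, content: raw_block)
response = chat.ask("Summarize the key points of this document.")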
puts "Cache read tokens: #{response.tokens.cache_read}"
puts "Cache write tokens: #{response.tokens.cache_write}"

Extended Thinking

Give models more computation budget for complex reasoning (o1, o3, Claude Opus).

New in v1.10

# Enable with effort level
chat = RubyLLM.chat(model: 'claude-opus-4-5')
  .with_thinking(effort: :high)

response = chat.ask("Complex problem")

# Access thinking trace
puts response.thinking&.text
puts response.thinking&.signature
puts response.thinking_tokens

Effort Levels

chat.with_thinking(effort: :low)     # Fast, cheap
chat.with_thinking(effort: :medium)  # Balanced
chat.with_thinking(effort: :high)    # Slow, accurate
chat.with_thinking(effort: :none)    # Disable
chat.with_thinking(budget: 10_000)   # Token cap

Streaming with Thinking

chat.ask "Solve step by step" do |chunk|
  print chunk.thinking&.text  # Some providers stream thinking
  print chunk.content
end

Streaming

chat.ask "Write a story" do |chunk|
  print chunk.content
  $stdout.flush
end

# With lifecycle callbacks (v1.15+)
chat = RubyLLM.chat
  .before_message { print "Assistant > " }
  .after_message { |msg| puts "\n✓ Done (#{msg.tokens.output} tokens)" }
  .before_tool_call { |tc| puts "Calling: #{tc.name}" }
  .after_tool_result { |r| puts "Result: #{r}" }

chat.ask "Hello" do |chunk|
  print chunk.content
end

Deprecated (v1.15, removed in v2.0): on_new_message, on_end_message, on_tool_call, on_tool_result. Replace with before_message, after_message, before_tool_call, after_tool_result.
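
The streaming block receives one chunk at a time, so the full text has to be assembled by the caller if it is needed afterwards. The sketch below shows one way to do that. `Chunk` is a stand-in Struct used only to keep the example self-contained, `StreamBuffer` is a hypothetical helper, and it assumes some chunks may carry no content.

```ruby
require "stringio"

# Stand-in for streamed chunks; real chunks arrive in chat.ask's block.
Chunk = Struct.new(:content)

# Hypothetical helper: echoes streamed text and keeps a copy of it,
# skipping chunks whose content is nil or empty.
class StreamBuffer
  attr_reader :text

  def initialize(io = $stdout)
    @io = io
    @text = String.new
  end

  def <<(chunk)
    piece = chunk.content.to_s
    return self if piece.empty?

    @io.print(piece)
    @io.flush
    @text << piece
    self
  end
end

out = StringIO.new
buffer = StreamBuffer.new(out)
[Chunk.new("Hello"), Chunk.new(nil), Chunk.new(", world")].each { |c| buffer << c }
```

With a real chat the same buffer would be fed from the block, for example `chat.ask("Write a story") { |chunk| buffer << chunk }`.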

Concurrent Tool Execution

New in v1.16. When the LLM requests multiple tools in one turn, RubyLLM can run them in parallel.

# Global default — :threads, :fibers, true (= :threads), or false
RubyLLM.configure do |config|
  config.tool_concurrency = true
end

# Per-chat override
chat.with_tools(Weather, StockPrice, Currency, concurrency: :threads)
chat.with_tools(Weather, StockPrice, concurrency: :fibers)  # requires `async` gem
chat.with_tools(Weather, StockPrice, concurrency: false)    # sequential

  • :threads: true OS threads; safe for most tools
  • :fibers: single-threaded via the async gem; good for I/O-bound tools that use async-aware HTTP clients
  • All results are gathered before returning to the model (streaming results still accumulate in order)
  • In Rails, each tool call is wrapped in the Rails executor, ensuring proper connection pool checkout and CurrentAttributes propagation
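
The "run in parallel, gather in order" behavior described above can be pictured with plain Ruby threads. This is an illustration only, not RubyLLM's implementation: `run_tools_concurrently` is a hypothetical helper and the lambdas stand in for tool calls.

```ruby
# Illustrative only: each "tool" is a callable run in its own thread.
# Results are collected in the order the calls were requested, regardless
# of which thread finishes first.
def run_tools_concurrently(calls)
  threads = calls.map do |name, callable|
    Thread.new { [name, callable.call] }
  end
  threads.map(&:value).to_h # Thread#value waits for the thread to finish
end

calls = {
  weather:  -> { sleep 0.2; "Sunny" },
  stock:    -> { sleep 0.2; 101.5 },
  currency: -> { sleep 0.2; "EUR" }
}

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = run_tools_concurrently(calls)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
```

Because the three simulated calls wait concurrently, the elapsed time stays close to the slowest call rather than the sum of all three. `Thread#value` also re-raises any exception from the thread, so a failing tool is not silently dropped in this sketch.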

Built-in Instrumentation

New in v1.16. No extra gem required. Events cover HTTP requests, chat completions, tool calls, embeddings, and model registry refreshes.

# Rails — use ActiveSupport::Notifications
ActiveSupport::Notifications.subscribe('chat.ruby_llm') do |_name, _start, _finish, _id, payload|
  Rails.logger.info(
    provider:      payload[:provider],
    model:         payload[:model],
    input_tokens:  payload[:input_tokens],
    output_tokens: payload[:output_tokens]
  )
end

# Non-Rails — supply any object that responds to instrument(name, payload, &block)
RubyLLM.configure do |config|
  config.instrumenter = MyInstrumenter.new
end

The external ruby_llm-instrumentation / opentelemetry-instrumentation-ruby_llm gems remain available for richer tracing pipelines.
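
For the non-Rails case, the only stated requirement is an object that responds to `instrument(name, payload, &block)`. A minimal sketch of such an object follows. The class body is an assumption about what a useful instrumenter might record, and yielding the payload to the block mirrors ActiveSupport::Notifications rather than anything the source specifies.

```ruby
# Hypothetical instrumenter: records each event's name, payload, and duration.
class MyInstrumenter
  Event = Struct.new(:name, :payload, :duration)

  attr_reader :events

  def initialize
    @events = []
    @lock = Mutex.new
  end

  # Runs the block, records how long it took, and returns the block's value.
  # The event is recorded even if the block raises.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield(payload) if block_given?
  ensure
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @lock.synchronize { @events << Event.new(name, payload, duration) }
  end
end

instrumenter = MyInstrumenter.new
result = instrumenter.instrument("chat.ruby_llm", { model: "example-model" }) { :done }
```

The mutex matters if tools run on multiple threads, since several events may then be recorded at the same time.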

Multi-Modal

# Images
chat.ask "What's in this image?", with: "photo.jpg"

# PDFs
chat.ask "Summarize this", with: "report.pdf"

# Audio
chat.ask "Transcribe", with: "meeting.mp3"

# Multiple files
chat.ask "Analyze", with: ["image.jpg", "doc.pdf", "notes.txt"]

Audio Transcription — Word-Level Timing (v1.16+)

transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words  # => [{ word: "Hello", start: 0.0, end: 0.42 }, ...]
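
Word-level timings are typically post-processed into captions or search indexes. The sketch below groups words into caption segments of bounded duration. It assumes the hash shape shown in the comment above (`word`, `start`, `end`); `caption_segments` is a hypothetical helper, not a RubyLLM method.

```ruby
# Groups word timings into caption segments no longer than max_seconds.
# Input shape is assumed to be { word:, start:, end: } per word.
def caption_segments(words, max_seconds: 3.0)
  words.each_with_object([]) do |w, segments|
    current = segments.last
    if current && (w[:end] - current[:start]) <= max_seconds
      current[:text] << " " << w[:word]
      current[:end] = w[:end]
    else
      # dup so appending to the segment text never mutates the input word
      segments << { start: w[:start], end: w[:end], text: w[:word].dup }
    end
  end
end

words = [
  { word: "Hello",   start: 0.0, end: 0.42 },
  { word: "and",     start: 0.5, end: 0.7 },
  { word: "welcome", start: 2.9, end: 3.4 },
  { word: "back",    start: 3.5, end: 3.9 }
]
segments = caption_segments(words, max_seconds: 3.0)
```

Here "welcome" would push the first segment past three seconds, so it starts a second one.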

Token Tracking

response = chat.ask "Hello"

# v1.15+ granular token breakdown
puts "Input: #{response.tokens.input}"
puts "Output: #{response.tokens.output}"
puts "Cache read: #{response.tokens.cache_read}"
puts "Cache write: #{response.tokens.cache_write}"
puts "Thinking: #{response.thinking_tokens}"

# Built-in cost tracking (v1.15+) — returns nil for unknown pricing
puts "Response cost: #{response.cost.total}"
puts "Chat total: #{chat.cost.total}"

Compat helpers (response.input_tokens, response.output_tokens, response.cached_tokens) still work but the new response.tokens.* structure is preferred. response.cached_tokens no longer distinguishes read vs write; use response.tokens.cache_read / response.tokens.cache_write for accuracy.
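
The cost figures come down to token counts multiplied by per-token prices, with nil when no price is known. The sketch below shows that arithmetic in isolation. The price table, the model names, and `estimate_cost` are all hypothetical, and this is not how RubyLLM computes `response.cost` internally.

```ruby
# Hypothetical per-million-token prices in USD; real prices vary by model.
PRICES = {
  "example-small" => { input: 0.50, output: 1.50 }
}.freeze

# Returns the estimated cost in USD, or nil when the model has no known price.
def estimate_cost(model, input_tokens:, output_tokens:, prices: PRICES)
  price = prices[model]
  return nil unless price

  (input_tokens * price[:input] + output_tokens * price[:output]) / 1_000_000.0
end

known   = estimate_cost("example-small", input_tokens: 2_000, output_tokens: 1_000)
unknown = estimate_cost("unlisted-model", input_tokens: 2_000, output_tokens: 1_000)
```

Returning nil rather than 0 for an unknown model keeps "free" and "unpriced" distinguishable when totals are aggregated.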

Error Handling

begin
  response = chat.ask "Hello"
rescue RubyLLM::UnauthorizedError
  # Invalid API key (HTTP 401, see Error Types below)
rescue RubyLLM::RateLimitError => e
  sleep e.retry_after
  retry
rescue RubyLLM::TimeoutError
  # Request timeout
rescue RubyLLM::ContextLengthExceededError
  # Reduce prompt size
rescue RubyLLM::Error => e
  # Generic API error
end

Error Types

| Error | HTTP | Description |
|---|---|---|
| BadRequestError | 400 | Invalid parameters |
| UnauthorizedError | 401 | Invalid API key |
| PaymentRequiredError | 402 | Billing issue |
| RateLimitError | 429 | Rate limit exceeded |
| ContextLengthExceededError | - | Token limit (v1.16: Anthropic context-length errors now raise this) |
| ServerError | 500 | Provider error |
| ServiceUnavailableError | 502/503/504 | Service down |
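
Retrying on rate limits is usually safer with a bounded number of attempts and a growing delay than with an unconditional `retry`. The following is a minimal sketch of that pattern. `with_retries` is a hypothetical helper and `FakeRateLimitError` is a stand-in so the example runs without the gem; with RubyLLM the class to pass would be `RubyLLM::RateLimitError`.

```ruby
# Stand-in error class so the sketch is self-contained.
class FakeRateLimitError < StandardError; end

# Retries the block on the given error classes with exponential backoff,
# re-raising the error once max_attempts is reached.
def with_retries(*error_classes, max_attempts: 3, base_delay: 0.5, sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *error_classes
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

delays = []
result = with_retries(FakeRateLimitError, max_attempts: 4, sleeper: ->(s) { delays << s }) do |attempt|
  raise FakeRateLimitError, "slow down" if attempt < 3

  "ok after #{attempt} attempts"
end
```

The injectable `sleeper` is there so the backoff schedule can be checked without actually waiting.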

Debugging

export RUBYLLM_DEBUG=true

Shows full request/response details in logs.

Best Practices

Tool Security

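class SafeTool < RubyLLM::Tool
  param :input, desc: "User input"

  MAX_LENGTH = 1000

  def execute(input:)
    # Reject oversized input before doing any work
    raise ArgumentError, "input exceeds #{MAX_LENGTH} characters" if input.length > MAX_LENGTH
    # NEVER pass model-supplied input to eval, system, exec, or backticks
  end
end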
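
One way to avoid ever passing model-supplied text to a shell or to `eval` is to treat the input as a key into a fixed allowlist. The sketch below shows that idea in plain Ruby. The action names, their canned return values, and `run_allowed_action` are hypothetical and only illustrate the validation flow that a tool's `execute` method might delegate to.

```ruby
# Hypothetical allowlist: the model may only pick one of these named actions.
ALLOWED_ACTIONS = {
  "uptime"    => -> { "up 3 days" },       # canned values for illustration
  "disk_free" => -> { "42 GB available" }
}.freeze

MAX_INPUT_LENGTH = 1000

# Validates model-supplied input and dispatches to a fixed action,
# returning an error hash instead of executing anything unexpected.
def run_allowed_action(input)
  return { error: "input must be a String" } unless input.is_a?(String)
  return { error: "input too long" } if input.length > MAX_INPUT_LENGTH

  action = ALLOWED_ACTIONS[input.strip.downcase]
  return { error: "unknown action" } unless action

  { result: action.call }
end
```

Anything outside the allowlist, including shell fragments, is rejected without being interpreted.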

Cost Control

simple_chat = RubyLLM.chat(model: 'gpt-5-nano')  # Cheap
complex_chat = RubyLLM.chat(model: 'claude-sonnet-4-6')  # Capable

Context Management

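# Token counts can be nil on some messages, so coerce with to_i
if chat.messages.sum { |m| m.input_tokens.to_i + m.output_tokens.to_i } > 100_000
  # summarize is an application-defined helper, not part of RubyLLM
  summary = summarize(chat.messages.first(40))
  chat.reset_messages!
  chat.add_message(role: :system, content: summary)
end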
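
The budget check and the choice of which messages to summarize can be isolated from the chat object and tested on their own. The sketch below does that with a stand-in `Message` Struct. `total_tokens` and `split_for_summary` are hypothetical helpers, and the decision to keep the most recent messages verbatim is an assumption, not something the snippet above prescribes.

```ruby
# Stand-in for chat messages; token counts may be nil on some messages.
Message = Struct.new(:role, :content, :input_tokens, :output_tokens)

# Nil-safe token total across a message list.
def total_tokens(messages)
  messages.sum { |m| m.input_tokens.to_i + m.output_tokens.to_i }
end

# Splits history into [older messages to summarize, recent messages to keep]
# when the token total exceeds the budget; otherwise nothing is summarized.
def split_for_summary(messages, budget:, keep_last: 4)
  return [[], messages] if total_tokens(messages) <= budget

  cutoff = [messages.size - keep_last, 0].max
  [messages.first(cutoff), messages.last(messages.size - cutoff)]
end

history = [
  Message.new(:user, "a", nil, nil),
  Message.new(:assistant, "b", 100, 50),
  Message.new(:user, "c", nil, nil),
  Message.new(:assistant, "d", 200, 80),
  Message.new(:user, "e", nil, nil),
  Message.new(:assistant, "f", 300, 120)
]
older, recent = split_for_summary(history, budget: 500, keep_last: 2)
```

The `older` list is what would be handed to a summarizer, while `recent` would be re-added after the summary so the latest turns stay intact.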

Resources

Install via CLI
npx skills add https://github.com/MadBomber/experiments --skill rubyllm
Repository Details
Stars: 12
Forks: 0
Branch: main
Path: SKILL.md