name: rubyllm
version: 1.16.0
description: |
  One beautiful Ruby API for GPT, Claude, Gemini, and more. Use this skill when building AI-powered applications with RubyLLM - chatbots, AI agents, RAG applications, content generators, vision/audio analysis, embeddings, image generation, and Rails integration. Supports 15+ providers with a unified interface. v1.16 adds concurrent tool execution (threads or fibers), built-in instrumentation, and per-provider API base URL overrides.
allowed-tools:
  - Bash(bundle *)
  - Bash(bin/rails *)
RubyLLM v1.16.0
One beautiful Ruby API for GPT, Claude, Gemini, and more.
RubyLLM provides a unified interface for working with AI models across 15+ providers. Build chatbots, AI agents, RAG applications, and content generators with the same simple API.
Gem Version: 1.16.0
Installation
Ruby Project
bundle add ruby_llm
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
config.openai_api_key = ENV['OPENAI_API_KEY']
config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
config.gemini_api_key = ENV['GEMINI_API_KEY']
end
Rails Integration
bundle add ruby_llm
bin/rails generate ruby_llm:install
bin/rails db:migrate
bin/rails ruby_llm:load_models
# Optional: Add chat UI
bin/rails generate ruby_llm:chat_ui
# Visit http://localhost:3000/chats
Quick Start
# Basic chat
chat = RubyLLM.chat
response = chat.ask "What is Ruby on Rails?"
# Stream responses
chat.ask "Write a story" do |chunk|
print chunk.content
end
# With tools
class Weather < RubyLLM::Tool
description "Get weather"
param :city, desc: "City name"
def execute(city:)
"Sunny in #{city}"
end
end
chat.with_tool(Weather).ask "Weather in Paris?"
Core Features
| Feature | Skill Reference |
|---|---|
| Chat | This skill (below) |
| Tools | tools |
| Agents | agents |
| Streaming | This skill (Streaming section) |
| Embeddings | embeddings |
| Image Generation | image-generation |
| Image Editing | image-generation (v1.15+) |
| Audio Transcription | audio-transcription |
| Moderation | moderation |
| Extended Thinking | This skill (Extended Thinking section) |
| Rails Integration | rails |
Ecosystem Gems
| Gem | Version | Skill Reference |
|---|---|---|
| ruby_llm-schema | 0.3.0 | schema |
| ruby_llm-mcp | 1.0.0 | mcp |
| ruby_llm-instrumentation | 0.3.1 | instrumentation |
| ruby_llm-monitoring | 0.3.2 | monitoring |
| ruby_llm-red_candle | 0.2.0 | red_candle |
| ruby_llm-tribunal | 0.1.1 | tribunal |
| opentelemetry-instrumentation-ruby_llm | 0.4.0 | opentelemetry |
Chat API
# Create chat
chat = RubyLLM.chat(model: 'gpt-5.4')
# Ask question
response = chat.ask "What is Ruby?"
# Continue conversation (remembers context)
chat.ask "Show me an example"
# Access messages
chat.messages.each do |msg|
puts "[#{msg.role}] #{msg.content}"
end
# System instructions
chat.with_instructions "You are a Ruby expert. Be concise."
Providers
RubyLLM supports 15+ providers through a unified API:
| Provider | Models | Vision | Tools | Audio |
|---|---|---|---|---|
| OpenAI | GPT-4, GPT-4o, o1, o3 | ✅ | ✅ | ✅ |
| Anthropic | Claude 3/4 | ✅ | ✅ | ❌ |
| Gemini | Gemini 1.5/2.0/2.5/3.0 | ✅ | ✅ | ✅ |
| xAI | Grok-1/2/3 | ✅ | ✅ | ❌ |
| AWS Bedrock | Claude, Llama, Titan | ✅ | ✅ | ❌ |
| Ollama | Local models | ✅ | ✅ | ✅ |
| OpenRouter | 300+ models | ✅ | ✅ | ❌ |
| Perplexity | Search models | ❌ | ✅ | ❌ |
| Mistral | Mistral/Mixtral | ✅ | ✅ | ❌ |
| DeepSeek | DeepSeek-V3 | ❌ | ✅ | ❌ |
| VertexAI | Google Cloud | ✅ | ✅ | ✅ |
| GPUStack | Self-hosted | ✅ | ✅ | ❌ |
| Azure OpenAI | Enterprise OpenAI | ✅ | ✅ | ✅ |
Provider Setup
RubyLLM.configure do |config|
# API keys
config.openai_api_key = ENV['OPENAI_API_KEY']
config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
config.gemini_api_key = ENV['GEMINI_API_KEY']
config.xai_api_key = ENV['XAI_API_KEY']
config.perplexity_api_key = ENV['PERPLEXITY_API_KEY']
config.mistral_api_key = ENV['MISTRAL_API_KEY']
config.deepseek_api_key = ENV['DEEPSEEK_API_KEY']
# Per-provider API base URL overrides (v1.16+)
config.bedrock_api_base = ENV['BEDROCK_API_BASE']
config.mistral_api_base = ENV['MISTRAL_API_BASE']
config.perplexity_api_base = ENV['PERPLEXITY_API_BASE']
config.vertexai_api_base = ENV['VERTEXAI_API_BASE']
config.xai_api_base = ENV['XAI_API_BASE']
# OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Azure, Ollama, GPUStack also supported
# Concurrent tool execution (v1.16+): true/:threads, :fibers, or false
config.tool_concurrency = true
# HTTP adapter (v1.16+): :async_http, :typhoeus, :net_http, :httpx, etc.
config.faraday_adapter = :async_http
# Deprecation warnings (v1.16+): :warn, :silence, or :raise
config.deprecation_behavior = :warn
# Built-in instrumentation (v1.16+, non-Rails)
# config.instrumenter = MyInstrumenter.new # must respond to instrument(name, payload) { }
end
Model Selection
# By model ID
chat = RubyLLM.chat(model: 'claude-sonnet-4-6')
# By provider routing
chat = RubyLLM.chat(model: 'claude-sonnet-4-6', provider: 'bedrock')
# Model registry
RubyLLM.models.supporting(:vision)
RubyLLM.models.find('gpt-5.4')
Prompt Caching (Anthropic)
raw_block = RubyLLM::Content::Raw.new([
{
type: 'text',
text: File.read('large_document.txt'),
cache_control: { type: 'ephemeral' }
}
])
chat.add_message(role: :system, content: raw_block)
response = chat.ask(raw_block)
puts "Cache read tokens: #{response.tokens.cache_read}"
puts "Cache write tokens: #{response.tokens.cache_write}"
Extended Thinking
Give models more computation budget for complex reasoning (o1, o3, Claude Opus).
New in v1.10
# Enable with effort level
chat = RubyLLM.chat(model: 'claude-opus-4-5')
.with_thinking(effort: :high)
response = chat.ask("Complex problem")
# Access thinking trace
puts response.thinking&.text
puts response.thinking&.signature
puts response.thinking_tokens
Effort Levels
chat.with_thinking(effort: :low) # Fast, cheap
chat.with_thinking(effort: :medium) # Balanced
chat.with_thinking(effort: :high) # Slow, accurate
chat.with_thinking(effort: :none) # Disable
chat.with_thinking(budget: 10_000) # Token cap
Streaming with Thinking
chat.ask "Solve step by step" do |chunk|
print chunk.thinking&.text # Some providers stream thinking
print chunk.content
end
Streaming
chat.ask "Write a story" do |chunk|
print chunk.content
$stdout.flush
end
# With lifecycle callbacks (v1.15+)
chat = RubyLLM.chat
.before_message { print "Assistant > " }
.after_message { |msg| puts "\n✓ Done (#{msg.tokens.output} tokens)" }
.before_tool_call { |tc| puts "Calling: #{tc.name}" }
.after_tool_result { |r| puts "Result: #{r}" }
chat.ask "Hello" do |chunk|
print chunk.content
end
Deprecated (v1.15, removed in v2.0): on_new_message, on_end_message, on_tool_call, on_tool_result. Replace with before_message, after_message, before_tool_call, after_tool_result.
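When streaming, it is often useful to keep the full text as well as printing it. The sketch below shows one way to do that without the gem loaded: Chunk is a stand-in struct and collect_stream is a hypothetical helper, neither is part of RubyLLM. It assumes some chunks may carry no content (for example, chunks that only signal a tool call) and skips them.

```ruby
require 'stringio'

# Stand-in for a streamed chunk; real chunks arrive in the block passed to chat.ask.
Chunk = Struct.new(:content)

# Prints each chunk as it arrives and returns the accumulated text.
# Chunks whose content is nil are skipped.
def collect_stream(chunks, out: $stdout)
  buffer = +''
  chunks.each do |chunk|
    next if chunk.content.nil?

    out.print chunk.content
    out.flush
    buffer << chunk.content
  end
  buffer
end

sink = StringIO.new
full_text = collect_stream(
  [Chunk.new('Once '), Chunk.new(nil), Chunk.new('upon a time')],
  out: sink
)
```

In an application, the same buffering logic would sit inside the block given to chat.ask instead of iterating over an array.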
Concurrent Tool Execution
New in v1.16. When the LLM requests multiple tools in one turn, RubyLLM can run them in parallel.
# Global default — :threads, :fibers, true (= :threads), or false
RubyLLM.configure do |config|
config.tool_concurrency = true
end
# Per-chat override
chat.with_tools(Weather, StockPrice, Currency, concurrency: :threads)
chat.with_tools(Weather, StockPrice, concurrency: :fibers) # requires `async` gem
chat.with_tools(Weather, StockPrice, concurrency: false) # sequential
- :threads: true OS threads; safe for most tools
- :fibers: single-threaded via the async gem; good for I/O-bound tools that use async-aware HTTP clients
- All results are gathered before returning to the model (streaming results still accumulate in order)
- In Rails, each tool call is wrapped in the Rails executor, ensuring proper connection pool checkout and CurrentAttributes propagation
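The gather-in-order behavior can be illustrated with plain Ruby threads. This is a sketch of the general technique, not RubyLLM's internal implementation: the tools hash, the call hashes, and run_tool_calls are all hypothetical stand-ins.

```ruby
# Stand-in tools: plain callables, not RubyLLM::Tool subclasses.
tools = {
  'weather' => ->(args) { sleep 0.01; "Sunny in #{args[:city]}" },
  'stock_price' => ->(args) { sleep 0.01; "#{args[:symbol]}: 123.45" }
}

# Runs every requested call in its own thread and returns results in request order.
def run_tool_calls(tools, calls)
  threads = calls.map do |call|
    Thread.new { tools.fetch(call[:name]).call(call[:arguments]) }
  end
  # Thread#value joins the thread and re-raises any exception raised inside it.
  threads.map(&:value)
end

calls = [
  { name: 'weather', arguments: { city: 'Paris' } },
  { name: 'stock_price', arguments: { symbol: 'ACME' } }
]
results = run_tool_calls(tools, calls)
```

Because the results array follows the order of the calls rather than the order in which threads finish, the model sees tool results in the order it requested them.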
Built-in Instrumentation
New in v1.16. No extra gem required. Events cover HTTP requests, chat completions, tool calls, embeddings, and model registry refreshes.
# Rails — use ActiveSupport::Notifications
ActiveSupport::Notifications.subscribe('chat.ruby_llm') do |_name, _start, _finish, _id, payload|
Rails.logger.info(
provider: payload[:provider],
model: payload[:model],
input_tokens: payload[:input_tokens],
output_tokens: payload[:output_tokens]
)
end
# Non-Rails — supply any object that responds to instrument(name, payload, &block)
RubyLLM.configure do |config|
config.instrumenter = MyInstrumenter.new
end
The external ruby_llm-instrumentation and opentelemetry-instrumentation-ruby_llm gems remain available for richer tracing pipelines.
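For the non-Rails case, a minimal instrumenter might look like the sketch below. It only assumes the contract stated above, an object that responds to instrument(name, payload, &block). The Event struct, the events list, and the example-model payload value are illustrative and not part of RubyLLM.

```ruby
# Hypothetical instrumenter: records each event's name, payload, and duration.
class MyInstrumenter
  Event = Struct.new(:name, :payload, :duration)

  attr_reader :events

  def initialize
    @events = []
    @lock = Mutex.new
  end

  # Times the block, records the event, and returns the block's result.
  # The event is recorded even if the block raises.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield payload if block_given?
  ensure
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @lock.synchronize { @events << Event.new(name, payload, duration) }
  end
end

instrumenter = MyInstrumenter.new
result = instrumenter.instrument('chat.ruby_llm', { model: 'example-model' }) { :done }
```

The mutex matters once tool calls run concurrently, since several threads may report events at the same time.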
Multi-Modal
# Images
chat.ask "What's in this image?", with: "photo.jpg"
# PDFs
chat.ask "Summarize this", with: "report.pdf"
# Audio
chat.ask "Transcribe", with: "meeting.mp3"
# Multiple files
chat.ask "Analyze", with: ["image.jpg", "doc.pdf", "notes.txt"]
Audio Transcription — Word-Level Timing (v1.16+)
transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words # => [{ word: "Hello", start: 0.0, end: 0.42 }, ...]
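Word-level timings are handy for building captions. The sketch below groups words into short caption lines. It assumes each entry is a hash with :word, :start, and :end keys, as in the example output above; the actual objects returned by the gem may differ. caption_lines is a hypothetical helper.

```ruby
# Groups word timings into caption lines of at most max_words words each.
# Each line spans from the first word's start to the last word's end.
def caption_lines(words, max_words: 3)
  words.each_slice(max_words).map do |group|
    {
      start: group.first[:start],
      end: group.last[:end],
      text: group.map { |w| w[:word] }.join(' ')
    }
  end
end

words = [
  { word: 'Hello', start: 0.0, end: 0.42 },
  { word: 'and', start: 0.5, end: 0.6 },
  { word: 'welcome', start: 0.65, end: 1.1 },
  { word: 'everyone', start: 1.2, end: 1.8 }
]
lines = caption_lines(words)
```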
Token Tracking
response = chat.ask "Hello"
# v1.15+ granular token breakdown
puts "Input: #{response.tokens.input}"
puts "Output: #{response.tokens.output}"
puts "Cache read: #{response.tokens.cache_read}"
puts "Cache write: #{response.tokens.cache_write}"
puts "Thinking: #{response.thinking_tokens}"
# Built-in cost tracking (v1.15+) — returns nil for unknown pricing
puts "Response cost: #{response.cost.total}"
puts "Chat total: #{chat.cost.total}"
Compat helpers (response.input_tokens, response.output_tokens, response.cached_tokens) still work, but the new response.tokens.* structure is preferred. response.cached_tokens no longer distinguishes read vs write; use response.tokens.cache_read / response.tokens.cache_write for accuracy.
Error Handling
begin
response = chat.ask "Hello"
rescue RubyLLM::UnauthorizedError
# Invalid API key
rescue RubyLLM::RateLimitError => e
sleep e.retry_after
retry
rescue RubyLLM::TimeoutError
# Request timeout
rescue RubyLLM::ContextLengthExceededError
# Reduce prompt size
rescue RubyLLM::Error => e
# Generic API error
end
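The rate-limit branch above retries without an upper bound. A bounded variant with exponential backoff can be sketched as follows. FakeRateLimitError stands in for RubyLLM::RateLimitError so the example runs without the gem, and with_retries and its parameters are hypothetical names, not part of RubyLLM.

```ruby
# Stand-in for a rate-limit error; an application would pass RubyLLM::RateLimitError.
class FakeRateLimitError < StandardError; end

# Retries the block when error_class is raised, doubling the delay each time,
# and re-raises once max_attempts is reached.
# sleeper is injectable so the delay can be stubbed out in tests.
def with_retries(error_class, max_attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep s })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue error_class
    raise if attempts >= max_attempts

    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

delays = []
answer = with_retries(FakeRateLimitError, base_delay: 0.5, sleeper: ->(s) { delays << s }) do |attempt|
  raise FakeRateLimitError, 'slow down' if attempt < 3

  "ok on attempt #{attempt}"
end
```

Capping the attempts keeps a persistent rate limit from turning into an infinite loop, and the final error still reaches the caller.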
Error Types
| Error | HTTP | Description |
|---|---|---|
| BadRequestError | 400 | Invalid parameters |
| UnauthorizedError | 401 | Invalid API key |
| PaymentRequiredError | 402 | Billing issue |
| RateLimitError | 429 | Rate limit exceeded |
| ContextLengthExceededError | - | Token limit (v1.16: Anthropic context-length errors now raise this) |
| ServerError | 500 | Provider error |
| ServiceUnavailableError | 502/503/504 | Service down |
Debugging
export RUBYLLM_DEBUG=true
Shows full request/response details in logs.
Best Practices
Tool Security
class SafeTool < RubyLLM::Tool
param :input, desc: "User input"
def execute(input:)
raise ArgumentError if input.length > 1000
# NEVER use: eval, system, exec, `
end
end
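Length checks are a start; an allowlist of permitted characters is usually stricter. The sketch below shows the kind of validation a tool's execute method could run before touching any external system. validate_city, its length limit, and its character set are illustrative assumptions, not part of RubyLLM.

```ruby
# Hypothetical validator: accepts short, name-like strings and rejects everything else.
def validate_city(input)
  raise ArgumentError, 'input must be a String' unless input.is_a?(String)
  raise ArgumentError, 'input too long' if input.length > 100
  # Allow letters, spaces, dots, apostrophes, and hyphens only.
  raise ArgumentError, 'unexpected characters' unless input.match?(/\A[\p{L} .'-]+\z/)

  input.strip
end

city = validate_city('New York')

rejected = begin
  validate_city('Paris; rm -rf /')
  false
rescue ArgumentError
  true
end
```

Rejecting by allowlist means shell metacharacters never reach later code, even if that code is changed carelessly in the future.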
Cost Control
simple_chat = RubyLLM.chat(model: 'gpt-5-nano') # Cheap
complex_chat = RubyLLM.chat(model: 'claude-sonnet-4-6') # Capable
Context Management
if chat.messages.sum { |m| m.input_tokens.to_i + m.output_tokens.to_i } > 100_000 # to_i treats missing counts as 0
summary = summarize(chat.messages.first(40))
chat.reset_messages!
chat.add_message(role: :system, content: summary)
end
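Before summarizing, it helps to decide which recent messages to keep verbatim and which older ones to fold into the summary. The sketch below picks the newest messages that fit a token budget. Message is a stand-in struct that mirrors only the role, content, and token fields used above, and both helpers are hypothetical.

```ruby
# Stand-in for a chat message; only the fields used here are modeled.
Message = Struct.new(:role, :content, :input_tokens, :output_tokens)

# Tokens attributed to one message, treating missing counts as zero.
def message_tokens(message)
  message.input_tokens.to_i + message.output_tokens.to_i
end

# Walks backwards from the newest message and keeps as many as fit in the budget.
def recent_within_budget(messages, budget)
  kept = []
  used = 0
  messages.reverse_each do |message|
    cost = message_tokens(message)
    break if used + cost > budget

    kept.unshift(message)
    used += cost
  end
  kept
end

history = [
  Message.new(:user, 'Hi', nil, nil),
  Message.new(:assistant, 'Hello!', 10, 5),
  Message.new(:user, 'Explain blocks', nil, nil),
  Message.new(:assistant, 'Blocks are...', 40, 60)
]
total = history.sum { |m| message_tokens(m) }
recent = recent_within_budget(history, 100)
```

Messages not in the kept list are the candidates for summarizing before the history is reset.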
Resources
- Official Docs: https://rubyllm.com
- GitHub: https://github.com/crmne/ruby_llm