rubyllm - SKILL.md Agent Skill

name: rubyllm
version: 1.16.0
description: |
  One beautiful Ruby API for GPT, Claude, Gemini, and more. Use this skill when building AI-powered applications with RubyLLM - chatbots, AI agents, RAG applications, content generators, vision/audio analysis, embeddings, image generation, and Rails integration. Supports 15+ providers with a unified interface. v1.16 adds concurrent tool execution (threads or fibers), built-in instrumentation, and per-provider API base URL overrides.
allowed-tools:
  - Bash(bundle )
  - Bash(bin/rails )

RubyLLM v1.16.0

One beautiful Ruby API for GPT, Claude, Gemini, and more.

RubyLLM provides a unified interface for working with AI models across 15+ providers. Build chatbots, AI agents, RAG applications, and content generators with the same simple API.

Gem Version: 1.16.0

Installation

Ruby Project

bundle add ruby_llm

# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
end
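
ENV[...] returns nil when a variable is unset, so a misspelled variable name typically only surfaces on the first API call. A small fail-fast check can run before RubyLLM.configure. The sketch below is plain Ruby with no dependency on the gem; the helper name missing_env_keys is hypothetical.

```ruby
# Hypothetical helper: returns the names of required variables that are
# unset or blank. `env` defaults to ENV but accepts any Hash-like object.
def missing_env_keys(required, env = ENV)
  required.select { |name| env[name].to_s.strip.empty? }
end

required = %w[OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY]
missing = missing_env_keys(required)
warn "Missing API keys: #{missing.join(', ')}" unless missing.empty?
```

List only the providers the application actually uses; unused providers need no key.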

Rails Integration

bundle add ruby_llm
bin/rails generate ruby_llm:install
bin/rails db:migrate
bin/rails ruby_llm:load_models

# Optional: Add chat UI
bin/rails generate ruby_llm:chat_ui
# Visit http://localhost:3000/chats

Quick Start

# Basic chat
chat = RubyLLM.chat
response = chat.ask "What is Ruby on Rails?"

# Stream responses
chat.ask "Write a story" do |chunk|
  print chunk.content
end

# With tools
class Weather < RubyLLM::Tool
  description "Get weather"
  param :city, desc: "City name"
  def execute(city:)
    "Sunny in #{city}"
  end
end

chat.with_tool(Weather).ask "Weather in Paris?"

Core Features

Feature	Skill Reference
Chat	This skill (below)
Tools	tools
Agents	agents
Streaming	This skill (Streaming section)
Embeddings	embeddings
Image Generation	image-generation
Image Editing	image-generation (v1.15+)
Audio Transcription	audio-transcription
Moderation	moderation
Extended Thinking	This skill (Extended Thinking section)
Rails Integration	rails

Ecosystem Gems

Gem	Version	Skill Reference
ruby_llm-schema	0.3.0	schema
ruby_llm-mcp	1.0.0	mcp
ruby_llm-instrumentation	0.3.1	instrumentation
ruby_llm-monitoring	0.3.2	monitoring
ruby_llm-red_candle	0.2.0	red_candle
ruby_llm-tribunal	0.1.1	tribunal
opentelemetry-instrumentation-ruby_llm	0.4.0	opentelemetry

Chat API

# Create chat
chat = RubyLLM.chat(model: 'gpt-5.4')

# Ask question
response = chat.ask "What is Ruby?"

# Continue conversation (remembers context)
chat.ask "Show me an example"

# Access messages
chat.messages.each do |msg|
  puts "[#{msg.role}] #{msg.content}"
end

# System instructions
chat.with_instructions "You are a Ruby expert. Be concise."

Providers

RubyLLM supports 15+ providers through a unified API:

Provider	Models	Vision	Tools	Audio
OpenAI	GPT-4, GPT-4o, o1, o3	✅	✅	✅
Anthropic	Claude 3/4	✅	✅	❌
Google	Gemini 1.5/2.0/2.5/3.0	✅	✅	✅
xAI	Grok-1/2/3	✅	✅	❌
AWS Bedrock	Claude, Llama, Titan	✅	✅	❌
Ollama	Local models	✅	✅	✅
OpenRouter	300+ models	✅	✅	❌
Perplexity	Search models	❌	✅	❌
Mistral	Mistral/Mixtral	✅	✅	❌
DeepSeek	DeepSeek-V3	❌	✅	❌
VertexAI	Google Cloud	✅	✅	✅
GPUStack	Self-hosted	✅	✅	❌
Azure OpenAI	Enterprise OpenAI	✅	✅	✅

Provider Setup

RubyLLM.configure do |config|
  # API keys
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
  config.xai_api_key = ENV['XAI_API_KEY']
  config.perplexity_api_key = ENV['PERPLEXITY_API_KEY']
  config.mistral_api_key = ENV['MISTRAL_API_KEY']
  config.deepseek_api_key = ENV['DEEPSEEK_API_KEY']

  # Per-provider API base URL overrides (v1.16+)
  config.bedrock_api_base    = ENV['BEDROCK_API_BASE']
  config.mistral_api_base    = ENV['MISTRAL_API_BASE']
  config.perplexity_api_base = ENV['PERPLEXITY_API_BASE']
  config.vertexai_api_base   = ENV['VERTEXAI_API_BASE']
  config.xai_api_base        = ENV['XAI_API_BASE']
  # OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Azure, Ollama, GPUStack also supported

  # Concurrent tool execution (v1.16+): true/:threads, :fibers, or false
  config.tool_concurrency = true

  # HTTP adapter (v1.16+): :async_http, :typhoeus, :net_http, :httpx, etc.
  config.faraday_adapter = :async_http

  # Deprecation warnings (v1.16+): :warn, :silence, or :raise
  config.deprecation_behavior = :warn

  # Built-in instrumentation (v1.16+, non-Rails)
  # config.instrumenter = MyInstrumenter.new  # must respond to instrument(name, payload) { }
end

Model Selection

# By model ID
chat = RubyLLM.chat(model: 'claude-sonnet-4-6')

# By provider routing
chat = RubyLLM.chat(model: 'claude-sonnet-4-6', provider: 'bedrock')

# Model registry
RubyLLM.models.supporting(:vision)
RubyLLM.models.find('gpt-5.4')

Prompt Caching (Anthropic)

raw_block = RubyLLM::Content::Raw.new([
  { 
    type: 'text', 
    text: File.read('large_document.txt'),
    cache_control: { type: 'ephemeral' }
  }
])

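# Cache the large document once as a system message, then ask about it
chat.add_message(role: :system, content: raw_block)
response = chat.ask("Summarize the key points of the document.")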
puts "Cache read tokens: #{response.tokens.cache_read}"
puts "Cache write tokens: #{response.tokens.cache_write}"

Extended Thinking

Give models more computation budget for complex reasoning (o1, o3, Claude Opus).

New in v1.10

# Enable with effort level
chat = RubyLLM.chat(model: 'claude-opus-4-5')
  .with_thinking(effort: :high)

response = chat.ask("Complex problem")

# Access thinking trace
puts response.thinking&.text
puts response.thinking&.signature
puts response.thinking_tokens

Effort Levels

chat.with_thinking(effort: :low)     # Fast, cheap
chat.with_thinking(effort: :medium)  # Balanced
chat.with_thinking(effort: :high)    # Slow, accurate
chat.with_thinking(effort: :none)    # Disable
chat.with_thinking(budget: 10_000)   # Token cap

Streaming with Thinking

chat.ask "Solve step by step" do |chunk|
  print chunk.thinking&.text  # Some providers stream thinking
  print chunk.content
end

Streaming

chat.ask "Write a story" do |chunk|
  print chunk.content
  $stdout.flush
end

# With lifecycle callbacks (v1.15+)
chat = RubyLLM.chat
  .before_message { print "Assistant > " }
  .after_message { |msg| puts "\n✓ Done (#{msg.tokens.output} tokens)" }
  .before_tool_call { |tc| puts "Calling: #{tc.name}" }
  .after_tool_result { |r| puts "Result: #{r}" }

chat.ask "Hello" do |chunk|
  print chunk.content
end

Deprecated (v1.15, removed in v2.0): on_new_message, on_end_message, on_tool_call, on_tool_result. Replace with before_message, after_message, before_tool_call, after_tool_result.

Concurrent Tool Execution

New in v1.16. When the LLM requests multiple tools in one turn, RubyLLM can run them in parallel.

# Global default — :threads, :fibers, true (= :threads), or false
RubyLLM.configure do |config|
  config.tool_concurrency = true
end

# Per-chat override
chat.with_tools(Weather, StockPrice, Currency, concurrency: :threads)
chat.with_tools(Weather, StockPrice, concurrency: :fibers)  # requires `async` gem
chat.with_tools(Weather, StockPrice, concurrency: false)    # sequential

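:threads runs tool calls on OS threads; tools must be thread-safe, and CPU-bound work still serializes on the GVL, so the gain is mainly for I/O-bound tools
:fibers runs tool calls on a single thread via the async gem; good for I/O-bound tools that use async-aware HTTP clients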
All results are gathered before returning to the model (streaming results still accumulate in order)
In Rails, each tool call is wrapped in the Rails executor, ensuring proper connection pool checkout and CurrentAttributes propagation
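
The gather-then-return behavior described above can be modeled in plain Ruby. This is a sketch of the idea only, not RubyLLM's implementation; run_tools_concurrently and the lambda-based tool calls are hypothetical stand-ins.

```ruby
# Stand-in for one turn's tool calls: a Hash of name => callable.
# NOT RubyLLM internals, only a model of "run in parallel, gather in order".
def run_tools_concurrently(tool_calls)
  threads = tool_calls.map do |name, callable|
    Thread.new do
      [name, callable.call]
    rescue StandardError => e
      # Capture the failure per call so one bad tool does not lose the rest.
      [name, "error: #{e.message}"]
    end
  end
  # Thread#value joins each thread; mapping in creation order keeps the
  # results in the same order the calls were requested.
  threads.map(&:value).to_h
end

results = run_tools_concurrently({
  weather: -> { sleep 0.05; 'Sunny in Paris' },
  stock:   -> { 'ACME is up' }
})
puts results.inspect
```

The slower weather call still appears first in the result, because results are collected by request order rather than completion order.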

Built-in Instrumentation

New in v1.16. No extra gem required. Events cover HTTP requests, chat completions, tool calls, embeddings, and model registry refreshes.

# Rails — use ActiveSupport::Notifications
ActiveSupport::Notifications.subscribe('chat.ruby_llm') do |_name, _start, _finish, _id, payload|
  Rails.logger.info(
    provider:      payload[:provider],
    model:         payload[:model],
    input_tokens:  payload[:input_tokens],
    output_tokens: payload[:output_tokens]
  )
end

# Non-Rails — supply any object that responds to instrument(name, payload, &block)
RubyLLM.configure do |config|
  config.instrumenter = MyInstrumenter.new
end

The external ruby_llm-instrumentation / opentelemetry-instrumentation-ruby_llm gems remain available for richer tracing pipelines.
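
For the non-Rails case, a minimal instrumenter might look like the sketch below. The only contract taken from this document is instrument(name, payload) with a block; the class name MemoryInstrumenter, the Event struct, and the choice to store events in memory are illustrative assumptions.

```ruby
# Hypothetical in-memory instrumenter. It times the block, records an
# event, and returns whatever the block returned.
class MemoryInstrumenter
  Event = Struct.new(:name, :payload, :duration, keyword_init: true)

  attr_reader :events

  def initialize
    @events = []
    @lock = Mutex.new
  end

  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    block_given? ? yield(payload) : nil
  ensure
    # Runs on success and on error, so failed calls are recorded too.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @lock.synchronize do
      @events << Event.new(name: name, payload: payload, duration: elapsed)
    end
  end
end

instrumenter = MemoryInstrumenter.new
instrumenter.instrument('chat.ruby_llm', { model: 'example-model' }) { :done }
puts instrumenter.events.map(&:name).inspect
```

The mutex matters when concurrent tool execution is enabled, since events may then arrive from several threads.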

Multi-Modal

# Images
chat.ask "What's in this image?", with: "photo.jpg"

# PDFs
chat.ask "Summarize this", with: "report.pdf"

# Audio
chat.ask "Transcribe", with: "meeting.mp3"

# Multiple files
chat.ask "Analyze", with: ["image.jpg", "doc.pdf", "notes.txt"]

Audio Transcription — Word-Level Timing (v1.16+)

transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words  # => [{ word: "Hello", start: 0.0, end: 0.42 }, ...]
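
Word timings are most useful once grouped, for example into caption lines. The sketch below assumes the word shape shown above (hashes with :word, :start, and :end) and works on sample data rather than a live transcription; caption_lines is a hypothetical helper.

```ruby
# Groups word timings into caption lines of at most `max_words` words.
# Each line spans from its first word's start to its last word's end.
def caption_lines(words, max_words: 5)
  words.each_slice(max_words).map do |group|
    {
      start: group.first[:start],
      end: group.last[:end],
      text: group.map { |w| w[:word] }.join(' ')
    }
  end
end

sample = [
  { word: 'Hello', start: 0.0,  end: 0.42 },
  { word: 'there', start: 0.45, end: 0.8 },
  { word: 'world', start: 0.9,  end: 1.3 }
]
caption_lines(sample, max_words: 2).each do |line|
  puts format('%.2f-%.2f %s', line[:start], line[:end], line[:text])
end
```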

Token Tracking

response = chat.ask "Hello"

# v1.15+ granular token breakdown
puts "Input: #{response.tokens.input}"
puts "Output: #{response.tokens.output}"
puts "Cache read: #{response.tokens.cache_read}"
puts "Cache write: #{response.tokens.cache_write}"
puts "Thinking: #{response.thinking_tokens}"

# Built-in cost tracking (v1.15+) — returns nil for unknown pricing
puts "Response cost: #{response.cost.total}"
puts "Chat total: #{chat.cost.total}"

Compat helpers (response.input_tokens, response.output_tokens, response.cached_tokens) still work, but the response.tokens.* structure is preferred. response.cached_tokens does not distinguish cache reads from cache writes; use response.tokens.cache_read and response.tokens.cache_write when that distinction matters.

Error Handling

attempts = 0
begin
  response = chat.ask "Hello"
rescue RubyLLM::UnauthorizedError
  # Invalid API key (HTTP 401)
rescue RubyLLM::RateLimitError => e
  attempts += 1
  raise if attempts > 3  # give up instead of retrying forever
  sleep e.retry_after
  retry
rescue RubyLLM::TimeoutError
  # Request timeout
rescue RubyLLM::ContextLengthExceededError
  # Reduce prompt size
rescue RubyLLM::Error => e
  # Generic API error
end

Error Types

Error	HTTP	Description
`BadRequestError`	400	Invalid parameters
`UnauthorizedError`	401	Invalid API key
`PaymentRequiredError`	402	Billing issue
`RateLimitError`	429	Rate limit exceeded
`ContextLengthExceededError`	-	Token limit (v1.16: Anthropic context-length errors now raise this)
`ServerError`	500	Provider error
`ServiceUnavailableError`	502/503/504	Service down
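
Retries for rate limits and transient server errors are usually bounded and spaced out. The helper below is a generic sketch in plain Ruby, not part of RubyLLM: with_backoff and FakeRateLimitError are hypothetical names, and the stand-in error class exists only so the example runs without the gem. In an application, pass RubyLLM::RateLimitError as on: instead.

```ruby
# Stand-in error so the sketch runs without the gem installed.
class FakeRateLimitError < StandardError; end

# Retries the block up to `attempts` times, doubling the delay each time.
# `sleeper` is injectable so the helper can be exercised without waiting.
def with_backoff(on:, attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue on
    raise if tries >= attempts
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

result = with_backoff(on: FakeRateLimitError, base_delay: 0.01) do |try|
  raise FakeRateLimitError, 'slow down' if try < 2
  "succeeded on try #{try}"
end
puts result
```

Once the attempts are exhausted the original error is re-raised, so the caller can still distinguish a rate limit from other failures.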

Debugging

export RUBYLLM_DEBUG=true

Shows full request/response details in logs.

Best Practices

Tool Security

class SafeTool < RubyLLM::Tool
  param :input, desc: "User input"

  def execute(input:)
    raise ArgumentError if input.length > 1000
    # NEVER use: eval, system, exec, `
  end
end
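
The length check above can be extended into a small allowlist validator that a tool's execute method calls before doing any work. This is a sketch under stated assumptions: the ToolInput module, its limits, and the allowed character set are hypothetical and should be adapted to what each tool legitimately accepts.

```ruby
# Hypothetical validator: rejects non-strings, oversized input, and any
# character outside a small allowlist (letters, digits, spaces, . , ' -).
module ToolInput
  MAX_LENGTH = 1000
  SAFE_PATTERN = /\A[\p{Alnum}\s.,'-]+\z/

  def self.validate!(input)
    raise ArgumentError, 'input must be a String' unless input.is_a?(String)
    raise ArgumentError, 'input too long' if input.length > MAX_LENGTH
    raise ArgumentError, 'input has disallowed characters' unless input.match?(SAFE_PATTERN)

    input.strip
  end
end

puts ToolInput.validate!(' Paris ')
```

An allowlist is stricter than trying to filter out dangerous characters, which is why the pattern names what is permitted rather than what is forbidden.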

Cost Control

simple_chat = RubyLLM.chat(model: 'gpt-5-nano')  # Cheap
complex_chat = RubyLLM.chat(model: 'claude-sonnet-4-6')  # Capable

Context Management

if chat.messages.sum { |m| m.input_tokens.to_i + m.output_tokens.to_i } > 100_000  # to_i guards against nil counts
  summary = summarize(chat.messages.first(40))
  chat.reset_messages!
  chat.add_message(role: :system, content: summary)
end
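
The decision of what to summarize and what to keep can be isolated from the chat object and tested on its own. The sketch below uses a stand-in message struct and assumes token counts may be missing on some messages; HistoryMessage, total_tokens, and split_for_summary are hypothetical names, and the summarize step itself is left to the application.

```ruby
# Stand-in for chat messages; only the token fields matter here.
HistoryMessage = Struct.new(:role, :content, :input_tokens, :output_tokens, keyword_init: true)

# Sums token counts, treating missing (nil) counts as zero.
def total_tokens(messages)
  messages.sum { |m| m.input_tokens.to_i + m.output_tokens.to_i }
end

# Returns [to_summarize, to_keep]. Nothing is split off while the history
# is within budget; otherwise the most recent `keep_last` messages stay.
def split_for_summary(messages, budget:, keep_last: 10)
  return [[], messages] if total_tokens(messages) <= budget

  cut = [messages.size - keep_last, 0].max
  [messages.first(cut), messages.last(messages.size - cut)]
end

history = Array.new(6) do |i|
  HistoryMessage.new(role: :user, content: "message #{i}", input_tokens: 40_000, output_tokens: nil)
end
old, recent = split_for_summary(history, budget: 100_000, keep_last: 2)
puts "summarize #{old.size}, keep #{recent.size}"
```

Keeping the most recent messages verbatim preserves the immediate context, while the older ones are replaced by a summary.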

Resources

Official Docs: https://rubyllm.com
GitHub: https://github.com/crmne/ruby_llm