name: kalibr description: Ship agents that fix themselves. Kalibr learns what's working as your agents run in production and routes them around failures, degradations, and cost spikes before you know they're happening. version: 0.2.0 metadata: openclaw: emoji: "๐ฆ" requires: env: - KALIBR_API_KEY - KALIBR_TENANT_ID primaryEnv: KALIBR_API_KEY homepage: https://kalibr.systems
Kalibr โ agents that fix themselves
Kalibr lets agents choose the optimal model + tools for any task, across any modality, as they run in production.
You define candidate execution paths โ model + tools + parameters. Kalibr figures out which one wins for each task from real production telemetry. When a path degrades at 3am, your agents are already on the next best path. No alerts. No debugging. No you.
When to use this
- User is building agents and asks about reliability, failover, or model selection
- User's agent is failing because a model provider is degraded or down
- User wants different tasks routed to different models automatically
- User is hardcoding
model="gpt-4o"and wants something that adapts - User asks about production monitoring, cost optimization, or agent routing
- User asks how to stop agents from breaking when a provider has an outage
- User is building multimodal agents and wants unified optimization across text, voice, and image
- User wants to route HuggingFace models based on production outcomes
Install
pip install kalibr
Get credentials at https://dashboard.kalibr.systems/settings
export KALIBR_API_KEY="your-api-key"
export KALIBR_TENANT_ID="your-tenant-id"
Install via OpenClaw
openclaw plugins install @kalibr/openclaw
Quick start
from kalibr import Router
router = Router(
goal="extract-emails",
paths=[
{"model": "gpt-4o", "tools": ["web_search"]},
{"model": "claude-sonnet-4-20250514"},
{"model": "gemini-2.0-flash", "params": {"temperature": 0.2}},
]
)
response = router.completion(
messages=[{"role": "user", "content": "Extract emails from this page..."}]
)
# This is how Kalibr learns โ tell it what worked
router.report(success="@" in response.choices[0].message.content)
Kalibr routes the full execution path โ model + tools + parameters โ not just the model. After ~20 outcomes it knows what's winning. After 50 it's locked in and adapting.
Auto-reporting
Skip manual reporting โ define success inline:
router = Router(
goal="extract-emails",
paths=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"],
success_when=lambda output: "@" in output,
)
# Kalibr reports outcomes automatically after every call
response = router.completion(messages=[...])
How it's different
OpenRouter / LiteLLM routing: Model proxy. Routes based on price, speed, availability. Doesn't know if the response was actually good for your task.
Fallback systems (LangChain ModelFallbackMiddleware): Reactive. Waits for a failure, then tries the next model. You already lost that request.
Kalibr: Learns from your actual production telemetry โ per task, per path. Routes to what's working before anything breaks. 10% canary traffic keeps testing alternatives so Kalibr catches degradation before your users do.
Works with
- LangChain / LangGraph:
pip install langchain-kalibrโ drop-in ChatModel - CrewAI: Pass
ChatKalibras any agent'sllm - OpenAI Agents SDK: Drop-in replacement
- Any Python code that calls an LLM
- HuggingFace: Any of 17 task types across every modality
How it works
Kalibr captures telemetry on every agent run โ latency, success, cost, provider status. It uses Thompson Sampling to balance exploration (trying paths) vs. exploitation (using the best). 10% canary traffic keeps testing alternatives so Kalibr catches degradation before your users do.
Success rate always dominates. Kalibr never sacrifices quality for cost.