venice-router

star 8

Supreme model router for Venice.ai — the privacy-first, uncensored AI platform. Automatically classifies query complexity and routes to the cheapest adequate model. Supports web search, uncensored mode, private-only mode (zero data retention), conversation-aware routing, cost budgets, function calling, and 30+ Venice.ai text models. Use when the user wants to chat via Venice.ai, send prompts through Venice, or needs smart model selection to minimize API costs while keeping data private from Big Tech.

modbender By modbender schedule Updated 3/6/2026

name: venice-router version: 1.4.0 description: Supreme model router for Venice.ai — the privacy-first, uncensored AI platform. Automatically classifies query complexity and routes to the cheapest adequate model. Supports web search, uncensored mode, private-only mode (zero data retention), conversation-aware routing, cost budgets, function calling, and 30+ Venice.ai text models. Use when the user wants to chat via Venice.ai, send prompts through Venice, or needs smart model selection to minimize API costs while keeping data private from Big Tech. homepage: https://venice.ai source: https://github.com/PlusOne/venice.ai-router-openclaw user-invocable: true metadata: { "openclaw": { "emoji": "🦞🚀", "requires": { "bins": ["python3"], "env": ["VENICE_API_KEY"], }, "primaryEnv": "VENICE_API_KEY", "optionalEnv": [ "VENICE_DEFAULT_TIER", "VENICE_MAX_TIER", "VENICE_TEMPERATURE", "VENICE_MAX_TOKENS", "VENICE_STREAM", "VENICE_UNCENSORED", "VENICE_PRIVATE_ONLY", "VENICE_WEB_SEARCH", "VENICE_DAILY_BUDGET", "VENICE_SESSION_BUDGET", ], "notes": "Python 3.8+ (stdlib only, no pip dependencies). All scripts bundled under scripts/. Source: https://github.com/PlusOne/venice.ai-router-openclaw", "install": [ { "id": "python-brew", "kind": "brew", "formula": "python", "bins": ["python3"], "label": "Install Python (brew)", }, ], },

}

Venice.ai Supreme Router

Smart, cost-optimized model routing for Venice.ai — the AI platform for people who don't want Big Tech watching over their shoulder.

Unlike OpenAI, Anthropic, and Google — where every prompt is logged, analyzed, and potentially used to train future models — Venice offers true privacy with zero data retention on private models. Your conversations stay yours. Venice is also uncensored: no content filters, no refusals, no "I can't help with that."

Setup

  1. Get a Venice.ai API key from venice.ai/settings/api
  2. Set the environment variable:
export VENICE_API_KEY="your-key-here"

Or configure in ~/.openclaw/openclaw.json:

{
  "skills": {
    "entries": {
      "venice-router": {
        "enabled": true,
        "apiKey": "YOUR_VENICE_API_KEY"
      }
    }
  }
}

Usage

Route a prompt (auto-selects model)

python3 {baseDir}/scripts/venice-router.py --prompt "What is 2+2?"

Force a specific tier

python3 {baseDir}/scripts/venice-router.py --tier cheap --prompt "Tell me a joke"
python3 {baseDir}/scripts/venice-router.py --tier mid --prompt "Explain quantum computing"
python3 {baseDir}/scripts/venice-router.py --tier premium --prompt "Write a distributed systems architecture"

Stream output

python3 {baseDir}/scripts/venice-router.py --stream --prompt "Write a poem about lobsters"

Web search (LLM searches the web and cites sources)

python3 {baseDir}/scripts/venice-router.py --web-search --prompt "Latest news on AI regulation"

Uncensored mode (prefer models with no content filters)

python3 {baseDir}/scripts/venice-router.py --uncensored --prompt "Write edgy creative fiction"

Private-only mode (zero data retention, no Big Tech proxying)

python3 {baseDir}/scripts/venice-router.py --private-only --prompt "Analyze this confidential contract"

Conversation-aware routing (multi-turn context)

# Save conversation history as JSON, then route follow-ups with context
python3 {baseDir}/scripts/venice-router.py --conversation history.json --prompt "Can you add tests too?"

The router analyzes conversation history to keep context: trivial follow-ups ("thanks") go cheap, while follow-ups in complex code discussions stay at the right tier.

Function calling (tool use)

# Define tools in a JSON file (OpenAI tools format)
python3 {baseDir}/scripts/venice-router.py --tools tools.json --prompt "What's the weather in NYC?"
python3 {baseDir}/scripts/venice-router.py --tools tools.json --tool-choice auto --prompt "Search for latest AI news"

Tool definitions use the standard OpenAI format. The router auto-bumps to mid tier minimum for function calling since it requires capable models.

Cost budget tracking

# Show current spending
python3 {baseDir}/scripts/venice-router.py --budget-status

# Track per-session costs
python3 {baseDir}/scripts/venice-router.py --session-id my-project --prompt "help me code"

Set VENICE_DAILY_BUDGET and/or VENICE_SESSION_BUDGET to enforce spending limits. The router auto-downgrades tiers as you approach budget limits.

Classify only (no API call)

python3 {baseDir}/scripts/venice-router.py --classify "Explain the Riemann hypothesis"

List available models and tiers

python3 {baseDir}/scripts/venice-router.py --list-models

Override model directly

python3 {baseDir}/scripts/venice-router.py --model deepseek-v3.2 --prompt "Hello"

Tiers

Tier Models Cost (input/output per 1M tokens) Best For
cheap Venice Small (qwen3-4b), GLM 4.7 Flash, GPT OSS 120B, Llama 3.2 3B $0.05–$0.15 / $0.15–$0.60 Simple Q&A, greetings, math, lookups
budget Qwen 3 235B, Venice Uncensored, GLM 4.7 Flash Heretic $0.14–$0.20 / $0.75–$0.90 Moderate questions, summaries, translations
mid Grok Code Fast, DeepSeek V3.2, MiniMax M2.1/M2.5, Venice Medium, Llama 3.3 70B $0.25–$0.70 / $1.00–$2.80 Code generation, analysis, longer writing
high GLM 5, Kimi K2 Thinking, Grok 4.1 Fast, Gemini 3 Flash $0.50–$0.75 / $1.25–$3.75 Complex reasoning, multi-step tasks, code review
premium GPT-5.2, Gemini 3 Pro, Claude Opus 4.5/4.6, Claude Sonnet 4.5/4.6 $2.19–$6.00 / $15.00–$30.00 Expert-level analysis, architecture, research papers

Routing Strategy

The router classifies each prompt using keyword + heuristic analysis:

  1. Length — longer prompts suggest more complex tasks
  2. Keywords — domain-specific terms (e.g., "architecture", "optimize", "prove") signal complexity
  3. Code markers — presence of code blocks, function names, or technical syntax
  4. Instruction depth — multi-step instructions, comparisons, or "explain in detail" bump the tier
  5. Conversational simplicity — greetings, yes/no, small talk stay on the cheapest tier
  6. Conversation history — when --conversation is provided, analyzes full chat context: code in history boosts tier, trivial follow-ups ("thanks") downgrade, tool calls in history signal complexity
  7. Function calling--tools auto-bumps to at least mid tier (capable models required)
  8. Budget constraints — progressive tier downgrade as spending approaches daily/session limits (95% → cheap, 80% → budget, 60% → mid, 40% → high)

The classifier errs on the side of cheaper models — it only escalates when there's strong signal for complexity.

Environment Variables

Variable Description Default
VENICE_API_KEY Venice.ai API key (required)
VENICE_DEFAULT_TIER Default tier when classification is ambiguous budget
VENICE_MAX_TIER Maximum tier to ever use (cost cap) premium
VENICE_TEMPERATURE Default temperature 0.7
VENICE_MAX_TOKENS Default max tokens 4096
VENICE_STREAM Enable streaming by default false
VENICE_UNCENSORED Always prefer uncensored models false
VENICE_PRIVATE_ONLY Only use private models (zero data retention) false
VENICE_WEB_SEARCH Enable web search by default ($10/1K calls) false
VENICE_DAILY_BUDGET Max daily spend in USD (0 = unlimited) 0
VENICE_SESSION_BUDGET Max per-session spend in USD (0 = unlimited) 0

Why Venice.ai?

  • 🔒 Private inference — Models marked "Private" have zero data retention. Your data never trains anyone's model.
  • 🔓 Uncensored — No guardrails blocking legitimate use cases. No refusals, no filters.
  • 🔌 OpenAI-compatible — Same API format, just change the base URL. Drop-in replacement.
  • 📦 30+ models — From tiny efficient models ($0.05/M) to Claude Opus 4.6 and GPT-5.2.
  • 🌐 Built-in web search — LLMs can search the web and cite sources in a single API call.

Tips

  • Use --classify to preview which tier a prompt would hit before spending tokens
  • Set VENICE_MAX_TIER=mid to cap costs and never hit premium models
  • Use --uncensored for creative, security research, or other content mainstream AI won't touch
  • Use --private-only when processing sensitive/confidential data — zero retention guaranteed
  • Use --web-search when you need up-to-date information with cited sources
  • Use --conversation with a JSON message history for smarter multi-turn routing
  • Use --tools to enable function calling — the router auto-bumps to capable models
  • Set VENICE_DAILY_BUDGET=1.00 to cap daily spend at $1 — the router auto-downgrades tiers as you approach the limit
  • Use --budget-status to see a detailed breakdown of your spending by tier
  • The router prefers private (self-hosted) Venice models over anonymized ones when available at the same tier
  • When --uncensored is active, the router auto-bumps to the nearest tier with uncensored models
  • Combine with OpenClaw WebChat for a seamless chat experience routed through Venice.ai
Install via CLI
npx skills add https://github.com/modbender/skill-library-mcp --skill venice-router
Repository Details
star Stars 8
call_split Forks 2
navigation Branch main
article Path SKILL.md
More from Creator