zerogpu-router

star 77

Route cheap AI tasks (classification, summarization, entity/JSON extraction, PII redaction/extraction, follow-up question generation, small-model chat) to ZeroGPU small/nano models instead of spending host-model tokens. Invoke whenever the user asks for one of these tasks and the input is plain text.

zerogpu By zerogpu schedule Updated 5/13/2026

name: zerogpu-router description: Route cheap AI tasks (classification, summarization, entity/JSON extraction, PII redaction/extraction, follow-up question generation, small-model chat) to ZeroGPU small/nano models instead of spending host-model tokens. Invoke whenever the user asks for one of these tasks and the input is plain text. metadata: { "openclaw": { "requires": { "mcp": ["zerogpu"] } } }

ZeroGPU offload guidance

You have access to an edge inference backend (ZeroGPU) that runs small/nano language models for a fraction of the cost of running the full host model. When a user task fits one of the patterns below, call the matching MCP tool instead of reasoning through the task yourself. Every tool returns { ..., model, usage, savings }; surface the savings line to the user when it is relevant.

When to use ZeroGPU

Offload when all of the following hold:

  • The input is plain text (a passage, email, article, message).
  • The task is one of: classify / tag / label, summarize, extract entities / fields, redact or extract PII, suggest follow-up questions, or a short chat reply that does not depend on prior conversation.
  • The answer does not require multi-step reasoning, code generation, tool orchestration, or access to earlier messages in the conversation.

When NOT to use it

Keep the task on the host model when any of these apply:

  • The user is asking for code, refactors, design, or architectural reasoning.
  • The answer depends on prior messages, files in the workspace, or external tools.
  • The input is long-form and the user wants a long-form, host-model-style response.
  • The user explicitly asks the host model to answer.

Tool selection table

User intent Tool Notes
"Classify this into IAB categories" / "What topic is this?" zerogpu_classify_iab Set enriched: true for the richer taxonomy.
"Is this A or B or C?" with a flat label list zerogpu_classify_zero_shot Pass labels: [...]. Use threshold for multi-label filtering.
"Classify along these axes (sentiment: pos/neg; topic: x/y/z)" zerogpu_classify_structured Use when labels have groups — pass a schema like { sentiment: ["positive","negative"], topic: ["a","b"] }.
"Summarize this" / "TL;DR" zerogpu_summarize For passages up to a few paragraphs.
"Extract the people / places / companies / dates" zerogpu_extract_entities Pass labels: ["person","company","date"].
"Pull these fields out as JSON" zerogpu_extract_json Schema is grouped: { group: ["field::type::desc", ...] } (e.g. { contact: ["name::str::Full name", "email::str::Email address"] }).
"Redact / mask the PII in this" zerogpu_redact_pii Pass mask: "label" to get [PHONE]/[EMAIL]-style placeholders.
"Extract the PII from this" / "Find PII grouped by category" zerogpu_extract_pii Optional categories: ["identity","contact",...] and threshold to scope and tighten results.
"What follow-up questions should I ask about this?" zerogpu_generate_followups Plain passage in, question list out.
Short chat reply where host-level reasoning is not needed zerogpu_chat Set thinking: true for visible reasoning traces.
Verify the backend is reachable zerogpu_health Use before a batch of calls if previous calls failed.

Worked examples

User: "Summarize this paragraph: <text>" → Call zerogpu_summarize({ text: "<text>" }). Return the summary field; mention savings_usd if meaningful.

User: "Is this email about tech, politics, or sports? <email>" → Call zerogpu_classify_zero_shot({ text: "<email>", labels: ["tech","politics","sports"] }). Report the highest-scoring label from scores.

User: "Pull the names and companies out of this: <text>" → Call zerogpu_extract_entities({ text: "<text>", labels: ["person","company"] }). Return the entities map.

User: "Extract the contact info from this email signature: <text>" → Call zerogpu_extract_json({ text: "<text>", schema: { contact: ["name::str::Full name", "title::str::Job title", "company::str::Company name", "phone::str::Phone number", "email::str::Email address"] } }). Return the data map.

User: "Redact the PII in this message: <text>" → Call zerogpu_redact_pii({ text: "<text>", mask: "label" }). Return the redacted string.

User: "Find the PII in this and group it: <text>" → Call zerogpu_extract_pii({ text: "<text>", categories: ["identity","contact"], threshold: 0.5 }). Return the pii map.

Failure handling

If a ZeroGPU tool returns an error or an empty / low-confidence result:

  1. If the tool response indicates HTTP 420, treat it as a billing-state error. Tell the user to update billing and include this link: https://platform.zerogpu.ai.
  2. Call zerogpu_health once if you suspect the backend is down.
  3. If the backend is healthy but the result is poor, fall back to answering with the host model directly and tell the user briefly ("the small model couldn't do this cleanly, so I'm answering directly"). Do not silently retry more than once.
  4. If the backend is unreachable (health fails or the tool raises an error), answer with the host model and note that the offload path was unavailable.
Install via CLI
npx skills add https://github.com/zerogpu/zerogpu-router --skill zerogpu-router
Repository Details
star Stars 77
call_split Forks 11
navigation Branch main
article Path SKILL.md
More from Creator