name: iu-endpoint
description: Validate the IU unified endpoint and discover better models. Probes every transport, health-checks the models configured in OpenCode (opencode.json), reports per-model backend redundancy + live latency, and diffs the live catalog to surface newer/hotter models worth adopting, for OpenCode and for the Hermes Agent (especially Kimi). Use when checking endpoint health, picking a reliable model/host, or deciding whether to upgrade configured models.
IU Unified Endpoint — validate & discover
The IU unified endpoint (op://common/anthropic, host derived from BASE_URL)
fronts many model backends through several transports. This skill answers three
questions: what's up right now, which alias gives the most reliable host,
and are there newer/better models I should switch to (OpenCode + Hermes).
Key facts (the mental model)
- The model alias is the host selector. Each id maps to one or more backend "sinks" (owned_by in the catalog). More backends = more redundant = less likely to 429/timeout. The validator prints backends=N per model.
  - Examples: claude-opus-4-6 (3 backends) is steadier than claude-opus-4-7 (1). Kimi-K2.5 (2: Nebius + Azure) is steadier than Kimi-K2.6 (1: Sweden Central, throttle-prone). gpt-5 (9) is the most redundant GPT.
- Transports on the same host: /anthropic/v1, /openai/v1 (rich catalog), /azure/openai/..., /gemini/v1beta, /replicate/v1. There is no /bedrock passthrough (404); Bedrock is only an internal backing.
- Resilience in OpenCode: the AI SDK auto-retries 429s with backoff. For hard failover across models, switch alias (or, in Hermes, code a primary→fallback).
- Probe quirks: gpt-5* reasoning models need max_completion_tokens (not max_tokens); *-codex models return empty over chat-completions (responses API only), so don't configure them for OpenCode. The validator handles the first.
- Auth: /openai + /replicate use Authorization: Bearer; /anthropic uses x-api-key + anthropic-version; /azure uses api-key; /gemini uses x-goog-api-key. Same single key for all. Never print the key.
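The auth and probe-quirk rules above can be sketched as two small lookup helpers. This is a minimal sketch, not the validator's actual code: the function names auth_header_for and token_param_for are hypothetical, and the "unsupported" marker for *-codex ids is an illustrative choice. The header names and the gpt-5* rule come from the bullets above.

```shell
# Hypothetical helpers; names are illustrative, not taken from validate.sh.

# Print the auth header name(s) used by a transport path.
auth_header_for() {
  case "$1" in
    /openai/*|/replicate/*) printf '%s\n' "Authorization: Bearer" ;;
    /anthropic/*)           printf '%s\n' "x-api-key + anthropic-version" ;;
    /azure/*)               printf '%s\n' "api-key" ;;
    /gemini/*)              printf '%s\n' "x-goog-api-key" ;;
    *)                      printf '%s\n' "unknown"; return 1 ;;
  esac
}

# Print the token-limit parameter a chat-completions probe should send.
# *-codex ids are checked first because they also match gpt-5*.
token_param_for() {
  case "$1" in
    *-codex) printf '%s\n' "unsupported"; return 1 ;;
    gpt-5*)  printf '%s\n' "max_completion_tokens" ;;
    *)       printf '%s\n' "max_tokens" ;;
  esac
}
```

For example, token_param_for gpt-5 prints max_completion_tokens, while token_param_for Kimi-K2.5 prints max_tokens. Neither helper touches the key, which keeps the "never print the key" rule easy to hold.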
How to run
bash .claude/skills/iu-endpoint/validate.sh # full: transports + health + catalog diff + Hermes
bash .claude/skills/iu-endpoint/validate.sh --quick # transports + configured-model health only
The script reads the key from Keychain (claude-sdk-api-key) and never prints it.
Full run takes ~30–60s (it sends a tiny completion per configured model).
What to do with the output
- Health triage. Call out any configured model that is not ok:
  - THROTTLED (429) → transient capacity; prefer a higher-backends sibling.
  - TIMEOUT / ERR(503) → backend down or saturated; note it and suggest an alternative alias.
  - High latency on a 1-backend model → flag as slow-prone.
- Reliability advice. When two aliases serve the same model tier, recommend the higher-backends one as default (e.g. opus-4-6 over opus-4-7 for daily use).
- Discover upgrades. In the NOTABLE list, find [NEW] ids that are a newer version or stronger sibling of a configured [cfg] model (e.g. a newer Gemini preview, a higher GPT-5.x, a newer Kimi). For each genuinely better one, propose the exact opencode.json edit (under provider iu or iu-anthropic), with tool_call/reasoning/attachment/limit. Verify it actually completes first with a one-shot curl (use max_completion_tokens for gpt-5*). Only recommend models that return real text.
- Hermes advice. Compare the models grepped from ~/SourceRoot/hermes-agent against the best available. Be specific about Kimi (the user runs Kimi in Hermes): K2.6 is single-backend/throttle-prone, K2.5 is dual-backend/steadier. Recommend a primary + fallback (e.g. K2.6 primary, K2.5 fallback) and point at where in hermes-agent the model is wired. Flag any newer Claude/Gemini brain worth switching to. Do not edit hermes-agent from here unless asked.
- Report concisely. A short health summary, a ranked "consider adopting" list with backend counts, and any concrete config edits. No key, no raw catalog dump.
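The triage and reliability rules above can be sketched as two helpers. This is a rough sketch under stated assumptions, not the validator's implementation: classify_status and pick_steadier are hypothetical names, and the "alias:backends" argument format is invented for illustration. Treating status 000 as TIMEOUT is also an assumption, based on curl reporting 000 for %{http_code} when no response arrives.

```shell
# Hypothetical helpers; labels follow the triage list, names are illustrative.

# Map an HTTP status code to a triage label.
# Assumption: 000 means no HTTP response was received (e.g. a timeout).
classify_status() {
  case "$1" in
    200) printf '%s\n' "ok" ;;
    429) printf '%s\n' "THROTTLED" ;;
    000) printf '%s\n' "TIMEOUT" ;;
    *)   printf '%s\n' "ERR($1)" ;;
  esac
}

# Given "alias:backends" arguments, print the alias with the most backends.
# On a tie, the first argument listed wins.
pick_steadier() {
  ps_best=""
  ps_best_n=-1
  for ps_pair in "$@"; do
    ps_name="${ps_pair%%:*}"   # text before the first colon
    ps_n="${ps_pair##*:}"      # text after the last colon
    if [ "$ps_n" -gt "$ps_best_n" ]; then
      ps_best="$ps_name"
      ps_best_n="$ps_n"
    fi
  done
  printf '%s\n' "$ps_best"
}
```

With the backend counts quoted earlier, pick_steadier Kimi-K2.6:1 Kimi-K2.5:2 prints Kimi-K2.5, which matches the advice to keep K2.5 as the steadier fallback.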
Updating OpenCode config
config/opencode/opencode.json (symlinked to ~/.config/opencode/opencode.json)
holds the curated model set. Editing it needs no make setup (it's a symlink).
Keep the set lean and current; this skill is the mechanism for keeping it so.
After any edit here, commit in dotfiles.