name: artificial-analysis-live
description: Live Artificial Analysis provider+model benchmark extraction and querying. Use this whenever the user asks benchmark comparisons, model-vs-provider tradeoffs, fastest/cheapest provider for a model, latency/speed/cost rankings, provider deltas over time, or any question that needs fresh endpoint-level data (not just model-level API data). Trigger even if user only says 'compare models/providers' without naming Artificial Analysis explicitly.
license: GPL-3.0-or-later
compatibility: Requires uv and network access.
metadata:
author: anntnzrb
allowed-tools: ""
artificial-analysis-live
AI-first skill for fresh Artificial Analysis endpoint data.
Core rule
Do not answer benchmark/provider questions from stale memory. Run the tool first.
Entry points
- With
SKILLS_DIR:uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" ... - Direct:
uv run --script <skill-dir>/scripts/cli.py ...
Fast path
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" fetch
Commands
fetch
Get live snapshot from RSC source and write outputs.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" fetch
query
Deterministic filter/sort over snapshot rows.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" query --model claude-opus-4-7 --sort-by speed --order desc --limit 5
qa
Minimal NL command that maps question -> query args.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" qa "best provider for claude opus 4.7 by speed top 3"
coding
Fetch/query the Coding Index capability page and return coding-index-only token composition.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" coding --sort-by coding --limit 10
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" coding --model gpt-5-5 --include-benchmark-counts
Use this for Coding Index output token composition. These counts are scoped to the Coding Index evaluation only, not global Intelligence Index token counts. Output tokens are answer_tokens + reasoning_tokens; components are Terminal-Bench Hard + SciCode.
reasoning
Profile models by reasoning selectivity — per-benchmark breakdown of answer vs thinking token splits at max effort.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --sort-by selectivity --limit 10
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --model minimax-m3 --benchmarks
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --selective-only
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --class selective_extreme
Uses canonical_eval_token_counts from the local snapshot — no live fetch needed. Metrics include reasoning floor (minimum reasoning share), reasoning ceiling, weighted reasoning share, and a selectivity classification. See references/reasoning-selectivity.md for definitions and caveats.
stats
Snapshot counts + top providers.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" stats
diff
Compare two snapshots.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" diff old.json new.json
schema
Machine-readable capability contract.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" schema
RPC mode
Use when another agent/process needs JSONL envelopes.
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" --mode rpc
Output policy
- Tool output is JSON only.
- Prefer
codingfor Coding Index token composition/cost breakdown questions. - Prefer
reasoningfor model reasoning selectivity and per-benchmark token split questions. - Prefer
queryfor provider endpoint ranking answers. - Prefer
qaonly when user asks in plain language and speed matters. - If data freshness is critical, run
fetchimmediately beforequery/qa.
Reliability defaults
- ETag cache + 304 reuse
- last-good fallback unless
--strict - sanity thresholds (
--min-endpoints,--min-providers) - extraction heuristics tolerate upstream schema key changes
Read only what you need
- Command usage:
README.md - JSON envelopes and fields:
references/output-contract.md - Failure modes and recovery:
references/troubleshooting.md