artificial-analysis-live

Live Artificial Analysis provider+model benchmark extraction and querying. Use this whenever the user asks for benchmark comparisons, model-vs-provider tradeoffs, the fastest or cheapest provider for a model, latency/speed/cost rankings, provider deltas over time, or any question that needs fresh endpoint-level data (not just model-level API data). Trigger even if the user only says 'compare models/providers' without naming Artificial Analysis explicitly.

By anntnzrb. Updated 6/2/2026.

name: artificial-analysis-live
description: Live Artificial Analysis provider+model benchmark extraction and querying. Use this whenever the user asks benchmark comparisons, model-vs-provider tradeoffs, fastest/cheapest provider for a model, latency/speed/cost rankings, provider deltas over time, or any question that needs fresh endpoint-level data (not just model-level API data). Trigger even if user only says 'compare models/providers' without naming Artificial Analysis explicitly.
license: GPL-3.0-or-later
compatibility: Requires uv and network access.
metadata:
  author: anntnzrb
allowed-tools: ""

artificial-analysis-live

AI-first skill for fresh Artificial Analysis endpoint data.

Core rule

Do not answer benchmark/provider questions from stale memory. Run the tool first.

Entry points

  • With SKILLS_DIR: uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" ...
  • Direct: uv run --script <skill-dir>/scripts/cli.py ...

Fast path

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" fetch

Commands

fetch

Fetch a live snapshot from the RSC source and write the outputs.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" fetch

query

Deterministic filter/sort over snapshot rows.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" query --model claude-opus-4-7 --sort-by speed --order desc --limit 5
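
To illustrate what "deterministic filter/sort" means here, the following is a minimal sketch in Python. It is not the tool's implementation: the row fields (`model`, `provider`, `speed`), the sample values, and the `query_rows` helper are all hypothetical, and the real field names are documented in references/output-contract.md.

```python
# Hypothetical snapshot rows; field names and values are invented for illustration.
ROWS = [
    {"model": "claude-opus-4-7", "provider": "provider-a", "speed": 61.0},
    {"model": "claude-opus-4-7", "provider": "provider-b", "speed": 88.5},
    {"model": "other-model", "provider": "provider-a", "speed": 120.0},
    {"model": "claude-opus-4-7", "provider": "provider-c", "speed": 88.5},
]


def query_rows(rows, model, sort_by, order="desc", limit=5):
    """Filter rows by model, then sort by a numeric metric with a stable tie-break."""
    matched = [r for r in rows if r["model"] == model]
    sign = -1 if order == "desc" else 1
    # Tie-break on provider name so equal metric values always come out in the same order.
    matched.sort(key=lambda r: (sign * r[sort_by], r["provider"]))
    return matched[:limit]


top = query_rows(ROWS, "claude-opus-4-7", "speed", order="desc", limit=2)
print([r["provider"] for r in top])  # ['provider-b', 'provider-c']
```

The explicit tie-break is the part that makes the result deterministic: without it, two endpoints with the same metric could swap places between runs if the input order changes.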

qa

Minimal natural-language command that maps a question to query arguments.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" qa "best provider for claude opus 4.7 by speed top 3"
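
As a rough idea of how a question could be mapped to query arguments, here is a small Python sketch. The parsing rules, the `METRICS` keyword list, and the `question_to_args` helper are assumptions made for illustration; the actual qa command may parse questions quite differently.

```python
import re

# Hypothetical set of metric keywords the sketch recognizes.
METRICS = ("speed", "latency", "price")


def question_to_args(question: str) -> dict:
    """Extract model, sort metric, and limit from a plain-language question."""
    q = question.lower()
    args = {}
    m = re.search(r"\btop (\d+)\b", q)
    if m:
        args["limit"] = int(m.group(1))
    m = re.search(r"\bby (\w+)\b", q)
    if m and m.group(1) in METRICS:
        args["sort_by"] = m.group(1)
    # Take the text after "for" up to the next " by ", " top ", or end of string.
    m = re.search(r"\bfor (.+?)(?= by | top |$)", q)
    if m:
        # Turn "claude opus 4.7" into a slug such as "claude-opus-4-7".
        args["model"] = re.sub(r"[ .]+", "-", m.group(1).strip())
    return args


print(question_to_args("best provider for claude opus 4.7 by speed top 3"))
# {'limit': 3, 'sort_by': 'speed', 'model': 'claude-opus-4-7'}
```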

coding

Fetch/query the Coding Index capability page and return coding-index-only token composition.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" coding --sort-by coding --limit 10
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" coding --model gpt-5-5 --include-benchmark-counts

Use this for Coding Index output token composition. These counts are scoped to the Coding Index evaluation only, not global Intelligence Index token counts. Output tokens are answer_tokens + reasoning_tokens; components are Terminal-Bench Hard + SciCode.
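
The token arithmetic above can be sketched in a few lines of Python. The `answer_tokens` and `reasoning_tokens` names come from the description; the benchmark keys, the sample counts, and the assumption that the Coding Index total is a plain sum over its two components are illustrative only.

```python
def output_tokens(answer_tokens: int, reasoning_tokens: int) -> int:
    """Output tokens are the sum of answer and reasoning tokens."""
    return answer_tokens + reasoning_tokens


def coding_index_output_tokens(per_benchmark: dict) -> int:
    """Sum output tokens over the Coding Index components (assumed to be additive)."""
    return sum(
        output_tokens(b["answer_tokens"], b["reasoning_tokens"])
        for b in per_benchmark.values()
    )


# Made-up counts for one model; the keys are hypothetical, not the tool's field names.
sample = {
    "terminal_bench_hard": {"answer_tokens": 1200, "reasoning_tokens": 3000},
    "scicode": {"answer_tokens": 800, "reasoning_tokens": 5000},
}
print(coding_index_output_tokens(sample))  # 10000
```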

reasoning

Profile models by reasoning selectivity: a per-benchmark breakdown of how output splits between answer tokens and thinking tokens at max effort.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --sort-by selectivity --limit 10
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --model minimax-m3 --benchmarks
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --selective-only
uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" reasoning --class selective_extreme

Uses canonical_eval_token_counts from the local snapshot, so no live fetch is needed. Metrics include reasoning floor (minimum reasoning share), reasoning ceiling, weighted reasoning share, and a selectivity classification. See references/reasoning-selectivity.md for definitions and caveats.
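
One plausible reading of these metrics is sketched below in Python. It assumes that a benchmark's reasoning share is reasoning_tokens / (answer_tokens + reasoning_tokens), that the ceiling is the maximum share, and that the weighted share is weighted by each benchmark's output tokens. These are assumptions, as are the benchmark names and counts; references/reasoning-selectivity.md holds the authoritative definitions, and the classification thresholds are deliberately left out.

```python
def reasoning_profile(per_benchmark: dict) -> dict:
    """Compute floor, ceiling, and token-weighted reasoning share across benchmarks."""
    shares = {}
    total_reasoning = 0
    total_output = 0
    for name, counts in per_benchmark.items():
        out = counts["answer_tokens"] + counts["reasoning_tokens"]
        if out == 0:
            continue  # skip benchmarks with no output to avoid dividing by zero
        shares[name] = counts["reasoning_tokens"] / out
        total_reasoning += counts["reasoning_tokens"]
        total_output += out
    if not shares:
        raise ValueError("no benchmark has any output tokens")
    return {
        "floor": min(shares.values()),
        "ceiling": max(shares.values()),
        "weighted_share": total_reasoning / total_output,
    }


# Made-up counts for two hypothetical benchmarks.
profile = reasoning_profile({
    "bench_a": {"answer_tokens": 500, "reasoning_tokens": 500},
    "bench_b": {"answer_tokens": 100, "reasoning_tokens": 900},
})
print(profile)
```

Under these assumptions, a wide gap between floor and ceiling would indicate a model that reasons heavily on some benchmarks and lightly on others.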

stats

Snapshot counts and top providers.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" stats

diff

Compare two snapshots.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" diff old.json new.json
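
Conceptually, comparing two snapshots means matching endpoints by identity and reporting what was added, removed, or changed. The Python sketch below shows that idea only; the (`model`, `provider`) key, the `speed` metric, and the `diff_snapshots` helper are hypothetical and do not describe the diff command's actual output.

```python
def diff_snapshots(old_rows, new_rows, metric="speed"):
    """Report endpoints added, removed, or changed between two lists of rows."""
    def key(row):
        return (row["model"], row["provider"])

    old = {key(r): r for r in old_rows}
    new = {key(r): r for r in new_rows}
    shared = sorted(old.keys() & new.keys())
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Only keep endpoints whose metric actually moved.
        "changed": {
            k: new[k][metric] - old[k][metric]
            for k in shared
            if new[k][metric] != old[k][metric]
        },
    }


# Invented rows for illustration.
old_rows = [
    {"model": "m1", "provider": "p1", "speed": 10.0},
    {"model": "m1", "provider": "p2", "speed": 20.0},
]
new_rows = [
    {"model": "m1", "provider": "p1", "speed": 12.5},
    {"model": "m1", "provider": "p3", "speed": 30.0},
]
print(diff_snapshots(old_rows, new_rows))
```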

schema

Machine-readable capability contract.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" schema

RPC mode

Use when another agent/process needs JSONL envelopes.

uv run --script "$SKILLS_DIR/artificial-analysis-live/scripts/cli.py" --mode rpc
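
A consumer of JSONL output reads one JSON object per line. The following Python sketch shows that reading loop; the envelope fields in the sample (`id`, `ok`) are invented for the example, and the real envelope shape is described in references/output-contract.md.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse JSON Lines text: one JSON value per non-empty line."""
    envelopes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between envelopes
        envelopes.append(json.loads(line))
    return envelopes


# Hypothetical envelopes, not the tool's real output.
sample_stream = '{"id": 1, "ok": true}\n\n{"id": 2, "ok": false}\n'
for envelope in parse_jsonl(sample_stream):
    print(envelope["id"], envelope["ok"])
```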

Output policy

  • Tool output is JSON only.
  • Prefer coding for Coding Index token composition/cost breakdown questions.
  • Prefer reasoning for model reasoning selectivity and per-benchmark token split questions.
  • Prefer query for provider endpoint ranking answers.
  • Prefer qa only when the user asks in plain language and speed matters.
  • If data freshness is critical, run fetch immediately before query/qa.
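
The last bullet, fetch immediately before query, can be sketched as a small Python wrapper around the CLI. The `cli_argv`, `run_cli`, and `fresh_query` helpers are hypothetical names; they only use the subcommands and flags shown in this document and rely on the stated policy that tool output is JSON only.

```python
import json
import subprocess


def cli_argv(skills_dir: str, *args: str) -> list:
    """Build the uv command line for the skill's CLI script."""
    script = f"{skills_dir}/artificial-analysis-live/scripts/cli.py"
    return ["uv", "run", "--script", script, *args]


def run_cli(skills_dir: str, *args: str):
    """Run one CLI command and decode its JSON output (requires uv and network access)."""
    result = subprocess.run(
        cli_argv(skills_dir, *args), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def fresh_query(skills_dir: str, *query_args: str):
    """Refresh the snapshot, then query it, so the answer reflects current data."""
    run_cli(skills_dir, "fetch")
    return run_cli(skills_dir, "query", *query_args)


# Example call (not executed here):
# fresh_query(os.environ["SKILLS_DIR"], "--model", "claude-opus-4-7",
#             "--sort-by", "speed", "--order", "desc", "--limit", "5")
```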

Reliability defaults

  • ETag cache + 304 reuse
  • last-good fallback unless --strict
  • sanity thresholds (--min-endpoints, --min-providers)
  • extraction heuristics tolerate upstream schema key changes
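
The first three defaults follow a common caching pattern, sketched below in Python. This illustrates the pattern rather than the tool's code: the `refresh` function, the in-memory `cache` dict, and the injected `fetch` callable are hypothetical, with `min_endpoints` and `strict` standing in for the --min-endpoints and --strict flags.

```python
def refresh(cache: dict, fetch, min_endpoints: int = 1, strict: bool = False):
    """Return (rows, source) using a conditional fetch with a last-good fallback.

    fetch(etag) must return (status, etag, rows); on status 304 rows may be None.
    """
    try:
        status, etag, rows = fetch(cache.get("etag"))
        if status == 304:
            # Upstream is unchanged, so reuse the cached rows.
            return cache["rows"], "cache-hit"
        if len(rows) < min_endpoints:
            raise ValueError("snapshot below sanity threshold")
        cache.update(etag=etag, rows=rows)
        return rows, "fresh"
    except Exception:
        # In strict mode, or with nothing cached, surface the failure.
        if strict or "rows" not in cache:
            raise
        return cache["rows"], "last-good"


# Fake fetchers standing in for the network call.
cache = {}
print(refresh(cache, lambda etag: (200, "v1", [{"provider": "p1"}]))[1])  # fresh
print(refresh(cache, lambda etag: (304, etag, None))[1])                  # cache-hit
print(refresh(cache, lambda etag: (200, "v2", []))[1])                    # last-good
```

Injecting `fetch` keeps the sketch runnable without network access and makes each branch easy to exercise.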

Read only what you need

  • Command usage: README.md
  • JSON envelopes and fields: references/output-contract.md
  • Failure modes and recovery: references/troubleshooting.md

Install via CLI

npx skills add https://github.com/anntnzrb/agents --skill artificial-analysis-live