benchmark
Benchmark models head-to-head, or evaluate skills (with-skill vs. bare baseline), on a calibrated mid-weight task. Contenders can be Claude models at any effort level or external CLI agents such as codex, gemini, or cursor-agent. Parallel sub-agents do the work, a blind judge scores it, and an HTML report is written to benchmarks/ with a CursorBench-style leaderboard (score %, cost/task, tokens/task, steps/task). Use it when the user wants to benchmark, evaluate, compare, or A/B test models, skills, or coding agents; asks "which model is better at X" or "does this skill actually help"; or types /benchmark, /benchmark model, or /benchmark skill. Quick mode is the default (one task, minimal questions); "deep" mode runs more tasks and contenders.
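The leaderboard columns listed above (score %, cost/task, tokens/task, steps/task) are per-contender averages over the tasks each contender ran. The minimal Python sketch below shows one way to compute them. It assumes a hypothetical per-run record, `TaskResult`, holding the blind judge's score in [0, 1] plus cost, tokens, and steps. The skill's actual report format and scoring scheme are not documented on this page, so every name and field here is illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One contender's run on one task (hypothetical schema, not the skill's own)."""
    contender: str
    score: float   # assumed blind-judge score in [0, 1]
    cost_usd: float
    tokens: int
    steps: int


@dataclass
class LeaderboardRow:
    contender: str
    score_pct: float
    cost_per_task: float
    tokens_per_task: float
    steps_per_task: float


def build_leaderboard(results: list[TaskResult]) -> list[LeaderboardRow]:
    """Average each contender's runs and rank contenders by score, highest first."""
    by_contender: dict[str, list[TaskResult]] = {}
    for r in results:
        by_contender.setdefault(r.contender, []).append(r)

    rows = []
    for name, runs in by_contender.items():
        rows.append(LeaderboardRow(
            contender=name,
            score_pct=100 * float(mean(r.score for r in runs)),
            cost_per_task=float(mean(r.cost_usd for r in runs)),
            tokens_per_task=float(mean(r.tokens for r in runs)),
            steps_per_task=float(mean(r.steps for r in runs)),
        ))
    rows.sort(key=lambda row: row.score_pct, reverse=True)
    return rows


# Illustrative with-skill vs. bare-baseline A/B comparison over two tasks (made-up numbers).
results = [
    TaskResult("with-skill", 0.9, 0.12, 40_000, 18),
    TaskResult("with-skill", 0.7, 0.10, 36_000, 15),
    TaskResult("baseline", 0.6, 0.08, 30_000, 12),
    TaskResult("baseline", 0.5, 0.09, 32_000, 14),
]
for row in build_leaderboard(results):
    print(f"{row.contender}: {row.score_pct:.0f}%  ${row.cost_per_task:.3f}/task  "
          f"{row.tokens_per_task:.0f} tok/task  {row.steps_per_task:.1f} steps/task")
```

Averaging per task keeps contenders comparable even if deep mode gives them different numbers of tasks. Ranking by score alone is a simplification: a real leaderboard might also break ties on cost or tokens.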
Install via CLI
npx skills add https://github.com/h00mankind/workflow --skill benchmark
Repository Details
Stars: 0
Forks: 0
Branch: main
Path: SKILL.md
Creator: h00mankind