benchflow-experiment-review

Review Benchflow or SkillsBench task-run trajectories and integration-test Benchflow code changes. Use this skill whenever the user asks to audit trajectory health, failed or timed-out runs, healthy pass/fail/timeout status, no-skill leakage, skill loading, reward hacking, verifier isolation, metadata completeness, token usage, timing, Daytona-vs-Docker parity, path/root handling, coverage gaps, Docker/Daytona failures, or release-readiness of benchmark data.

By benchflow-ai. Updated 6/3/2026

Install via CLI
npx skills add https://github.com/benchflow-ai/benchflow --skill benchflow-experiment-review
Repository Details
Stars: 254
Forks: 30
Branch: main
Path: SKILL.md