benchflow-experiment-review

Review Benchflow or SkillsBench task-run trajectories and integration-test Benchflow code changes. Use this skill whenever the user asks to audit trajectory health, failed or timed-out runs, pass/fail/timeout status, no-skill leakage, skill loading, reward hacking, verifier isolation, metadata completeness, token usage, timing, Daytona-vs-Docker parity, path/root handling, coverage gaps, Docker/Daytona failures, or the release-readiness of benchmark data.

By benchflow-ai. Updated 6/3/2026

Install via CLI

npx skills add https://github.com/benchflow-ai/benchflow --skill benchflow-experiment-review

Repository Details

Stars: 254

Forks: 30

Branch: main

Path: SKILL.md