vfpf-eval


Portable evaluation skill for pi. Use when comparing runs, agent mixes, or orchestration changes for Claude and Codex. Only compare results inside an explicit comparability frame and tie outputs back to plans/<feature>/.

By venikman. Updated 4/1/2026.

name: vfpf-eval
version: 0.1.0
description: |
  Portable evaluation skill for pi. Use when comparing runs, agent mixes, or orchestration changes for Claude and Codex. Only compare results inside an explicit comparability frame and tie outputs back to plans/<feature>/.
allowed-tools:
  - Read
  - Write
  - Edit

vfpf-eval

Goal

Evaluate workflow quality with comparable cohorts instead of anecdotes.

Rules

  • Define the comparability frame first.
  • Compare the same task family, risk class, and acceptance surface.
  • Track evidence coverage, intervention count, elapsed time, and outcome quality.
  • Track FPF indicators like review independence, resumability, and release truth alongside runtime metrics.
  • Record conclusions without collapsing unlike runs into one score.
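
One way to picture these rules is the minimal Python sketch below. It is an illustration under stated assumptions, not part of the skill itself: the `Frame` and `Run` types, their field names, and the 0..1 scoring scales are all hypothetical. Runs are grouped by their comparability frame first, and each metric is then summarized separately per variant, so unlike runs are never folded into a single score. FPF indicators such as review independence or resumability could be carried as further fields in the same way.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Frame:
    """Hypothetical comparability frame: runs compare only if all fields match."""
    task_family: str
    risk_class: str
    acceptance_surface: str


@dataclass(frozen=True)
class Run:
    """One recorded run. Field names and scales are illustrative assumptions."""
    label: str                # variant under test, e.g. an agent mix
    frame: Frame
    evidence_coverage: float  # assumed scale: 0..1
    interventions: int        # count of human interventions
    elapsed_minutes: float
    outcome_quality: float    # assumed scale: 0..1


def group_by_frame(runs):
    """Split runs into cohorts that share exactly the same frame."""
    cohorts = defaultdict(list)
    for run in runs:
        cohorts[run.frame].append(run)
    return dict(cohorts)


def summarize(cohort):
    """Per-variant means inside one cohort; metrics stay separate on purpose."""
    by_label = defaultdict(list)
    for run in cohort:
        by_label[run.label].append(run)
    return {
        label: {
            "runs": len(rs),
            "evidence_coverage": mean(r.evidence_coverage for r in rs),
            "interventions": mean(r.interventions for r in rs),
            "elapsed_minutes": mean(r.elapsed_minutes for r in rs),
            "outcome_quality": mean(r.outcome_quality for r in rs),
        }
        for label, rs in by_label.items()
    }
```

Calling `summarize` on one cohort at a time keeps every reported difference inside a single frame; a run whose frame differs lands in its own cohort rather than skewing another.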

Required artifacts

  • plans/<feature>/context.md
  • plans/<feature>/plan.md
  • plans/<feature>/verification.md
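
A quick pre-flight check for these files might look like the sketch below. The helper name `missing_artifacts` and the default `plans` root are assumptions made for illustration; the skill itself only names the three paths.

```python
from pathlib import Path

# File names taken from the list above; the check itself is a hypothetical helper.
REQUIRED_ARTIFACTS = ("context.md", "plan.md", "verification.md")


def missing_artifacts(feature, root="plans"):
    """Return the required artifact names not present under <root>/<feature>/."""
    feature_dir = Path(root) / feature
    return [name for name in REQUIRED_ARTIFACTS if not (feature_dir / name).is_file()]
```

An empty list means all three artifacts exist, so an evaluation can be tied back to the feature's plan directory.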

Exit criteria

  • Compared runs share the same evaluation frame.
  • Reported differences are attributable, not anecdotal.
  • The result can inform the next orchestration change.

Install via CLI

npx skills add https://github.com/venikman/gstack --skill vfpf-eval

Repository Details

  • Stars: 0
  • Forks: 0
  • Branch: main
  • Path: SKILL.md