vfpf-eval

name: vfpf-eval
version: 0.1.0
description: |
  Portable evaluation skill for pi. Use when comparing runs, agent mixes, or orchestration changes for Claude and Codex. Only compare results inside an explicit comparability frame and tie outputs back to plans//.
allowed-tools:
  - Read
  - Write
  - Edit

Evaluate workflow quality with comparable cohorts instead of anecdotes.

Define the comparability frame first.
Compare the same task family, risk class, and acceptance surface.
Track evidence coverage, intervention count, elapsed time, and outcome quality.
Track FPF indicators such as review independence, resumability, and release truth alongside runtime metrics.
Record conclusions without collapsing unlike runs into one score.
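The steps above can be sketched in Python as a minimal illustration. It assumes a run record shape that this skill does not define: `Frame`, `Run`, `summarize_by_frame`, and every field name are hypothetical. The sketch groups runs by comparability frame and averages each metric separately inside a cohort, so unlike runs never merge into one score. FPF indicators could be carried as further per-run fields in the same way.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Frame:
    """Hypothetical comparability frame: runs are compared only within one frame."""
    task_family: str
    risk_class: str
    acceptance_surface: str


@dataclass
class Run:
    """Hypothetical run record; field names are illustrative, not a vfpf-eval schema."""
    run_id: str
    frame: Frame
    evidence_coverage: float  # assumed scale: fraction of criteria with evidence, 0..1
    interventions: int        # count of manual interventions during the run
    elapsed_minutes: float
    outcome_quality: float    # assumed scale: reviewer-assigned, 0..1


def summarize_by_frame(runs):
    """Group runs by frame and average each metric separately, never across frames."""
    cohorts = defaultdict(list)
    for run in runs:
        cohorts[run.frame].append(run)

    summary = {}
    for frame, cohort in cohorts.items():
        # Each metric stays its own column; no composite score is computed.
        summary[frame] = {
            "runs": len(cohort),
            "evidence_coverage": mean(r.evidence_coverage for r in cohort),
            "interventions": mean(r.interventions for r in cohort),
            "elapsed_minutes": mean(r.elapsed_minutes for r in cohort),
            "outcome_quality": mean(r.outcome_quality for r in cohort),
        }
    return summary
```

Keying the summary on the frozen `Frame` dataclass makes the frame itself the unit of comparison: two runs land in the same cohort only when task family, risk class, and acceptance surface all match.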