---
name: vfpf-eval
version: 0.1.0
description: |
  Portable evaluation skill for pi. Use when comparing runs, agent mixes, or
  orchestration changes for Claude and Codex. Only compare results inside an
  explicit comparability frame and tie outputs back to plans/<feature>/.
allowed-tools:
  - Read
  - Write
  - Edit
---
vfpf-eval
Goal
Evaluate workflow quality with comparable cohorts instead of anecdotes.
Rules
- Define the comparability frame first.
- Compare only runs that share the same task family, risk class, and acceptance surface.
- Track evidence coverage, intervention count, elapsed time, and outcome quality.
- Track FPF indicators such as review independence, resumability, and release truth alongside runtime metrics.
- Record conclusions without collapsing unlike runs into one score.
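The rules above can be sketched in Python as a grouping step followed by a per-metric report. This is a minimal illustration and not part of pi. `Frame`, `Run`, `cohorts`, and `summarize` are hypothetical names, and the metric fields are assumed encodings of the tracked quantities (for example, evidence coverage as a fraction). Release truth is left out because this skill does not define how it is measured.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


# Hypothetical frame: the three dimensions that must match before runs are compared.
@dataclass(frozen=True)
class Frame:
    task_family: str
    risk_class: str
    acceptance_surface: str


# Hypothetical run record; field encodings are assumptions, not a pi format.
@dataclass(frozen=True)
class Run:
    run_id: str
    frame: Frame
    variant: str               # agent mix or orchestration setting under test
    evidence_coverage: float   # assumed: fraction of acceptance checks backed by evidence
    interventions: int         # human interventions during the run
    elapsed_s: float           # wall-clock time in seconds
    outcome_ok: bool           # did the run meet its acceptance surface?
    review_independent: bool   # FPF indicator: was review done independently?
    resumable: bool            # FPF indicator: could the run be resumed?


def cohorts(runs):
    """Group runs by frame so runs from different frames are never compared."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.frame].append(run)
    return dict(groups)


def summarize(cohort):
    """Report each metric separately per variant instead of one blended score."""
    by_variant = defaultdict(list)
    for run in cohort:
        by_variant[run.variant].append(run)
    report = {}
    for variant, rs in by_variant.items():
        n = len(rs)
        report[variant] = {
            "runs": n,
            "median_evidence_coverage": median(r.evidence_coverage for r in rs),
            "median_interventions": median(r.interventions for r in rs),
            "median_elapsed_s": median(r.elapsed_s for r in rs),
            "outcome_ok_rate": sum(r.outcome_ok for r in rs) / n,
            "review_independence_rate": sum(r.review_independent for r in rs) / n,
            "resumability_rate": sum(r.resumable for r in rs) / n,
        }
    return report
```

Runs from different frames land in different cohorts, and each variant gets its own row of metrics. This keeps unlike runs apart and avoids a single collapsed score.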
Required artifacts
- `plans/<feature>/context.md`
- `plans/<feature>/plan.md`
- `plans/<feature>/verification.md`
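A pre-flight check for these artifacts might look like the following sketch. `missing_artifacts` is a hypothetical helper and not part of pi. It assumes the `plans/` directory sits under a given repository root, and it checks only that each file exists, not that its contents are complete.

```python
from pathlib import Path

# The three artifacts listed under "Required artifacts"; the feature name fills <feature>.
REQUIRED = ("context.md", "plan.md", "verification.md")


def missing_artifacts(feature: str, root: Path = Path(".")) -> list[str]:
    """Hypothetical helper: return required plan artifacts that do not exist yet."""
    base = root / "plans" / feature
    return [str(base / name) for name in REQUIRED if not (base / name).is_file()]
```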
Exit criteria
- Compared runs share the same evaluation frame.
- Reported differences are attributable, not anecdotal.
- The result can inform the next orchestration change.
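One way to make these exit criteria checkable is to refuse any comparison whose frames differ and to count how many orchestration settings changed between two runs. The sketch below simplifies by treating a difference as attributable only when exactly one setting changed. `RunConfig`, its `agent_mix` and `reviewer` fields, `check_comparable`, and `attributable` are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunConfig:
    # Frame dimensions: must be identical across compared runs.
    task_family: str
    risk_class: str
    acceptance_surface: str
    # Orchestration settings (hypothetical examples of what might change).
    agent_mix: str
    reviewer: str


FRAME_FIELDS = ("task_family", "risk_class", "acceptance_surface")


def check_comparable(a: RunConfig, b: RunConfig) -> list[str]:
    """Raise if the frames differ; otherwise return the settings that differ."""
    for name in FRAME_FIELDS:
        if getattr(a, name) != getattr(b, name):
            raise ValueError(f"not comparable: {name} differs")
    return [
        f.name
        for f in fields(RunConfig)
        if f.name not in FRAME_FIELDS and getattr(a, f.name) != getattr(b, f.name)
    ]


def attributable(a: RunConfig, b: RunConfig) -> bool:
    """Assumption: a difference is attributable only if exactly one setting changed."""
    return len(check_comparable(a, b)) == 1
```

A pair that passes `attributable` isolates one change, so its result can feed directly into the next orchestration decision.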