name: plan-work-completion-signoff
description: Verify implemented plan docs against their actual code, add completion signoff metadata, and surface any gaps as new beads. Designed for scheduler/concierge agents coordinating multi-agent signoff after implementation is done.
user-invocable: true
allowed-tools: Bash Read Write Edit Grep Glob Task
argument-hint: "[plans-dir] [code-base-path]"
Plan Work Completion Signoff
Verify that implemented plan docs match their actual code. For each plan doc, an agent compares every specified feature, API, type, data structure, and test category against the real implementation. Complete docs get a signoff section appended; docs with gaps generate a report so the orchestrator can create follow-up beads.
This skill is a structured decision framework for the scheduler/concierge agent, which orchestrates the multi-agent verification work. Individual agents do the actual comparison and signoff.
Inputs
- $0 (optional): Plans directory (default: docs/plans/)
- $1 (optional): Code base path (default: repo root)
Phase 1: Discover Implemented Plans
- Read the plan index (docs/plans/00-plan-index.md or equivalent)
- Identify which plan docs have been implemented. Look for:
  - Closed implementation beads/epics referencing those plans
  - Existing code packages that correspond to plan components
  - Plan doc status markers (e.g., "Implementation complete" in the index)
- Build a list of plan doc pairs to verify: each plan doc + its companion test harness doc (if one exists)
- Exclude docs that already have a ## Completion Signoff section (already verified in a prior pass)
Decision: Communicate the discovered doc list to the user or concierge for confirmation before proceeding. If the list looks wrong (too many or too few docs), clarify before creating beads.
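The exclusion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the companion script described later: the function names are hypothetical, it assumes plan docs are Markdown files, and it does not filter out index or architecture files.

```python
import re
from pathlib import Path

# Matches a level-2 "Completion Signoff" heading that sits on its own line.
SIGNOFF_HEADING = re.compile(r"^## Completion Signoff\s*$", re.MULTILINE)


def has_signoff(text: str) -> bool:
    """Return True if the doc text already contains a signoff section."""
    return SIGNOFF_HEADING.search(text) is not None


def docs_needing_signoff(plans_dir: str) -> list[Path]:
    """List Markdown docs under plans_dir that have no signoff section yet."""
    pending = []
    for path in sorted(Path(plans_dir).rglob("*.md")):
        if not has_signoff(path.read_text(encoding="utf-8")):
            pending.append(path)
    return pending
```

The resulting list is only a starting point for the confirmation step; docs for plans that were never implemented still have to be removed by hand.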
Phase 2: Create Beads and Assign Agents
- Create an epic bead: bd create "Plan completion signoff" --type epic --labels project={project}
- Group plan docs into tasks. Aim for 2-4 docs per task, grouped by component area:
  - Group a plan doc with its companion test harness doc in the same task
  - Related components can share a task (e.g., a storage layer plan + its test harness plan)
  - Don't make tasks too small (one doc each) or too large (8+ docs)
- Create task beads under the epic, one per group
- Assign tasks to available agents. Prefer agents who:
  - Wrote the implementation (they know the code best)
  - Reviewed the implementation (they know the gaps)
- If original agents are unavailable, any agent can do it; the plan docs and code are self-documenting
Each task bead description should include:
- The specific doc paths to verify
- The corresponding code packages to compare against
- The signoff section format (see Phase 3)
- Instructions to report gaps via h2 message
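The grouping guidance in this phase could be approximated as follows. This sketch assumes a hypothetical naming convention in which a test harness doc shares its plan doc's stem plus a "-test-harness.md" suffix; real plan directories may pair docs differently, and grouping by component area is left to the caller (for example, by calling it once per area).

```python
def pair_with_harness(docs: list[str], suffix: str = "-test-harness.md") -> list[list[str]]:
    """Pair each plan doc with its companion test harness doc, if present.

    The suffix convention is an assumption made for this sketch.
    """
    harness = {d for d in docs if d.endswith(suffix)}
    units = []
    for doc in sorted(d for d in docs if d not in harness):
        companion = doc[: -len(".md")] + suffix
        if companion in harness:
            units.append([doc, companion])
            harness.discard(companion)
        else:
            units.append([doc])
    # Harness docs without a matching plan doc still need verification.
    units.extend([h] for h in sorted(harness))
    return units


def group_into_tasks(units: list[list[str]], max_docs: int = 4) -> list[list[str]]:
    """Pack units into tasks of at most max_docs docs without splitting a pair."""
    tasks, current = [], []
    for unit in units:
        if current and len(current) + len(unit) > max_docs:
            tasks.append(current)
            current = []
        current.extend(unit)
    if current:
        tasks.append(current)
    return tasks
```

The greedy packing can leave a final task with a single doc, so the output is a draft for the scheduler to adjust, not a final assignment.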
Deviation Severity Classification
Every deviation between plan and implementation must be classified into one of four severity levels. These levels determine the signoff status — this is not a judgment call, it's a mechanical rule.
| Severity | Definition | Examples | Status Impact |
|---|---|---|---|
| Cosmetic | Names, file layout, import paths, or code organization differs but behavior and contracts are identical. | Struct in types.go instead of foo.go; import path uses v0.1.0 API variant; extra helper functions added. | Complete |
| Structural | Internal architecture differs but external behavior, data flow, and contracts are preserved. Consumers of this component are unaffected. | No separate internal/envdetect package (logic inlined elsewhere); different internal concurrency strategy; fewer files than planned but same coverage. | Must resolve before Complete |
| Contractual | Specified interfaces, APIs, type signatures, or data flow contracts don't exist or differ in ways that affect how other components consume this one. The plan says component A produces output X that component B consumes, but in practice A produces Y or B never consumes X. | Plan specifies Match(cmd ExtractedCommand) bool but implementation uses Match(command string) bool; parser produces structured output but matching layer never calls the parser; specified builder DSL (Name(), ArgAt(), Flags()) doesn't exist. | Partial |
| Missing | Entire components, features, packages, or test categories specified in the plan do not exist in the codebase. | 0 of 75 planned rules implemented; pack registration file doesn't exist; entire test category has no test functions. | Partial or Not Implemented |
Classification Rules
- Cosmetic only → Complete. The plan's intent is realized; names/layout differ cosmetically.
- Structural → Must resolve before Complete. The verifying agent reports the deviation to the scheduler/concierge and reviewer. The team decides: either (a) fix the code to match the plan, or (b) if the implementation is intentionally better/simpler, update the plan doc to match the code. Either way, the deviation is resolved — not just documented. Once resolved, it no longer counts as a deviation.
- Any Contractual deviation → Partial. A contract mismatch means downstream components cannot work as the plan intended. This is true even if the component "works" in isolation — the system design assumed a contract that doesn't hold.
- Missing items covering a minority of the plan → Partial. If specified features don't exist, the plan is not complete.
- Majority Missing → Not Implemented. If the bulk of the plan (>70%) is unimplemented, use "Not Implemented" rather than "Partial."
- When in doubt, classify up (more severe). It's better to flag something as Contractual and have the orchestrator downgrade it than to miss a real gap.
The "Structural" Litmus Test
The key question for classifying a deviation as Structural vs Contractual/Missing is: would an end user, operator, or consuming component observe the difference?
If the answer is yes — the deviation is visible outside the implementation internals — it is not Structural. Structural deviations are strictly internal reorganizations that are invisible to anything outside the package.
Common misclassifications to watch for:
- Missing runnable artifacts: Plan says a binary, CLI command, or build target should exist but only library code was written. The logic is "all there" internally but nobody can actually use it. → Missing, not Structural.
- Missing or changed observable output: Plan specifies log formats, error messages, metric names, API response shapes, or config file formats, but the implementation produces different output or no output. → Contractual, not Structural.
- Deferred scope disguised as deviation: Implementation skips a planned feature and calls it "out of scope for this deliverable." If the plan says it should be there, its absence is Missing regardless of the reason.
- Integration gaps: Component A exists and component B exists, but the plan says A calls B and in practice nothing wires them together. Both "work" in isolation but the system doesn't function as designed. → Contractual, not Structural.
When in doubt, ask: "If I handed this to someone who only read the plan, would they be confused or blocked?" If yes, classify up.
How to Detect Contractual Deviations
Contractual deviations are the hardest to spot because the code may "work" — tests pass, the feature runs. The deviation is in the interface between components, not in the component itself. Look for:
- Type signature mismatches: Plan says func Foo(x ParsedThing) Result, code says func Foo(x string) Result
- Unused outputs: Plan says component A produces structured data for component B, but B never imports or calls A
- Missing interfaces: Plan specifies an interface with N implementations, but the interface doesn't exist and callers use a different pattern
- Data flow breaks: Plan shows a sequence diagram A→B→C, but in code A→C directly (B is bypassed)
- Semantic mismatches: Plan says "matcher operates on parsed AST fields", code does strings.Contains on raw text
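The type signature check from the list above lends itself to a rough first-pass script. The sketch below is an assumption-laden helper, not part of this skill: it uses a simple regex that only recognizes single-line Go functions with at most one unparenthesized return type, and it skips methods with receivers, generics, and multi-value returns. A hit is a prompt for manual review, not a verdict.

```python
import re
from typing import Optional

# Captures name, parameter list, and a single return type of a simple Go
# function declaration. Receivers, generics, and "(a, b)" returns are ignored.
GO_FUNC = re.compile(r"func\s+(\w+)\s*\(([^)]*)\)[ \t]*([\w.*\[\]]*)")


def go_signatures(text: str) -> dict[str, str]:
    """Map function name to a whitespace-normalized signature string."""
    sigs = {}
    for name, params, ret in GO_FUNC.findall(text):
        normalized = " ".join(params.split())
        # Strip a trailing period picked up from prose such as "... error."
        sigs[name] = f"({normalized}) {ret.rstrip('.')}".strip()
    return sigs


def signature_mismatches(plan_text: str, code_text: str) -> dict[str, tuple[str, Optional[str]]]:
    """Return {name: (planned, actual)}; actual is None if the function is absent."""
    planned = go_signatures(plan_text)
    actual = go_signatures(code_text)
    report = {}
    for name, sig in planned.items():
        found = actual.get(name)
        if found != sig:
            report[name] = (sig, found)
    return report
```

A function reported with an actual value of None points toward a Missing deviation; a differing signature points toward a Contractual one.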
Phase 3: Agent Verification Work
Each assigned agent does the following for every doc in their task:
Step 1: Read the Plan Doc
Read the plan doc thoroughly. Build a mental checklist of:
- Every specified type, struct, interface, or enum
- Every specified function, method, or API endpoint
- Every specified algorithm or protocol
- Every specified configuration option or tunable
- Every specified error handling path
- Every specified test category (for test harness docs)
- Every URP/EO/AA claim
- Every cross-component contract — where this component's output is consumed by another component, or where this component consumes another's output
Step 1.5: Check the Implementation Guide
Read docs/plans/00-implementation-guide.md (if it exists). Cross-reference the component being verified against:
- Interface Contracts: Does the implementation respect the exact signatures and semantics listed?
- Lifecycle Ordering Invariants: Does the component's init/shutdown follow the required ordering?
- Common Pitfalls: Has the implementation avoided the known pitfalls relevant to this component?
Violations of Implementation Guide invariants are Contractual-severity deviations — they affect cross-component compatibility even if the component works in isolation.
If signoff discovers new cross-cutting invariants or pitfalls not already in the guide, update the Implementation Guide as part of the signoff process. Add new entries to the appropriate section with a note like "(Discovered during signoff of {component}, {date})".
Step 2: Compare Against Code
For each checklist item:
- Search the codebase for the corresponding implementation
- Verify the implementation matches the spec (names, signatures, behavior)
- Classify every deviation using the four severity levels above (Cosmetic, Structural, Contractual, Missing)
- Note any extras — things implemented but not in the spec (these are fine, just document them)
For test harness docs, additionally verify:
- Each specified test category has corresponding test files/functions
- Test coverage matches what was planned (e.g., "10 property-based tests" → are there actually 10?)
- CI integration works as specified (e.g., make test-* commands run the right tests)
Pay special attention to cross-component contracts. Read the plan's dependency descriptions, sequence diagrams, and interface definitions. Then verify that the actual code's imports, function calls, and data flow match. A component that "works" but doesn't connect to the rest of the system as designed has a Contractual deviation.
Step 3: Run Acceptance Tests
If the plan doc defines acceptance criteria, the verifying agent must run those scenarios against the real end-user interface (CLI, API, web UI — whatever the product exposes). Do not just check that acceptance test code exists; actually execute the scenarios and verify the expected outcomes.
Acceptance test failures block Complete signoff just like test harness failures. A failing acceptance test is at minimum a Contractual-severity issue — the component does not work as specified from the user's perspective.
Step 4: Run Verification Tests
Run the relevant test suite to confirm everything passes:
go test ./path/to/package/... -count=1
For race-sensitive packages, also run:
go test -race ./path/to/package/... -count=1
HARD RULE: If any test fails (including the race detector), the signoff CANNOT be Complete. A failing test — whether a logic error, panic, data race, or timeout — is at minimum a Contractual-severity issue. The code does not work as specified. This is not a judgment call:
- Test failure with data race → Contractual (correctness bug, unsafe concurrent access)
- Test failure with wrong result → Contractual (behavior doesn't match spec)
- Test failure with panic/crash → Contractual (code is broken)
The verifying agent must report all test failures as deviations before proceeding to Step 5. Do NOT classify test failures as Cosmetic or Structural — a test failure is always externally observable and always affects correctness.
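The hard rule above can be enforced mechanically by wrapping the test commands. The following is a minimal sketch under stated assumptions: the function name and the shape of the returned records are hypothetical, and a command that times out or cannot be started is treated as a failure.

```python
import subprocess


def run_checks(commands: list[list[str]], timeout: float = 600) -> list[dict]:
    """Run each command and return one Contractual deviation per failure."""
    deviations = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            failed = proc.returncode != 0
            detail = f"exit code {proc.returncode}"
        except subprocess.TimeoutExpired:
            failed, detail = True, f"timed out after {timeout}s"
        except OSError as exc:
            # The command could not be started (e.g., the tool is not installed).
            failed, detail = True, f"could not run: {exc}"
        if failed:
            deviations.append({
                "severity": "Contractual",
                "command": " ".join(cmd),
                "detail": detail,
            })
    return deviations
```

For the commands in this step, a call might look like run_checks([["go", "test", "./path/to/package/...", "-count=1"], ["go", "test", "-race", "./path/to/package/...", "-count=1"]]). An empty result is necessary for a Complete signoff but not sufficient.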
Step 5: Determine Status and Add Signoff
Use the highest-severity deviation to determine the status:
| Highest Deviation | Status |
|---|---|
| None, or only Cosmetic | Complete |
| Structural | Blocked — resolve before signoff (fix code or update plan) |
| Contractual | Partial |
| Missing (minority of plan) | Partial |
| Missing (majority of plan, >70%) | Not Implemented |
Resolving Structural deviations: When a structural deviation is found, the verifying agent reports it to the scheduler/concierge. The team (verifier + scheduler + reviewer) decides whether the code or the plan should change. If the implementation is intentionally better or simpler, the plan doc is updated to match — this is not "rubber-stamping," it's keeping plan and code in sync. If the plan is correct, the code is fixed. Either way, once resolved, the deviation disappears and signoff can proceed.
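Because the status follows mechanically from the highest-severity deviation, the table above can be written as a small function. This is a sketch: the function name is hypothetical, and missing_fraction is assumed to be the verifier's own estimate of how much of the plan is unimplemented.

```python
# Severities in ascending order, as ranked by the status table.
SEVERITY_ORDER = ["Cosmetic", "Structural", "Contractual", "Missing"]


def signoff_status(severities: list[str], missing_fraction: float = 0.0) -> str:
    """Map a doc's unresolved deviations to its signoff status."""
    unknown = set(severities) - set(SEVERITY_ORDER)
    if unknown:
        raise ValueError(f"unknown severities: {sorted(unknown)}")
    if not severities:
        return "Complete"
    highest = max(severities, key=SEVERITY_ORDER.index)
    if highest == "Cosmetic":
        return "Complete"
    if highest == "Structural":
        # Not a final status: fix the code or update the plan, then re-evaluate.
        return "Blocked"
    if highest == "Missing" and missing_fraction > 0.70:
        return "Not Implemented"
    return "Partial"
```

Resolved Structural deviations should be dropped from the input list before calling the function, since a resolved deviation no longer counts.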
Status: Complete — append this section at the very bottom of the doc:
---
## Completion Signoff
- **Status**: Complete
- **Date**: {YYYY-MM-DD}
- **Branch**: {branch-name}
- **Commit**: {HEAD commit hash}
- **Verified by**: {agent-name}
- **Test verification**: `{test command}` — PASS
- **Acceptance tests**: PASS ({N} scenarios) or N/A (no acceptance criteria in plan)
- **Deviations from plan**:
  - [Cosmetic] {description, or "None"}
- **Structural deviations resolved**: {count resolved, or "None found"}
- **Additions beyond plan**:
  - {List any features implemented that weren't in the original spec, or "None"}
Status: Partial — append this section, and report gaps to the orchestrator:
---
## Completion Signoff
- **Status**: Partial
- **Date**: {YYYY-MM-DD}
- **Branch**: {branch-name}
- **Verified by**: {agent-name}
- **Completed items**: {list of what's done}
- **Deviations**:
  - [Contractual] {description — what contract is broken and between which components}
  - [Missing] {description — what specified feature/component doesn't exist}
  - [Structural] {description, if any}
- **Outstanding gaps**:
  - {Gap 1: description, suggested follow-up bead}
  - {Gap 2: description, suggested follow-up bead}
Status: Not Implemented — for docs where >70% of the plan is missing:
---
## Completion Signoff
- **Status**: Not Implemented
- **Date**: {YYYY-MM-DD}
- **Branch**: {branch-name}
- **Verified by**: {agent-name}
- **Implementation coverage**: {estimated percentage}
- **What exists**: {brief list of any implemented pieces}
- **What's missing**: {summary of unimplemented scope}
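Filling in the Complete template from this step by hand is error-prone, so an agent might generate it instead. The sketch below is one possible rendering, with hypothetical function and parameter names; it assumes nested items are indented by two spaces and refuses to append a second signoff section to a doc that already has one.

```python
from datetime import date
from typing import Optional

SIGNOFF_HEADING = "## Completion Signoff"


def render_complete_signoff(
    branch: str,
    commit: str,
    agent: str,
    test_command: str,
    acceptance: str = "N/A (no acceptance criteria in plan)",
    cosmetic: tuple[str, ...] = (),
    structural_resolved: str = "None found",
    additions: tuple[str, ...] = (),
    today: Optional[date] = None,
) -> str:
    """Fill in the Complete signoff template as Markdown text."""
    stamp = (today or date.today()).isoformat()
    lines = [
        "---",
        SIGNOFF_HEADING,
        "- **Status**: Complete",
        f"- **Date**: {stamp}",
        f"- **Branch**: {branch}",
        f"- **Commit**: {commit}",
        f"- **Verified by**: {agent}",
        # \u2014 is the dash the template places before PASS.
        f"- **Test verification**: `{test_command}` \u2014 PASS",
        f"- **Acceptance tests**: {acceptance}",
        "- **Deviations from plan**:",
    ]
    lines += [f"  - [Cosmetic] {d}" for d in cosmetic] or ["  - [Cosmetic] None"]
    lines.append(f"- **Structural deviations resolved**: {structural_resolved}")
    lines.append("- **Additions beyond plan**:")
    lines += [f"  - {a}" for a in additions] or ["  - None"]
    return "\n".join(lines) + "\n"


def append_signoff(doc_text: str, section: str) -> str:
    """Append the section at the very bottom of the doc, exactly once."""
    if any(line.strip() == SIGNOFF_HEADING for line in doc_text.splitlines()):
        raise ValueError("doc already has a Completion Signoff section")
    # A blank line keeps the leading "---" from turning the last paragraph
    # into a setext heading.
    return doc_text.rstrip("\n") + "\n\n" + section
```

The Partial and Not Implemented templates could be rendered the same way; they are omitted here to keep the sketch short.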
Report all Contractual and Missing deviations to the orchestrating agent via h2 send with:
- The deviation severity and description
- Which components are affected
- Suggested bead description for follow-up work
Step 6: Commit and Report
- Commit the updated plan docs: git add docs/plans/ && git commit -m "docs: add completion signoff for {doc-names}"
- Push to the working branch
- Report completion to the orchestrating agent via h2 send:
  - Which docs were signed off
  - Which docs have gaps (if any)
  - Commit hash
Phase 4: Orchestrator Processes Results
As agents report back, the orchestrating agent:
- For Complete signoffs: Close the signoff task bead once reviews are approved and all available test suites pass. Do NOT keep the bead open waiting on production deployment, manual QA, or staging verification — those are tracked separately (e.g. a release/shipping checklist, a manual QA tracking doc, or per-QA-item beads). The signoff bead's job is to verify the code matches the plan; once that's done and reviewed, close it. Production verification belongs to a different mechanism.
- For Partial signoffs: Evaluate each Contractual and Missing deviation:
- Is it real missing work, or intentional future scope?
- If real: Create a new task bead for the follow-up work, assign to an available agent
- If intentional/deferred: Document the decision but leave status as Partial (do NOT upgrade to Complete — the contract gap still exists)
- HARD RULE: Do NOT close Partial signoff beads or their parent epic. The signoff bead and epic stay open until ALL gaps are resolved (follow-up beads completed, code fixed, re-verified) and the signoff status is upgraded to Complete. A Partial signoff with open gaps means the work is not done.
- For Not Implemented signoffs: These represent entire unbuilt components. Create implementation beads if the work is in scope, or document as out-of-scope if not. Like Partial, do NOT close the signoff bead until the work is done.
- Track progress: how many docs at each status level
- The epic will auto-close when all child beads are closed, so getting the child bead lifecycle right is what matters.
Important: Do not downgrade a Contractual deviation to Structural just because tests pass. Tests passing with a contract mismatch means the tests aren't testing the contract — which is itself a gap.
Phase 5: Report to User/Concierge
Send a final summary:
- Total plan docs verified: N
- Complete: N
- Partial: N (list Contractual/Missing deviations)
- Not Implemented: N (list components)
- Follow-up beads created: N (list)
- All tests passing: yes/no
Beads Integration
Epic: "Plan completion signoff"
├── Task: "Signoff: {component-group-1} ({doc-list})"
├── Task: "Signoff: {component-group-2} ({doc-list})"
├── ...
└── (follow-up beads created for any gaps found)
Checking Status
A companion script reports signoff status across all plan docs:
python3 "$(dirname "$0")/signoff-status.py" docs/plans/
Output formats:
- --format text (default): Human-readable summary with counts and per-doc status
- --format json: Machine-readable for CI integration
- --format markdown: Markdown table for embedding in reports
The script traverses the plans directory (recursing into subdirectories), finds all plan docs (excluding index, architecture, shaping, summary, and review files), and checks each for a ## Completion Signoff section. It reports:
- Total docs, complete count, partial count, not-started count
- Per-doc details: path, type (plan vs test-harness), status, date, verified-by
- Type breakdown: plan docs vs test harness docs
Exit code 0 on success; exit code 1 only on errors (e.g., invalid directory).
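Run this after the signoff process to verify everything is covered, or in CI to gate merges on plan completion. Because the exit code signals only errors, not signoff status, a CI gate has to inspect the output (for example, the --format json report).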
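To make the check concrete, here is a rough sketch of how signoff fields could be read back from a doc and tallied. It is not the companion script and does not reproduce its output format; the function names are hypothetical, and it assumes the signoff section uses the "- **Key**: value" lines shown in the Phase 3 templates.

```python
import re
from collections import Counter
from typing import Optional

MARKER = "## Completion Signoff"
# Matches top-level template lines such as "- **Status**: Complete".
FIELD = re.compile(r"^- \*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$", re.MULTILINE)


def parse_signoff(text: str) -> Optional[dict[str, str]]:
    """Return the signoff fields of a doc, or None if it has no signoff."""
    idx = text.rfind(MARKER)
    if idx == -1:
        return None
    section = text[idx + len(MARKER):]
    return {m["key"]: m["value"].strip() for m in FIELD.finditer(section)}


def tally(docs: dict[str, str]) -> dict[str, int]:
    """Count docs per status; docs without a signoff count as 'Not started'."""
    counts = Counter()
    for text in docs.values():
        info = parse_signoff(text)
        status = "Not started" if info is None else info.get("Status", "Unknown")
        counts[status] += 1
    return dict(counts)
```

A CI gate built on this idea could fail the build whenever the tally contains any status other than Complete.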
What Requires Judgment
The deviation severity classification removes most ambiguity from status determination, but these calls still require judgment:
- Which docs to include — only implemented plans, not future/unstarted plans
- How to group docs into tasks — balance between parallelism and cognitive coherence
- Cosmetic vs Structural — is a file layout change purely cosmetic, or does it affect how developers find and maintain code? When in doubt, classify as Structural (it must be resolved, but the resolution may be as simple as updating the plan).
- Structural resolution direction — should the code change to match the plan, or should the plan be updated to match the code? The team (verifier + scheduler + reviewer) decides. If the implementation is genuinely better/simpler, update the plan. If the plan had it right, fix the code.
- Structural vs Contractual — does the architectural difference affect cross-component contracts? The key test: would a developer implementing a downstream component based on the plan be surprised or blocked by the actual implementation? If yes, it's Contractual.
- Whether a Contractual gap needs a follow-up bead — some contract mismatches may be intentional improvements. The orchestrator decides whether to create follow-up work, but the status stays Partial regardless.
- How to handle unavailable agents — reassign to whoever is online
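- How much to re-verify after gap fixes: Phase 4 requires re-verification before a Partial status is upgraded to Complete. The judgment call is whether to re-run the full signoff or only re-check the gaps that the follow-up beads closed.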