cverify - SKILL.md Agent Skill

name: cverify
description: Verify implementation matches spec. Check rule coverage, undocumented dependencies, architecture compliance. Writes verification report and drift debt. Run after /ctdd completes.
allowed-tools: Read, Grep, Glob, Bash(git), Bash(test), Bash(coverage), Bash(diff), Bash(workflow-advance.sh), Bash(jq), Bash(mutmut), Bash(stryker), Bash(cargo-mutants), Bash(go-mutesting), Bash(lint), Bash(clippy), Bash(ruff), Bash(eslint), Edit, Write(.correctless/verification/), Write(.correctless/meta/drift-debt.json), Write(.correctless/meta/intensity-calibration.json), Write(.correctless/artifacts/*)
context: fork
interaction_mode: hybrid

/cverify — Post-Implementation Verification

Shared constraints apply. Before executing, read _shared/constraints.md from the parent of this skill's base directory. All constraints there apply to this skill.

You are the verification agent. You did NOT participate in the implementation. Your job is to check that what was built matches what was specced. Your lens: "The tests pass and QA approved — but does the implementation actually satisfy the spec, or does it just satisfy the test cases?"

Intensity Configuration

| | Standard | High | Critical |
|---|----------|------|----------|
| Rule coverage | Exists + weak detection | Full matrix + Serena trace | Full + mutation survivor analysis |
| Dependencies | List + license | List + CVE + maintenance | Full audit |
| Architecture | Basic compliance | Full + drift detection | Full + cross-spec + prohibitions |

Effective Intensity

Determine the effective intensity using the computation in the shared constraints (_shared/constraints.md).

Progress Visibility (MANDATORY)

Intensity-Aware Verification Behavior

At standard intensity: rule coverage checks for existence and weak detection. Dependencies get list + license check. Architecture gets basic compliance review.
At high intensity: rule coverage uses full matrix + Serena trace for symbol-level tracing. Dependencies include CVE scanning and maintenance status. Architecture gets full review with drift detection.
At critical intensity: rule coverage includes full matrix plus mutation survivor analysis. Dependencies undergo full audit. Architecture review includes cross-spec consistency checks and prohibition enforcement.

Verification takes 10-15 minutes with mutation testing running in the background. The user must see progress throughout.

Before starting, create a task list:

Read context (spec, implementation, tests, .correctless/ARCHITECTURE.md)
Rule coverage matrix
Mutation testing (background)
Dependency check
Basic smell check
Drift detection
Architecture adherence
Write verification report

Between each check, print a 1-line status: "Rule coverage complete — {N}/{M} rules covered, {K} weak. Starting mutation testing in background..." When mutation testing completes in the background, announce immediately: "Mutation testing done — {N} mutations, {M} killed, {K} survivors."

Mark each task complete as it finishes.

Before You Start

First-run check: If .correctless/config/workflow-config.json does not exist, tell the user: "Correctless isn't set up yet. Run /csetup first — it configures the workflow and populates your project docs." If the config exists but .correctless/ARCHITECTURE.md contains {PROJECT_NAME} or {PLACEHOLDER} markers, offer: ".correctless/ARCHITECTURE.md is still the template. I can populate it with real entries from your codebase right now (takes 30 seconds), or run /csetup for the full experience." If the user wants the quick scan: glob for key directories, identify 3-5 components and patterns, use Edit to replace placeholder content with real entries, then continue.
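
The first-run decision above can be sketched as a small shell check. The function name `first_run_state`, its optional project-root argument, and the three labels it prints are hypothetical; only the file paths and the placeholder markers come from the text.

```shell
# Hypothetical helper: classify the setup state before verification starts.
# Prints one of: needs-setup, template-architecture, ready.
first_run_state() {
  root="${1:-.}"
  config="$root/.correctless/config/workflow-config.json"
  arch="$root/.correctless/ARCHITECTURE.md"
  if [ ! -f "$config" ]; then
    # No config at all: the user should run /csetup first.
    echo "needs-setup"
  elif [ -f "$arch" ] && grep -qE '\{(PROJECT_NAME|PLACEHOLDER)\}' "$arch"; then
    # Config exists, but the architecture doc is still the template.
    echo "template-architecture"
  else
    echo "ready"
  fi
}
```

Calling `first_run_state` with no argument checks the current directory.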

Read .correctless/AGENT_CONTEXT.md for project context.
Read the spec artifact (path from workflow-advance.sh status output, Spec: line).
Read the implementation — changed files on the branch.
Read the test files.
Read .correctless/ARCHITECTURE.md.
Read .correctless/meta/workflow-effectiveness.json — check which phases have historically missed bugs in this area.
Read .correctless/artifacts/qa-findings-*.json — see what QA found and fixed during TDD.
Determine the default branch (check workflow-config.json for workflow.default_branch, fall back to main). Run git diff {default_branch}...HEAD --stat to see what changed.
Record full-suite-green sentinel (CS-019 / QA-002 / QA2-001). Run the FULL tests/test-*.sh suite (commands.test). If it passes, write the fixed-name test-success sentinel .correctless/artifacts/test-success.sha whose CONTENT is the current HEAD SHA, so the done-transition gate (_done_phase_gate) has a live sentinel to content-match against (absence is silent; a recorded SHA that no longer equals HEAD refuses the transition). The filename is fixed — do NOT key it on the HEAD SHA, or the mismatch branch becomes unreachable. .correctless/artifacts/ is gitignored, so this stays local:
```
printf '%s\n' "$(git rev-parse HEAD)" > ".correctless/artifacts/test-success.sha"
```
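
The content-match rule described for the sentinel can be illustrated as follows. This is a sketch of the stated behavior, not the actual `_done_phase_gate` implementation; the function name `sentinel_status` and its three labels are hypothetical.

```shell
# Hypothetical helper: compare the recorded sentinel against a given HEAD SHA.
# Prints "absent" (no sentinel, the gate stays silent), "match", or "stale".
sentinel_status() {
  file="$1"
  head_sha="$2"
  if [ ! -f "$file" ]; then
    echo "absent"
  elif [ "$(cat "$file")" = "$head_sha" ]; then
    echo "match"
  else
    # A recorded SHA that no longer equals HEAD would refuse the transition.
    echo "stale"
  fi
}
```

For example: `sentinel_status .correctless/artifacts/test-success.sha "$(git rev-parse HEAD)"`. Because the filename is fixed, a new commit after the test run yields `stale` rather than `absent`.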

What to Check

1. Rule Coverage

For each R-xxx / INV-xxx in the spec:

Is there a test that references this rule ID? (grep test files for R-001, etc.)
Does the test actually probe the rule, or is it a trivial assertion?
Would the test fail if the rule were violated?
For rules tagged [integration]: is the test actually an integration test using the real system path?

Result: a table of R-xxx → test name → status (covered / uncovered / weak / wrong-level).

Uncovered rules are BLOCKING findings. Weak tests are findings. Integration rules tested only at unit level are findings.
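
The first question (does any test reference the rule ID?) is mechanical and can be sketched with grep. Assumptions: rule IDs carry numeric suffixes such as R-001 or INV-002, and the function name `rule_coverage` is hypothetical. The sketch only establishes existence; deciding whether a test is weak or at the wrong level still requires reading the test.

```shell
# rule_coverage SPEC_FILE TEST_DIR
# Prints "<rule-id> covered" or "<rule-id> uncovered" for each rule ID in the spec.
rule_coverage() {
  spec="$1"
  tests="$2"
  # -o prints only the matched ID, -w keeps IDs like PR-001 from matching.
  grep -owE '(R|INV)-[0-9]+' "$spec" | sort -u | while read -r rule; do
    if grep -rqwF -e "$rule" "$tests"; then
      echo "$rule covered"
    else
      echo "$rule uncovered"
    fi
  done
}
```

Every `uncovered` line corresponds to a BLOCKING finding in the coverage table.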

2. Dependency Check

Diff the package manifests against the project's default branch (from workflow-config.json, usually main):

git diff {default_branch}...HEAD -- package.json go.mod Cargo.toml requirements.txt pyproject.toml

For each new dependency: what is it, which file introduced it, was it in the spec?
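
To answer "which file introduced it", the added lines of the manifest diff can be listed per file. This is a minimal sketch: the function name `added_manifest_lines` is hypothetical, and it reports every added manifest line, so a human or agent still has to decide which lines are actually new dependencies.

```shell
# added_manifest_lines: read a unified diff on stdin and
# print "<file>: <added line>" for every added, non-blank line.
added_manifest_lines() {
  awk '
    # "+++ b/path" names the file that the following hunks belong to.
    /^\+\+\+ / { file = $2; sub(/^b\//, "", file); next }
    # Any other line starting with "+" is an added line.
    /^\+/ {
      line = substr($0, 2)
      sub(/^[ \t]+/, "", line)
      if (line != "") print file ": " line
    }
  '
}
```

Usage: `git diff main...HEAD -- package.json go.mod | added_manifest_lines`, substituting the project's default branch for `main`.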

Monorepo: Multi-Package Verification

If workflow-config.json has is_monorepo: true and the spec lists "Packages Affected", run tests in ALL listed packages — not just the one where most code changed. Use the per-package test commands from workflow-config.json. Report per-package: "Package api: all tests pass. Package web: 2 tests fail."

3. Architecture Adherence

Complementarity note: The Architecture Compliance Agent (Phase 4) checks whether PR diffs violate entries (a violation lens). This section checks the inverse: whether entries need updating after implementation (a maintenance lens). /cdocs acts on these findings. /cupdate-arch does comprehensive validation of ALL entries.

For each affected .correctless/ARCHITECTURE.md entry, verify that the entry's structural claims still hold after this feature's changes. This is NOT the same as the Phase 4 Architecture Compliance Agent's check types — those remain that agent's domain and are not duplicated here.

Step-by-step:

Extract all ABS-xxx, PAT-xxx, TB-xxx, ENV-xxx entries from .correctless/ARCHITECTURE.md. If no entries exist, the architecture adherence check is dormant — no error, no warning. Skip to drift-debt surfacing.
Get changed files: Run git diff {default_branch}...HEAD --name-only to get the list of files changed by the feature.
Identify affected entries — entries whose Enforced at, Test, or consumer/path references overlap with changed files. Only these entries are checked — do not validate every entry.
For each affected entry, check:
- (a) Enforced at paths exist on disk: verify each file path in the Enforced at field exists. Strip parenthetical annotations (e.g., scripts/lib.sh (source) → scripts/lib.sh) and backtick formatting before checking. Skip entries that reference non-file entities (e.g., setup, function names without file paths). When an entry uses wildcards (e.g., hooks/*.sh), verify at least one matching file exists via glob.
- (b) Test paths exist and reference the entry ID: verify each file path in the Test field exists, and grep it for the entry ID (e.g., ABS-001).
- (c) Invariant text conflicts: check whether the Invariant text conflicts with what the feature changed — does the implementation contradict or invalidate the stated invariant?
Report findings with advisory severity levels. These levels exist for /cdocs prioritization only; they are non-blocking and do not gate /cverify advancement:
- path-missing = HIGH
- test-ID-missing = MEDIUM
- invariant-conflict = MEDIUM
- consumers-incomplete = LOW
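
Checks (a) and (b) are mechanical and can be sketched in shell. The function names are hypothetical, and the sketch assumes paths without spaces. It does not cover the "skip non-file entities" rule or check (c), which need judgment.

```shell
# normalize_enforced_path RAW
# Strips backticks and a trailing parenthetical annotation,
# e.g. "`scripts/lib.sh` (source)" becomes "scripts/lib.sh".
normalize_enforced_path() {
  printf '%s\n' "$1" | sed -e 's/`//g' \
    -e 's/[[:space:]]*([^)]*)[[:space:]]*$//' \
    -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# enforced_path_status RAW
# Prints "ok" or "path-missing". A wildcard passes when at least one file matches.
enforced_path_status() {
  path="$(normalize_enforced_path "$1")"
  case "$path" in
    *'*'*)
      # Left unquoted on purpose so the shell expands the glob.
      for match in $path; do
        if [ -e "$match" ]; then echo "ok"; return 0; fi
      done
      echo "path-missing" ;;
    *)
      if [ -e "$path" ]; then echo "ok"; else echo "path-missing"; fi ;;
  esac
}

# test_mentions_entry TEST_FILE ENTRY_ID
# Prints "ok", "path-missing", or "test-ID-missing".
test_mentions_entry() {
  if [ ! -f "$1" ]; then
    echo "path-missing"
  elif grep -qF -e "$2" "$1"; then
    echo "ok"
  else
    echo "test-ID-missing"
  fi
}
```

The printed labels line up with the severity names in the list above, so they can be copied into the report directly.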

Drift-debt surfacing: Read .correctless/meta/drift-debt.json and surface open items whose rule_id, description, or spec_id references an architecture entry ID (ABS/PAT/TB/ENV) OR whose description references files changed by the feature. Include each relevant drift-debt item in the verification report. Dormant when drift-debt.json is absent, empty, or has no open items (PAT-019).
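
The architecture-ID half of this filter can be sketched with jq. Assumptions: the file uses the `drift_debt` array and field names from the drift debt entry format in this document, entry IDs have numeric suffixes, and the function name `open_arch_drift` is hypothetical. Matching descriptions against changed files is not covered here.

```shell
# open_arch_drift [FILE]
# Prints "<id>: <description>" for open drift-debt items that mention an
# ABS/PAT/TB/ENV entry ID in rule_id, description, or spec_id.
open_arch_drift() {
  file="${1:-.correctless/meta/drift-debt.json}"
  # Dormant when the file is absent or empty.
  [ -s "$file" ] || return 0
  jq -r '
    (.drift_debt // [])[]
    | select(.status == "open")
    | select([.rule_id, .description, .spec_id]
             | map(. // "" | tostring)
             | any(test("\\b(ABS|PAT|TB|ENV)-[0-9]+")))
    | "\(.id): \(.description)"
  ' "$file"
}
```

Empty output means there is nothing to surface, which matches the dormant case.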

3a. Compliance Checks (if configured)

Read workflow.compliance_checks from workflow-config.json. For each check where phase is "verify":

Run the command
Report results: pass/fail with output
If blocking: true and the check fails: this is a BLOCKING finding — verification cannot pass
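
These steps can be sketched as a loop over the configured checks. Assumptions: each check has the `name`, `command`, `phase`, and `blocking` fields, the function name `run_verify_checks` and its PASS/FAIL output format are hypothetical, and commands are run through `sh -c`.

```shell
# run_verify_checks [CONFIG]
# Runs every compliance check whose phase is "verify".
# Returns non-zero when any blocking check fails.
run_verify_checks() {
  config="${1:-.correctless/config/workflow-config.json}"
  blocking_failures=0
  checks="$(jq -c '(.workflow.compliance_checks // [])[] | select(.phase == "verify")' "$config")" || return 2
  # A here-document keeps the loop in the current shell so the counter survives.
  while IFS= read -r check; do
    [ -n "$check" ] || continue
    name="$(printf '%s' "$check" | jq -r '.name')"
    command="$(printf '%s' "$check" | jq -r '.command')"
    blocking="$(printf '%s' "$check" | jq -r '.blocking // false')"
    # stdin is redirected so the check cannot consume the loop input.
    if sh -c "$command" </dev/null; then
      echo "PASS $name"
    elif [ "$blocking" = "true" ]; then
      echo "FAIL $name (BLOCKING)"
      blocking_failures=$((blocking_failures + 1))
    else
      echo "FAIL $name"
    fi
  done <<EOF
$checks
EOF
  [ "$blocking_failures" -eq 0 ]
}
```

A non-zero return means at least one blocking check failed, so verification cannot pass.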

Compliance checks are custom scripts written by the team. Correctless runs them at the right time and reports results. Example config:

"compliance_checks": [{"name": "audit-logging", "command": "./scripts/check-audit-logging.sh", "phase": "verify", "blocking": true}]

4. Antipattern Scan and Basic Smell Check

Run the deterministic antipattern-scan script to detect mechanical code smells:

bash .correctless/scripts/antipattern-scan.sh {default_branch}

where {default_branch} is read from workflow.default_branch in workflow-config.json, falling back to main if absent.
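
The fallback can be made deterministic with a small helper. The function name `default_branch` is hypothetical; the config path is the one used by the first-run check.

```shell
# default_branch [CONFIG]
# Prints workflow.default_branch from the config, or "main" when the file,
# the key, or jq itself is unavailable.
default_branch() {
  config="${1:-.correctless/config/workflow-config.json}"
  branch=""
  if [ -f "$config" ]; then
    branch="$(jq -r '.workflow.default_branch // empty' "$config" 2>/dev/null)"
  fi
  printf '%s\n' "${branch:-main}"
}
```

Usage: `bash .correctless/scripts/antipattern-scan.sh "$(default_branch)"`. The same helper serves the `git diff {default_branch}...HEAD` commands elsewhere in this skill.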

Validate that stdout is non-empty valid JSON with a .findings key before treating it as findings. Empty or invalid output means the scanner itself failed and must be reported as an error, not "zero findings." Also check if the JSON contains an errors array with entries — if so, report these scanner errors to the user rather than silently discarding them.

If the JSON output includes a summaries array (present when files exceed the 20-finding cap), include these in the report.
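
The validation rules above can be sketched as a filter on the scanner's stdout. The function name `check_scan_output` and its three labels are hypothetical; the `.findings` key and `errors` array are the ones named in the text.

```shell
# check_scan_output: reads scanner stdout on stdin.
# Prints "scanner-failed" (empty, invalid, or missing .findings),
# "scanner-errors" (valid, but the errors array has entries), or "ok".
check_scan_output() {
  output="$(cat)"
  if [ -z "$output" ] ||
     ! printf '%s' "$output" | jq -e 'type == "object" and has("findings")' >/dev/null 2>&1; then
    echo "scanner-failed"
  elif printf '%s' "$output" | jq -e '(.errors // []) | length > 0' >/dev/null 2>&1; then
    echo "scanner-errors"
  else
    echo "ok"
  fi
}
```

Usage: `bash .correctless/scripts/antipattern-scan.sh main | check_scan_output`. Only an `ok` result with an empty findings array should be reported as zero findings.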

Include the results in the verification report under an "## Antipattern Scan" section with a table of findings. Also review the semantic ai-antipatterns checklist at .correctless/checklists/ai-antipatterns.md for patterns not detectable by grep.

Additionally check for:

TODO/FIXME/HACK comments, debug statements, commented-out code
Overly broad error catches, hardcoded values, unused imports

5. Drift Detection

Compare the spec's rules against the implementation:

Does the code actually use the abstractions the spec says it should?
Are there code paths not covered by any spec rule?
For rules with implemented_in fields: do those files/functions still exist?

If drift is found, present each drift item to the human with options:

  1. Fix (recommended) — update code or spec to resolve drift
  2. Log as debt — create DRIFT-NNN entry for future resolution
  3. Accept as intentional — document why the drift is correct

  Or type your own: ___

For items where the user chooses "Log as debt": Read .correctless/meta/drift-debt.json first, then APPEND new entries to the existing drift_debt array. Use Edit to add entries — do NOT overwrite the file with Write. Use the next sequential DRIFT-NNN ID.

Drift debt entry format:

{
  "drift_debt": [
    {
      "id": "DRIFT-NNN",
      "spec_id": "task-slug",
      "rule_id": "R-xxx",
      "description": "what drifted",
      "detected": "ISO date",
      "status": "open"
    }
  ]
}
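
The next sequential ID can be computed rather than guessed. This sketch assumes IDs follow the pattern DRIFT- plus digits and that three-digit zero padding is wanted (suggested by "NNN"); the function name `next_drift_id` is hypothetical. It only computes the ID; the entry itself is still appended with Edit as described above.

```shell
# next_drift_id [FILE]
# Prints the next DRIFT-NNN ID, starting at DRIFT-001 when no entries exist.
next_drift_id() {
  file="${1:-.correctless/meta/drift-debt.json}"
  max=0
  if [ -s "$file" ]; then
    # Strip the prefix and leading zeros so the shell does not read octal.
    max="$(jq -r '(.drift_debt // [])[].id // empty' "$file" \
      | sed -n 's/^DRIFT-0*\([0-9][0-9]*\)$/\1/p' | sort -n | tail -n 1)"
  fi
  printf 'DRIFT-%03d\n' "$(( ${max:-0} + 1 ))"
}
```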

6. Cross-Reference QA Findings

Read .correctless/artifacts/qa-findings-{task-slug}.json (if it exists). For each class fix that QA identified:

Was the structural test actually added?
Does it cover the class of bug, not just the instance?

7. Spec Update History

If the spec was updated during TDD, note what changed and why.

Output: Write Verification Report

Write the report to .correctless/verification/{task-slug}-verification.md. This is not optional — downstream skills depend on this file.

# Verification: {Task Title}

## Rule Coverage
| Rule | Test | Status | Notes |
|------|------|--------|-------|
| R-001 | TestUserRegistration | covered | |
| R-002 | TestEmailValidation | covered | |
| R-003 | — | UNCOVERED | no test references R-003 |
| R-004 [integration] | TestConfigWiring | covered | integration test present |

## Dependencies
- + zod@3.22.0 — input validation (src/routes/register.ts)

## Architecture Adherence

Per-entry lines: `- {entry-ID}: {status} — {one-line description}` where status is `valid`, `stale`, or `path-missing`.

- ABS-001: valid — shared script library paths verified
- PAT-003: stale — Enforced at path `hooks/old-gate.sh` missing on disk
- TB-002: valid — trust boundary invariant consistent with implementation

### Drift Debt
- DRIFT-001: R-003 drift — config parsing moved from `src/config.ts` to `src/config/index.ts`

{N} entries checked, {M} stale, {K} drift-debt items

## QA Class Fixes Verified
- QA-001: structural config wiring test added ✓

## Smells
- src/routes/register.ts:42 — TODO: add rate limiting

## Drift
- (none found, or DRIFT-NNN entries created)

## Spec Updates
- 1 update from tdd-impl: "R-002 reworded"

## Overall: PASS/FAIL with N findings

After Verification

Commit Metadata (Git Trailers)

If workflow.git_trailers is true in workflow-config.json, stage the verification report and commit with trailers:

verify(task-slug): verification complete

Spec: .correctless/specs/{task-slug}.md
Rules-covered: R-001, R-002, R-003, ...
QA-rounds: {N}
Verified-by: /cverify

The Verified-by: /cverify trailer signals that this commit passed structured verification. Queryable: git log --format='%(trailers:key=Verified-by)'.

Git Notes (optional)

If workflow.git_notes is true in workflow-config.json, attach a verification summary as a git note:

git notes add -f -m "Verified by /cverify: {N}/{M} rules covered, {K} drift items, {J} findings" HEAD

Reviewers can see this with git notes show HEAD or git log --notes.

Write Calibration Entry

Before advancing the workflow state, write a calibration entry to .correctless/meta/intensity-calibration.json. This records outcome data that /cspec reads to improve future intensity recommendations.

If .correctless/meta/ does not exist, create it (mkdir -p .correctless/meta). If the file does not exist, create it with an empty calibration_entries array. Append a new entry to the calibration_entries array with this schema:

{
  "calibration_entries": [
    {
      "feature_slug": "task-slug from spec/workflow state",
      "recommended_intensity": "standard|high|critical — read from the spec's Recommended-intensity metadata field (the system's pre-override suggestion)",
      "actual_intensity": "standard|high|critical — read from the spec's Intensity metadata field (the approved post-override level)",
      "actual_qa_rounds": "number — read from the workflow state file (qa_rounds field)",
      "actual_findings_count": "number — count of BLOCKING findings only from qa-findings-{slug}.json (not MEDIUM/LOW)",
      "actual_tokens": "integer — sum of total_tokens from the token log JSONL file (see below)",
      "actual_cost_usd": "number or absent — read from cost artifact if it exists (see below)",
      "actual_spec_updates": "number — read from the workflow state file (spec_updates field)",
      "harness_version": "integer or absent — current HARNESS_VERSION constant from scripts/harness-fingerprint.sh (BND-005 of harness-fingerprint spec)",
      "fix_rounds_triggered": "integer — derived: max(0, qa_rounds - 1) + mini_audit_fix_rounds (see below)",
      "file_paths_touched": ["array of file paths from git diff against the default branch"],
      "timestamp": "ISO 8601 string"
    }
  ]
}
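
Creating the file when missing and appending an entry can be sketched with jq. The function name `append_calibration_entry` is hypothetical, and the entry is assumed to be passed as an already assembled JSON object.

```shell
# append_calibration_entry FILE ENTRY_JSON
# Creates FILE with an empty calibration_entries array if needed,
# then appends ENTRY_JSON to that array.
append_calibration_entry() {
  file="$1"
  entry="$2"
  mkdir -p "$(dirname "$file")"
  [ -s "$file" ] || printf '{"calibration_entries": []}\n' > "$file"
  tmp="$(mktemp)" || return 1
  # Write to a temp file first so a jq failure never truncates the original.
  if jq --argjson entry "$entry" '.calibration_entries += [$entry]' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage: `append_calibration_entry .correctless/meta/intensity-calibration.json "$ENTRY"`, where `ENTRY` holds the JSON object built from the field sources.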

harness_version field (BND-005 of harness-fingerprint spec): extract the current HARNESS_VERSION constant from scripts/harness-fingerprint.sh (or .correctless/scripts/harness-fingerprint.sh in installed projects). Read with: grep -E '^HARNESS_VERSION=' scripts/harness-fingerprint.sh | head -1 | sed 's/HARNESS_VERSION=//'. Include the integer in every new calibration entry so /cmodelupgrade's three-tier bootstrap lookup (exact-match pool / pre-fingerprint pool / no-baseline) can distinguish entries by harness generation. If the script is missing, omit the field — do not error.

Field sources:

recommended_intensity: Read from the spec's Recommended-intensity metadata field. This is the pre-override system suggestion written by /cspec.
actual_intensity: Read from the spec's Intensity metadata field. This is the approved post-override level.
actual_qa_rounds: Read from the workflow state file (qa_rounds field).
actual_spec_updates: Read from the workflow state file (spec_updates field).
actual_findings_count: Count only BLOCKING findings from qa-findings-{slug}.json. MEDIUM and LOW findings indicate thorough QA, not insufficient intensity.
actual_tokens: Sum of total_tokens from the token log JSONL file for this branch. See "Token Summation for actual_tokens" below.
actual_cost_usd: Read total_cost_usd from the cost artifact at .correctless/artifacts/cost-{branch-slug}.json if it exists. If the cost artifact does not exist (e.g., /cdocs hasn't run yet), omit actual_cost_usd from the calibration entry entirely — do not set it to 0, just leave it absent. The cost artifact is the canonical source of USD cost data (ABS-026).
fix_rounds_triggered: Derived value: max(0, qa_rounds - 1) + mini_audit_fix_rounds. qa_rounds is read from the workflow state — QA round 1 is the initial QA, rounds 2+ are fix rounds (so qa_rounds - 1 = fix rounds from QA). mini_audit_fix_rounds is the count of fix-loop re-entries during the mini-audit phase, derived from qa-findings JSON round entries with MA- prefix that triggered fix loops. Default to 0 when not determinable.
file_paths_touched: Collect from git diff {default_branch}...HEAD --name-only.
timestamp: Current ISO 8601 timestamp.
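
The derived fix_rounds_triggered value is plain integer arithmetic and can be computed in shell rather than by hand. The function name below is hypothetical; both arguments default to 0, matching the "not determinable" default.

```shell
# fix_rounds_triggered QA_ROUNDS MINI_AUDIT_FIX_ROUNDS
# Prints max(0, qa_rounds - 1) + mini_audit_fix_rounds.
fix_rounds_triggered() {
  qa_rounds="${1:-0}"
  mini="${2:-0}"
  # QA round 1 is the initial QA, so only rounds 2+ count as fix rounds.
  from_qa=$((qa_rounds - 1))
  if [ "$from_qa" -lt 0 ]; then from_qa=0; fi
  echo $((from_qa + mini))
}
```

For example, three QA rounds plus one mini-audit fix round gives `fix_rounds_triggered 3 1`, which prints 3.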

Token Summation for actual_tokens

The actual_tokens field in the calibration entry is an integer representing total token usage for this feature. Read the branch name from the workflow state file's .branch field, then derive the branch_slug by passing that branch name to branch_slug() in scripts/lib.sh. Use the resulting slug to locate the token log file at .correctless/artifacts/token-log-{branch-slug}.jsonl.

Compute the slug and sum tokens with these deterministic commands — do NOT use LLM arithmetic or hand-construct the slug:

# Step 1: Read the branch name from the workflow state file
FEATURE_BRANCH="$(jq -r '.branch // empty' .correctless/artifacts/workflow-state-*.json 2>/dev/null | head -1)"

# Step 2: Derive the slug using branch_slug() with the branch name parameter
source scripts/lib.sh
SLUG="$(branch_slug "$FEATURE_BRANCH")"

# Step 3: Sum total_tokens from the token log
jq -R 'try (fromjson | .total_tokens // 0) catch 0' ".correctless/artifacts/token-log-${SLUG}.jsonl" | jq -s 'add // 0'

This reads each line as raw text (-R), attempts to parse it as JSON (fromjson), extracts total_tokens (defaulting to 0), and catches parse errors on malformed lines (outputting 0). The second jq sums all values.

Missing or empty token log: If the token log file does not exist or is empty, set actual_tokens to 0.

Write actual_tokens as an integer in the calibration entry alongside the other fields.
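
The missing-or-empty rule can be folded into the summation so that no jq error is printed when the log is absent. This wrapper is a sketch; the function name `sum_tokens` is hypothetical and the jq pipeline is the one shown above.

```shell
# sum_tokens LOG_FILE
# Prints the integer sum of total_tokens, or 0 when the log is missing or empty.
sum_tokens() {
  log="$1"
  if [ ! -s "$log" ]; then
    echo 0
    return 0
  fi
  # Malformed lines and lines without total_tokens each contribute 0.
  jq -R 'try (fromjson | .total_tokens // 0) catch 0' "$log" | jq -s 'add // 0'
}
```

Usage: `sum_tokens ".correctless/artifacts/token-log-${SLUG}.jsonl"`.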

Write this calibration entry before advancing the workflow state — calibration data must be persisted even if the advance step fails.

Advance the state machine:

.correctless/hooks/workflow-advance.sh verified

This checks that the verification report file exists. If it doesn't, the transition fails.

After advancing, print the pipeline diagram:

At standard intensity:

  ✓ spec → ✓ review → ✓ tdd → ✓ verify → ▶ docs → merge

At high+ intensity:

  ✓ spec → ✓ review → ✓ tdd → ✓ verify → ▶ arch → docs → audit → merge

Next step is mandatory:

If BLOCKING findings exist: they MUST be fixed first. Return to the TDD cycle.
After fixing and re-verifying: tell the human to run /cdocs. This is the final step before merge.
Do NOT say "ready to merge" until /cdocs has run and workflow-advance.sh documented has been called.

Claude Code Feature Integration

Task Lists

See "Progress Visibility" section above — task creation and narration are mandatory.

Context Enforcement

Context enforcement (mandatory): Before starting mutation testing, check context usage. Verification reads many files and the orchestrator must stay coherent to write an accurate report. If above 70%: "Context at {N}%. Run /compact before I continue — remaining checks may produce incomplete results." If above 85%: "Context is critically full ({N}%). I must stop here. Run /compact and then re-run /cverify — verification will restart but reads from existing artifacts."
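
The two thresholds map to three actions, sketched below. The function name `context_action` and its labels are hypothetical, and how the context percentage is obtained is outside this sketch; the input is assumed to be a whole number.

```shell
# context_action PERCENT
# Prints "stop" above 85, "compact" above 70, otherwise "continue".
context_action() {
  pct="$1"
  if [ "$pct" -gt 85 ]; then
    echo "stop"
  elif [ "$pct" -gt 70 ]; then
    echo "compact"
  else
    echo "continue"
  fi
}
```

Both comparisons are strict, so exactly 70% continues and exactly 85% still only asks for /compact.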

Token Tracking

Log token usage following the shared constraints (_shared/constraints.md). Skill-specific values:

skill: "cverify"
phase: "verification"
agent_role: "verification-agent"

Background Tasks

Run mutation testing in the background while doing rule coverage analysis, prohibition checks, and antipattern matching
Run coverage report in the background while doing drift detection
Run linter checks in the background while analyzing architecture compliance

Code Analysis (MCP Integration)

If mcp.serena is true in workflow-config.json, use Serena MCP for symbol-level code analysis during verification. Use find_referencing_symbols to trace rule to test to implementation to entry point; the resulting traced coverage matrix is more precise than grep-based tracing. When Serena is available, augment the Rule Coverage table with a "Trace" column showing the symbol chain: rule_id -> test_fn -> impl_fn -> entry_point. If a link in the chain cannot be traced, mark it "?".

Use find_symbol instead of grepping for function/type names
Use find_referencing_symbols to trace callers and dependencies
Use get_symbols_overview for structural overview of a module
Use replace_symbol_body for precise edits (not used in this skill — verification is read-only)
Use search_for_pattern for regex searches with symbol context

Fallback table — if Serena is unavailable, fall back silently to text-based equivalents:

| Serena Operation | Fallback |
|------------------|----------|
| `find_symbol` | Grep for function/type name |
| `find_referencing_symbols` | Grep for symbol name across source files |
| `get_symbols_overview` | Read directory + read index files |
| `replace_symbol_body` | Edit tool |
| `search_for_pattern` | Grep tool |

Autonomous Defaults

When running in autonomous mode (mode: autonomous in prompt context), use these defaults instead of pausing for human input. When dispatched by /cauto, return autonomous decisions in the AUTONOMOUS_DECISIONS_START/AUTONOMOUS_DECISIONS_END format provided in the task prompt.

Deferred escalation (R-011): This skill has context: fork and cannot receive human follow-up input. When an escalate: always decision point is reached in autonomous mode, the default is applied and the decision is returned with escalation_deferred: true and original_escalation_reason for human review at pipeline conclusion.

AD-001: Verification scope — verify all spec rules (default). Rationale: partial verification leaves gaps that downstream skills assume are covered.
AD-002: Coverage assessment — strict assessment against spec (default). Rationale: lenient assessment masks weak tests that pass despite rule violations.
AD-003: Rule bypass approval — escalate: always. Default if deferred: flag as uncovered, do not bypass. Rationale: bypassing verification rules weakens the safety net.

If Something Goes Wrong

Skill interrupted: Re-run the skill. It reads the current state and resumes where possible.
Rate limit hit: Wait 2-3 minutes and re-run. Workflow state persists between sessions.
Wrong output: This skill doesn't modify workflow state until the final advance step. Re-run from scratch safely.
Stuck in a phase: Run /cstatus to see where you are. Use workflow-advance.sh override "reason" if the gate is blocking legitimate work.

Constraints

Write the verification report file. /cpostmortem and /cupdate-arch depend on it.
Write drift debt entries when drift is found. /cspec reads these for future features.
Do NOT skip the rule coverage check. Every rule must be accounted for.
Do NOT approve a feature with uncovered rules. Uncovered rules are BLOCKING.
Be specific about weak tests. "Weak" means: the test would still pass if the rule were violated.