name: bm.verification description: "Self-verification before completion: before claiming done, must list checklist, run through it, and check results. Use before claiming a task is complete, finished, ready, done, 完成, 搞定, 做完, 跑通, ready, done, finish, complete. Prevents false completions, self-deception, and submitting broken deliverables." source: opencrew version: "20260530.01"
Skill: Verification (Self-Verification Before Completion)
Before claiming "done", the agent must run through verification itself, list evidence, not just say "I finished it" in words.
Applicable to code, documentation, design, reports, configuration — any deliverable.
Trigger Timing
Mandatory trigger: Before the agent is about to say any of the following —
- "I've completed X"
- "X is fixed"
- "It's ready, give it a try"
- "Done, written to Y"
- "All requirements are implemented"
- "Done"
Do NOT trigger:
- User explicitly says "that's fine for now, I'll review it myself"
- Intermediate artifacts, not claiming completion
Three-Step Method
Step 1: List the Checklist
Based on this task's goals, list verifiable checklist items. Each must be:
- Objective: Can be answered with evidence yes/no, not subjective judgment
- Specific: Not "feature works", but "running command X returns 0"
- Actionable: You (the agent) can verify it on the spot
Checklist examples:
Writing code:
- File written to expected path (verify with
lsor reading the file) - Syntax correct (lint / typecheck passes)
- Key behavior matches spec (run tests or manual invocation)
- Haven't broken other files (check scope with git diff)
Writing documentation:
- Length / structure meets requirements
- Facts are from source material, not fabricated
- Links / file paths are all accessible
- User's key points are all covered
Fixing a bug:
- Reproduction steps no longer trigger the issue
- No regression introduced (related tests pass)
- Root cause was fixed, not symptoms
Configuration/Installation:
- Config file is valid format (json/yaml parsing passes)
- Service/tool can start
- Verified with a minimal test case
Step 2: Actually Run Through It
Actually execute, don't imagine. Common actions:
| Deliverable Type | Verification Action |
|---|---|
| Code | Run tests / lint / typecheck / actually invoke once |
| Script | Run once with sample input |
| Documentation | Read through once, check links and facts |
| Configuration | Have the target program load this config |
| Data/Report | Spot-check several entries |
Record actual evidence for each checklist item:
- Command exit codes and output snippets
- Actual file content snippets
- Screenshots / key log lines
Step 3: Present the Verification Report
Format presented to the user:
## Completion Report
**What was done**: One sentence describing what was done.
**Verification Results**:
- ✅ File written to ./output/report.md (138 lines)
- ✅ Markdown lint passed (0 errors)
- ✅ All links accessible (5/5)
- ⚠️ One item [needs confirmation]: X data field missing from original material, marked accordingly
**Evidence**:
\```
$ wc -l output/report.md
138 output/report.md
$ markdownlint output/report.md
(no output, passed)
\```
**Not Covered**:
- Didn't do Y because it's out of scope
**Suggested Next Steps**:
- You may want to check the ⚠️ item
Anti-Patterns (Absolutely Do Not Do These)
| Anti-Pattern | Why It's Wrong | Correct Approach |
|---|---|---|
| "Should be fine" / "Theoretically no issues" | Haven't actually run it | Must run it |
| Ran once without errors and assumed OK | Didn't check output; could be silent failure | Check actual output against expectations |
| Self-assessed as correct and reported done | Subjective, unreliable | Give objective evidence |
| Lowered verification standards during verification | Self-deception | Verification standard = task standard |
| Reported "all done" but skipped some items | Misleading to the user | Explicitly mark skipped items with reasons |
| Hid failed items | User discovers them later, worse | Report failed items with ❌ clearly, give fix suggestions |
Edge Cases
When verification fails:
- Don't stubbornly report "done"
- Report "partially done" or "not done", list failed items and your assessment (continue fixing / needs user decision)
When unable to verify (e.g., requires external environment):
- Explicitly say "I cannot locally verify X because Y"
- Give the user simple verification steps: "Please run
Zand check if the output is..."
When verification cost is too high (e.g., requires production deployment):
- Fall back: Reproduce an equivalent scenario on staging / locally to verify
- Clearly mark the parts that cannot be verified
Collaboration with Other Skills
- Task starting point used spec defined by bm.brainstorming → Verification = checking against spec's success criteria
- Errors occur → Switch to bm.systematic-troubleshooting to troubleshoot root cause
- User perspective review → bm.voice-of-user
File Locations
Verification evidence/logs → ./working/verification-{task}.log, attach to report when needed.
Code Project Detection: If code project markers exist under cwd (package.json, Cargo.toml, go.mod, pyproject.toml, setup.py, pom.xml, Gemfile, composer.json, or src/ + .git/), document-type final artifacts go under ./docs/ in corresponding subdirectories. Intermediate artifacts in ./working/ remain unchanged. User-specified paths take priority.