xpowers-review-implementation - SKILL.md Agent Skill

name: xpowers-review-implementation
description: Use after xpowers:executing-plans completes all tasks - verifies implementation against bd spec, all success criteria met, anti-patterns avoided

Review completed implementation against bd epic to catch gaps before claiming completion; spec is contract, implementation must fulfill contract completely.

LOW FREEDOM - Follow the 4-step review process exactly. Review with Google Fellow-level scrutiny. Never skip automated checks, quality gates, or code reading. No approval without evidence for every criterion.

| Step | Action | Deliverable |
|------|--------|-------------|
| 1 | Load bd epic + all tasks | TodoWrite with tasks to review |
| 2 | Review each task (automated checks, quality gates, read code, **audit tests**, verify criteria) | Findings per task |
| 3 | Report findings (approved / gaps found) | Review decision |
| 4 | Gate: If approved → finishing-a-development-branch, If gaps → STOP | Next action |

Review Perspective: Google Fellow-level SRE with 20+ years of experience reviewing junior engineers' code.

Test Quality Gate: Every new test must catch a real bug. Tautological tests (pass by definition, test mocks, verify compiler-checked facts) = GAPS FOUND.

Use when:

- xpowers:executing-plans completed all tasks
- Before claiming work is complete
- Before xpowers:finishing-a-development-branch
- Want to verify implementation matches spec

Don't use for:

Mid-implementation (use xpowers:executing-plans)
Before all tasks done
Code reviews of external PRs (this is self-review)

## Step 1: Load Epic Specification

Announce: "I'm using xpowers:review-implementation to verify implementation matches spec. Reviewing with Google Fellow-level scrutiny."

Get epic and tasks:

tm show bd-1          # Epic specification
tm dep tree bd-1      # Task tree
tm list --parent bd-1 # All tasks

Create TodoWrite tracker:

TodoWrite todos:
- Review bd-2: Task Name
- Review bd-3: Task Name
- Review bd-4: Task Name
- Compile findings and make decision

## Step 2: Review Each Task

For each task:

A. Read Task Specification

tm show bd-3

Extract:

Goal (what problem solved?)
Success criteria (how verify done?)
Implementation checklist (files/functions/tests)
Key considerations (edge cases)
Anti-patterns (prohibited patterns)

B. Run Automated Code Completeness Checks

# TODOs/FIXMEs (this lists every hit; any hit without an issue number is a gap)
rg -i "todo|fixme" src/ tests/ || echo "✅ None"

# Stub implementations
rg "unimplemented!|todo!|unreachable!|panic!\(\"not implemented" src/ || echo "✅ None"

# Unsafe patterns in production
rg "\.unwrap\(\)|\.expect\(" src/ | grep -v "/tests/" || echo "✅ None"

# Ignored/skipped tests
rg "#\[ignore\]|#\[skip\]|\.skip\(\)" tests/ src/ || echo "✅ None"

C. Run Quality Gates (via test-runner agent)

IMPORTANT: Use xpowers:test-runner agent to avoid context pollution.

Dispatch xpowers:test-runner: "Run: cargo test"
Dispatch xpowers:test-runner: "Run: cargo fmt --check"
Dispatch xpowers:test-runner: "Run: cargo clippy -- -D warnings"
Dispatch xpowers:test-runner: "Run: .git/hooks/pre-commit"

D. Read Implementation Files

CRITICAL: READ actual files, not just git diff.

# See changes
git diff main...HEAD -- src/auth/jwt.ts

# THEN READ FULL FILE
Read tool: src/auth/jwt.ts

While reading, check:

✅ Code implements checklist items (not stubs)
✅ Error handling uses proper patterns (Result, try/catch)
✅ Edge cases from "Key Considerations" handled
✅ Code is clear and maintainable
✅ No anti-patterns present

E. Code Quality Review (Google Fellow Perspective)

Assume code written by junior engineer. Apply production-grade scrutiny.

Error Handling:

Proper use of Result/Option or try/catch?
Error messages helpful for production debugging?
No unwrap/expect in production?
Errors propagate with context?
Failure modes graceful?

Safety:

No unsafe blocks without justification?
Proper bounds checking?
No potential panics?
No data races?
No SQL injection, XSS vulnerabilities?

Clarity:

Would junior understand in 6 months?
Single responsibility per function?
Descriptive variable names?
Complex logic explained?
No clever tricks - obvious and boring?

Testing (CRITICAL - Apply strict scrutiny):

Edge cases covered (empty, max, Unicode)?
Tests catch real bugs, not just inflate coverage?
Test names describe specific bug prevented?
Tests test behavior, not implementation?
Failure scenarios tested?
No tautological tests (see Test Quality Audit below)?

Production Readiness:

Comfortable deploying to production?
Could cause outage or data loss?
Performance acceptable under load?
Logging sufficient for debugging?

E2. Test Quality Audit (Mandatory for All New Tests)

CRITICAL: Review every new/modified test for meaningfulness. Tautological tests are WORSE than no tests - they give false confidence.

For each test, ask:

What bug would this catch? → If you can't name a specific failure mode, test is pointless
Could production code break while this test passes? → If yes, test is too weak
Does this test a real user scenario? → Or just implementation details?
Is the assertion meaningful? → expect(result != nil) is weaker than expect(result == expectedValue)

Red flags (REJECT implementation until fixed):

❌ Tests that only verify syntax/existence ("enum has cases", "struct has fields")
❌ Tautological tests (pass by definition: expect(builder.build() != nil) when build() can't return nil)
❌ Tests that duplicate implementation (testing 1+1==2 by asserting 1+1==2)
❌ Tests without meaningful assertions (call code but don't verify outcomes matter)
❌ Tests that verify mock behavior instead of production code
❌ Codable/Equatable round-trip tests with only happy path data
❌ Generic test names ("test_basic", "test_it_works", "test_model")

Examples of meaningless tests to reject:

// ❌ REJECT: Tautological - compiler ensures enum has cases
func testEnumHasCases() {
    _ = MyEnum.caseOne  // This proves nothing
    _ = MyEnum.caseTwo
}

// ❌ REJECT: Tautological - build() returns non-optional, can't be nil
func testBuilderReturnsValue() {
    let result = Builder().build()
    #expect(result != nil)  // Always passes by type system
}

// ❌ REJECT: Tests mock, not production code
func testServiceCallsAPI() {
    let mock = MockAPI()
    let service = Service(api: mock)
    service.fetchData()
    #expect(mock.fetchCalled)  // Tests mock behavior, not real logic
}

// ❌ REJECT: Happy path only, no edge cases
func testCodable() {
    let original = User(name: "John", age: 30)
    let data = try! encoder.encode(original)
    let decoded = try! decoder.decode(User.self, from: data)
    #expect(decoded == original)  // What about empty name? Max age? Unicode?
}

Examples of meaningful tests to approve:

// ✅ APPROVE: Catches missing validation bug
func testEmptyPayloadReturnsValidationError() {
    let result = validator.validate(payload: "")
    #expect(result == .error(.emptyPayload))
}

// ✅ APPROVE: Catches race condition bug
func testConcurrentWritesDontCorruptData() {
    let store = ThreadSafeStore()
    DispatchQueue.concurrentPerform(iterations: 1000) { i in
        store.write(key: "k\(i)", value: i)
    }
    #expect(store.count == 1000)  // Would fail if race condition exists
}

// ✅ APPROVE: Catches error handling bug
func testMalformedJSONReturns400Not500() {
    let response = api.parse(json: "{invalid")
    #expect(response.status == 400)  // Not 500 which would indicate unhandled exception
}

// ✅ APPROVE: Catches encoding bug with edge case
func testUnicodeNamePreservedAfterRoundtrip() {
    let original = User(name: "日本語テスト 🎉")
    let decoded = roundtrip(original)
    #expect(decoded.name == original.name)
}

Audit process:

# Find all new/modified test files
git diff main...HEAD --name-only | grep -E "(test|spec)"

# Read each test file
Read tool: tests/new_feature_test.swift

# For EACH test function, document:
# - Test name
# - What bug it catches (or "TAUTOLOGICAL" if none)
# - Verdict: ✅ Keep / ⚠️ Strengthen / ❌ Remove/Replace

If tautological tests found:

## Test Quality Audit: GAPS FOUND ❌

### Tautological/Meaningless Tests
| Test | Problem | Action |
|------|---------|--------|
| testEnumHasCases | Compiler already ensures this | ❌ Remove |
| testBuilderReturns | Non-optional return, can't be nil | ❌ Remove |
| testCodable | Happy path only, no edge cases | ⚠️ Add: empty, unicode, max values |
| testServiceCalls | Tests mock, not production | ❌ Replace with integration test |

**Cannot approve until tests are meaningful.**

F. Verify Success Criteria with Evidence

For EACH criterion in bd task:

Run verification command
Check actual output
Don't assume - verify with evidence
Use xpowers:test-runner for tests/lints

Example:

Criterion: "All tests passing"
Command: cargo test
Evidence: "127 tests passed, 0 failures"
Result: ✅ Met

Criterion: "No unwrap in production"
Command: rg "\.unwrap\(\)" src/
Evidence: "No matches"
Result: ✅ Met
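
The criterion/command/evidence/result pattern above can be sketched as a small record type. This is a minimal illustration, not part of the skill or of any tool: the names `CriterionCheck`, `criterionStatus`, and `allCriteriaMet` are hypothetical, and the rule it encodes (no evidence means "unverified", never "met") is one reading of "Don't assume - verify with evidence".

```typescript
// Hypothetical types for illustration only; nothing here is a real tm/bd API.
type CriterionStatus = "met" | "not_met" | "unverified";

interface CriterionCheck {
  criterion: string; // text copied from the bd task
  command: string;   // verification command that was run
  evidence: string;  // actual output observed ("" if nothing was run)
  passed: boolean;   // whether the observed output satisfies the criterion
}

// A criterion counts as met only when evidence exists AND the check passed.
function criterionStatus(check: CriterionCheck): CriterionStatus {
  if (check.evidence.trim() === "") return "unverified";
  return check.passed ? "met" : "not_met";
}

// Approval requires every criterion to be met; an empty list is not approval.
function allCriteriaMet(checks: CriterionCheck[]): boolean {
  return checks.length > 0 && checks.every((c) => criterionStatus(c) === "met");
}

// The two criteria from the example above.
const checks: CriterionCheck[] = [
  {
    criterion: "All tests passing",
    command: "cargo test",
    evidence: "127 tests passed, 0 failures",
    passed: true,
  },
  {
    criterion: "No unwrap in production",
    command: 'rg "\\.unwrap\\(\\)" src/',
    evidence: "No matches",
    passed: true,
  },
];
```

A check recorded as passed but with empty evidence still comes back as "unverified", which is the point: a claim without output does not count.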

G. Check Anti-Patterns

Search for each prohibited pattern from bd task:

# Example anti-patterns from task
rg "\.unwrap\(\)" src/  # If task prohibits unwrap
rg "TODO" src/          # If task prohibits untracked TODOs
rg "\.skip\(\)" tests/  # If task prohibits skipped tests

H. Verify Key Considerations

Read code to confirm edge cases handled:

Empty input validation
Unicode handling
Concurrent access
Failure modes
Performance concerns

Example: Task says "Must handle empty payload" → Find validation code for empty payload.

I. Record Findings

### Task: bd-3 - Implement JWT authentication

#### Automated Checks
- TODOs: ✅ None
- Stubs: ✅ None
- Unsafe patterns: ❌ Found `.unwrap()` at src/auth/jwt.ts:45
- Ignored tests: ✅ None

#### Quality Gates
- Tests: ✅ Pass (127 tests)
- Formatting: ✅ Pass
- Linting: ❌ 3 warnings
- Pre-commit: ❌ Fails due to linting

#### Files Reviewed
- src/auth/jwt.ts: ⚠️ Contains `.unwrap()` at line 45
- tests/auth/jwt_test.rs: ✅ Complete

#### Code Quality
- Error Handling: ⚠️ Uses unwrap instead of proper error propagation
- Safety: ✅ Good
- Clarity: ✅ Good
- Testing: See Test Quality Audit below

#### Test Quality Audit (New/Modified Tests)
| Test | Bug It Catches | Verdict |
|------|----------------|---------|
| test_valid_token_accepted | Missing validation | ✅ Keep |
| test_expired_token_rejected | Expiration bypass | ✅ Keep |
| test_jwt_struct_exists | Nothing (tautological) | ❌ Remove |
| test_encode_decode | Encoding bug (but happy path only) | ⚠️ Add edge cases |

**Tautological tests found:** 1 (test_jwt_struct_exists)
**Weak tests found:** 1 (test_encode_decode needs edge cases)

#### Success Criteria
1. "All tests pass": ✅ Met - Evidence: 127 tests passed
2. "Pre-commit passes": ❌ Not met - Evidence: clippy warnings
3. "No unwrap in production": ❌ Not met - Evidence: Found at jwt.ts:45

#### Anti-Patterns
- "NO unwrap in production": ❌ Violated at src/auth/jwt.ts:45

#### Issues
**Critical:**
1. unwrap() at jwt.ts:45 - violates anti-pattern, must use proper error handling
2. Tautological test: test_jwt_struct_exists must be removed

**Important:**
3. 3 clippy warnings block pre-commit hook
4. test_encode_decode needs edge cases (empty, unicode, max length)

J. Mark Task Reviewed (TodoWrite)

## Step 3: Report Findings

After reviewing ALL tasks:

If NO gaps:

## Implementation Review: APPROVED ✅

Reviewed bd-1 (OAuth Authentication) against implementation.

### Tasks Reviewed
- bd-2: Configure OAuth provider ✅
- bd-3: Implement token exchange ✅
- bd-4: Add refresh logic ✅

### Verification Summary
- All success criteria verified
- No anti-patterns detected
- All key considerations addressed
- All files implemented per spec

### Evidence
- Tests: 127 passed, 0 failures (2.3s)
- Linting: No warnings
- Pre-commit: Pass
- Code review: Production-ready

Ready to proceed to xpowers:finishing-a-development-branch.

If gaps found:

## Implementation Review: GAPS FOUND ❌

Reviewed bd-1 (OAuth Authentication) against implementation.

### Tasks with Gaps

#### bd-3: Implement token exchange
**Gaps:**
- ❌ Success criterion not met: "Pre-commit hooks pass"
  - Evidence: cargo clippy shows 3 warnings
- ❌ Anti-pattern violation: Found `.unwrap()` at src/auth/jwt.ts:45
- ⚠️ Key consideration not addressed: "Empty payload validation"
  - No check for empty payload in generateToken()

#### bd-4: Add refresh logic
**Gaps:**
- ❌ Success criterion not met: "All tests passing"
  - Evidence: test_verify_expired_token failing

### Cannot Proceed
Implementation does not match spec. Fix gaps before completing.

## Step 4: Gate Decision

If APPROVED:

Announce: "I'm using xpowers:finishing-a-development-branch to complete this work."

Use Skill tool: xpowers:finishing-a-development-branch

If GAPS FOUND:

STOP. Do not proceed to finishing-a-development-branch.
Fix gaps or discuss with partner.
Re-run review after fixes.
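
The gate can be read as a pure function over the per-task findings. The sketch below is illustrative only: `TaskFindings`, `Gate`, and `gate` are hypothetical names, and treating "no tasks reviewed" as a gap is an assumption that follows from "Review every task".

```typescript
// Hypothetical shape for per-task review findings (illustrative only).
interface TaskFindings {
  taskId: string;
  gaps: string[]; // unmet criteria, anti-pattern violations, tautological tests, ...
}

type Gate =
  | { decision: "APPROVED"; next: "xpowers:finishing-a-development-branch" }
  | { decision: "GAPS FOUND"; next: "STOP"; blocking: string[] };

// Any single gap in any task blocks the whole epic.
function gate(findings: TaskFindings[]): Gate {
  const blocking: string[] = [];
  // Assumption: an empty review is not an approval.
  if (findings.length === 0) blocking.push("no tasks were reviewed");
  for (const f of findings) {
    for (const gap of f.gaps) blocking.push(`${f.taskId}: ${gap}`);
  }
  if (blocking.length === 0) {
    return { decision: "APPROVED", next: "xpowers:finishing-a-development-branch" };
  }
  return { decision: "GAPS FOUND", next: "STOP", blocking };
}
```

There is deliberately no "approved with minor gaps" variant in the type: the only two outcomes are the ones Step 4 allows.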

## Examples

Scenario: Developer only checks git diff, doesn't read actual files

# Review process
git diff main...HEAD  # Shows changes

Developer sees:

function generateToken(payload) {
  return jwt.sign(payload, secret);
}

Approves based on diff:
"Looks good, token generation implemented ✅"

Misses: Full context shows no validation

function generateToken(payload) {
  // No validation of payload!
  // No check for empty payload (key consideration)
  // No error handling if jwt.sign fails
  return jwt.sign(payload, secret);
}

Why it fails:

- Git diff shows additions, not full context
- Missed that empty payload not validated (key consideration)
- Missed that error handling missing (quality issue)
- False approval - gaps exist but not caught
- Will fail in production when empty payload passed

**Correct review process:**

# See changes
git diff main...HEAD -- src/auth/jwt.ts

# THEN READ FULL FILE
Read tool: src/auth/jwt.ts

Reading full file reveals:

function generateToken(payload) {
  // Missing: empty payload check (key consideration from bd task)
  // Missing: error handling for jwt.sign failure
  return jwt.sign(payload, secret);
}

Record in findings:

⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
- Code at src/auth/jwt.ts:15-17

⚠️ Error handling: jwt.sign can throw, not handled

What you gain:

Caught gaps that git diff missed
Full context reveals missing validation
Quality issues identified before production
Spec compliance verified, not assumed
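
For reference, one way the two gaps recorded above (empty payload check, handling a throwing `jwt.sign`) could be closed is sketched below. This is an assumption-laden illustration, not the task's required fix: the `Signer` parameter is a hypothetical stand-in for `jwt.sign(payload, secret)` so the sketch does not depend on any particular JWT library, and the `TokenResult` shape and error strings are invented for the example.

```typescript
// Hypothetical signer type standing in for jwt.sign(payload, secret).
// Injected so the sketch stays self-contained and assumes no specific library.
type Signer = (payload: Record<string, unknown>, secret: string) => string;

type TokenResult =
  | { ok: true; token: string }
  | { ok: false; error: string };

function generateToken(
  payload: Record<string, unknown>,
  secret: string,
  sign: Signer,
): TokenResult {
  // Key consideration from the bd task: reject empty payloads before signing.
  if (Object.keys(payload).length === 0) {
    return { ok: false, error: "empty payload" };
  }
  try {
    return { ok: true, token: sign(payload, secret) };
  } catch (err) {
    // Return the failure with context instead of letting it escape unhandled.
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `token signing failed: ${reason}` };
  }
}
```

A reviewer reading the full file would look for both branches; a diff that only shows the `sign` call would reveal neither their presence nor their absence.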

Scenario: Developer assumes tests passing means done

# Run tests
cargo test
# Output: 127 tests passed

Developer concludes:
"Tests pass, implementation complete ✅"
Proceeds to finishing-a-development-branch

Misses:

- bd task has 5 success criteria
- Only checked 1 (tests pass)
- Anti-pattern: unwrap() present (prohibited)
- Key consideration: Unicode handling not tested
- Linter has warnings (blocks pre-commit)

Why it fails:

- Tests passing ≠ spec compliance
- Didn't verify all success criteria
- Didn't check anti-patterns
- Didn't verify key considerations
- Pre-commit will fail (blocks merge)
- Ships code violating anti-patterns

**Correct review checks ALL criteria:**

bd task has 5 success criteria:
1. "All tests pass" ✅ - Evidence: 127 passed
2. "Pre-commit passes" ❌ - Evidence: clippy warns (3 warnings)
3. "No unwrap in production" ❌ - Evidence: Found at jwt.ts:45
4. "Unicode handling tested" ⚠️ - Need to verify test exists
5. "Rate limiting implemented" ⚠️ - Need to check code

Result: 1/5 criteria verified met. GAPS EXIST.

Run additional checks:

# Check criterion 2
cargo clippy
# 3 warnings found ❌

# Check criterion 3
rg "\.unwrap\(\)" src/
# src/auth/jwt.ts:45 ❌

# Check criterion 4
rg -i "unicode" tests/
# No matches ⚠️ Need to verify

Decision: GAPS FOUND, cannot proceed

What you gain:

Verified ALL criteria, not just tests
Caught anti-pattern violations
Caught pre-commit blockers
Prevented shipping non-compliant code
Spec contract honored completely

Scenario: Developer rationalizes skipping rigor for "simple" task

bd task: "Add logging to error paths"

Developer thinks: "Simple task, just added console.log"

Skips:

- Automated checks (assumes no issues)
- Code quality review (seems obvious)
- Full success criteria verification

Approves quickly:
"Logging added ✅"

Misses:

- console.log used instead of proper logger (anti-pattern)
- Only added to 2 of 5 error paths (incomplete)
- No test verifying logs actually output (criterion)
- Logs contain sensitive data (security issue)

Why it fails:

- "Simple" tasks have hidden complexity
- The skipped rigor is what catches exactly these issues
- Incomplete implementation (2/5 paths)
- Security vulnerability shipped
- Anti-pattern not caught
- Failed success criterion (test logs)

**Follow full review process:**

# Automated checks
rg "console\.log" src/
# Found at error-handler.ts:12, 15 ⚠️

# Read bd task
tm show bd-5

# Success criteria:
# 1. "All error paths logged"
# 2. "No sensitive data in logs"
# 3. "Test verifies log output"

# Check criterion 1
grep -rn "throw new Error" src/
# 5 locations found
# Only 2 have logging ❌ Incomplete

# Check criterion 2
Read tool: src/error-handler.ts
# Logs contain password field ❌ Security issue

# Check criterion 3
rg "test.*log" tests/
# No matches ❌ Test missing

Decision: GAPS FOUND

Incomplete (3/5 error paths missing logs)
Security issue (logs password)
Anti-pattern (console.log instead of logger)
Missing test

What you gain:

"Simple" task revealed multiple gaps
Security vulnerability caught pre-production
Rigor prevents incomplete work shipping
All criteria must be met, no exceptions

Scenario: Developer approves implementation with high test coverage but tautological tests

# Test results show good coverage
cargo test
# 45 tests passed ✅
# Coverage: 92% ✅

Developer approves based on numbers:
"Tests pass with 92% coverage, implementation complete ✅"
Proceeds to finishing-a-development-branch

Later in production:
- Validation bypassed because test only checked "validator exists"
- Race condition because test only checked "lock was acquired"
- Encoding corruption because test only checked "encode != nil"

Why it fails:

- High coverage doesn't mean meaningful tests
- Tests verified existence/syntax, not behavior
- Tautological tests passed by definition:
  - `expect(validator != nil)` - always passes, doesn't test validation logic
  - `expect(lock.acquire())` - tests mock, not thread safety
  - `expect(encoded.count > 0)` - tests non-empty, not correctness
- Production bugs occurred despite "good" test coverage
- Coverage metrics were gamed with meaningless tests

**Audit each test for meaningfulness:**

# Find new tests
git diff main...HEAD --name-only | grep test

# Read and audit each test
Read tool: tests/validator_test.swift

For each test, document:

#### Test Quality Audit

| Test | Assertion | Bug Caught? | Verdict |
|------|-----------|-------------|---------|
| testValidatorExists | `!= nil` | ❌ None (compiler checks) | ❌ Remove |
| testValidInput | `isValid == true` | ⚠️ Happy path only | ⚠️ Add edge cases |
| testEmptyInputFails | `isValid == false` | ✅ Missing validation | ✅ Keep |
| testLockAcquired | mock.acquireCalled | ❌ Tests mock | ❌ Replace |
| testConcurrentAccess | count == expected | ✅ Race condition | ✅ Keep |
| testEncodeNotNil | `!= nil` | ❌ Type guarantees this | ❌ Remove |
| testUnicodeRoundtrip | decoded == original | ✅ Encoding corruption | ✅ Keep |

**Tautological tests:** 3 (must remove)
**Weak tests:** 1 (must strengthen)
**Meaningful tests:** 3 (keep)

Decision: GAPS FOUND ❌

## Test Quality Audit: GAPS FOUND

### Tautological Tests (Must Remove)
- testValidatorExists: Compiler ensures non-nil, test proves nothing
- testLockAcquired: Tests mock behavior, not actual thread safety
- testEncodeNotNil: Return type is non-optional, can never be nil

### Weak Tests (Must Strengthen)
- testValidInput: Only happy path, add:
  - testEmptyStringRejected
  - testMaxLengthRejected
  - testUnicodeNormalized

### Action Required
Remove 3 tautological tests, add 3 edge case tests, then re-review.

What you gain:

Real test quality, not coverage theater
Bugs caught before production
Tests that actually verify behavior
Confidence in test suite
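
The tally in the audit example above (3 tautological, 1 weak, 3 meaningful) can be reproduced mechanically once each test has a verdict. The sketch below is illustrative: `AuditedTest`, `AuditSummary`, and `summarizeAudit` are hypothetical names, and the verdicts are transcribed from the example table rather than computed. Deciding the verdict still requires reading each test.

```typescript
// Hypothetical types for illustration; verdicts mirror the table's three outcomes.
type Verdict = "keep" | "strengthen" | "remove_or_replace";

interface AuditedTest {
  name: string;
  verdict: Verdict;
}

interface AuditSummary {
  tautological: number;
  weak: number;
  meaningful: number;
  decision: "APPROVED" | "GAPS FOUND";
}

// Any tautological or weak test is a gap; coverage numbers play no role here.
function summarizeAudit(tests: AuditedTest[]): AuditSummary {
  let tautological = 0;
  let weak = 0;
  let meaningful = 0;
  for (const t of tests) {
    if (t.verdict === "remove_or_replace") tautological += 1;
    else if (t.verdict === "strengthen") weak += 1;
    else meaningful += 1;
  }
  const decision = tautological + weak > 0 ? "GAPS FOUND" : "APPROVED";
  return { tautological, weak, meaningful, decision };
}

// Verdicts transcribed from the example audit table.
const audited: AuditedTest[] = [
  { name: "testValidatorExists", verdict: "remove_or_replace" },
  { name: "testValidInput", verdict: "strengthen" },
  { name: "testEmptyInputFails", verdict: "keep" },
  { name: "testLockAcquired", verdict: "remove_or_replace" },
  { name: "testConcurrentAccess", verdict: "keep" },
  { name: "testEncodeNotNil", verdict: "remove_or_replace" },
  { name: "testUnicodeRoundtrip", verdict: "keep" },
];
```

Calling `summarizeAudit(audited)` gives 3 tautological, 1 weak, and 3 meaningful tests with decision "GAPS FOUND", matching the counts recorded in the example.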

## Rules That Have No Exceptions

Review every task → No skipping "simple" tasks
Run all automated checks → TODOs, stubs, unwrap, ignored tests
Read actual files with Read tool → Not just git diff
Verify every success criterion → With evidence, not assumptions
Check all anti-patterns → Search for prohibited patterns
Apply Google Fellow scrutiny → Production-grade code review
Audit all new tests for meaningfulness → Tautological tests = gaps, not coverage
If gaps found → STOP → Don't proceed to finishing-a-development-branch

## Common Excuses

All of these mean: STOP. Follow full review process.

"Tests pass, must be complete" (Tests ≠ spec, check all criteria)
"I implemented it, it's done" (Implementation ≠ compliance, verify)
"No time for thorough review" (Gaps later cost more than review now)
"Looks good to me" (Opinion ≠ evidence, run verifications)
"Small gaps don't matter" (Spec is contract, all criteria matter)
"Will fix in next PR" (This PR completes this epic, fix now)
"Can check diff instead of files" (Diff shows changes, not context)
"Automated checks cover it" (Checks + code review both required)
"Success criteria passing means done" (Also check anti-patterns, quality, edge cases)
"Tests exist, so testing is complete" (Tautological tests = false confidence)
"Coverage looks good" (Coverage can be gamed with meaningless tests)
"Tests are boilerplate, don't need review" (Every test must catch a real bug)
"It's just a simple existence check" (Compiler already checks existence)

Before approving implementation:

Per task:

- [ ] Read bd task specification completely
- [ ] Ran all automated checks (TODOs, stubs, unwrap, ignored tests)
- [ ] Ran all quality gates via test-runner agent (tests, format, lint, pre-commit)
- [ ] Read actual implementation files with Read tool (not just diff)
- [ ] Reviewed code quality with Google Fellow perspective
- [ ] Audited all new tests for meaningfulness (not tautological)
- [ ] Verified every success criterion with evidence
- [ ] Checked every anti-pattern (searched for prohibited patterns)
- [ ] Verified every key consideration addressed in code

Overall:

- [ ] Reviewed ALL tasks (no exceptions)
- [ ] TodoWrite tracker shows all tasks reviewed
- [ ] Compiled findings (approved or gaps)
- [ ] If approved: all criteria met for all tasks
- [ ] If gaps: documented exactly what is missing

Can't check all boxes? Return to Step 2 and complete review.

**This skill is called by:**

- xpowers:executing-plans (Step 5, after all tasks executed)

This skill calls:

xpowers:finishing-a-development-branch (if approved)
xpowers:test-runner agent (for quality gates)

This skill uses:

xpowers:verification-before-completion principles (evidence before claims)

Call chain:

xpowers:executing-plans → xpowers:review-implementation → xpowers:finishing-a-development-branch
                         ↓
                   (if gaps: STOP)

CRITICAL: Use bd commands (tm show, tm list, tm dep tree), never read .beads/issues.jsonl directly.

**Detailed guides:**

- [Code quality standards by language](resources/quality-standards.md)
- [Common anti-patterns to check](resources/anti-patterns-reference.md)
- [Production readiness checklist](resources/production-checklist.md)

When stuck:

Unsure if gap critical → If violates criterion, it's a gap
Criteria ambiguous → Ask user for clarification before approving
Anti-pattern unclear → Search for it, document if found
Quality concern → Document as gap, don't rationalize away