name: test-reviewing
description: Reviews generated tests by running them, analyzing failures, and validating London School patterns. Use when reviewing tests, checking test quality, or before spec verification.
disable-model-invocation: false
Test Reviewing
Reviews generated tests to ensure quality and correctness before proceeding to spec verification. Acts as the quality gate between test generation and coverage verification.
Workflow Position
Step 6: Test Generation (/test-generate)
    │
    ▼
Step 6.5: Test Review (/test-reviewing) ← THIS SKILL
    │
    ├── All tests pass → Step 8: /spec-verify
    └── Failures → Step 7: /test-fixing
                       │
                       └── Loop back to Step 6.5
Workflow
Step 1: Accept Input
Usage patterns:
# Review tests for a specific class
/test-reviewing LiveViewRepository
# Review with verbose output
/test-reviewing AuthViewModel --verbose
# Review multiple classes (will use subagents)
/test-reviewing LiveViewRepository AuthViewModel SecureRepository
Step 1.5: Evaluate Scope and Parallelize
When multiple targets are specified, or a single target has 50 or more tests, split the work across subagents for efficiency.
Decision criteria:
| Condition | Action |
|---|---|
| Single target, <50 tests | Process directly |
| Single target, ≥50 tests | Split by test group, use subagents |
| Multiple targets (2-5) | One subagent per target, parallel execution |
| Many targets (>5) | Batch into groups of 3, sequential batches |
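The decision table can be read as a small function. The Python sketch below is one possible reading, not part of the skill itself: `plan_review`, `ReviewPlan`, and the strategy labels are hypothetical names introduced for illustration, and the per-target test counts are assumed to be known in advance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewPlan:
    strategy: str             # "direct", "split_by_group", "parallel", or "batched"
    batches: list[list[str]]  # targets grouped by execution batch


def plan_review(targets: list[str], test_counts: dict[str, int]) -> ReviewPlan:
    """Map the decision criteria onto a plan (hypothetical helper)."""
    if not targets:
        raise ValueError("at least one target is required")
    if len(targets) == 1:
        # Single target: split by test group only at 50 or more tests.
        count = test_counts.get(targets[0], 0)
        strategy = "split_by_group" if count >= 50 else "direct"
        return ReviewPlan(strategy, [targets])
    if len(targets) <= 5:
        # 2-5 targets: one subagent per target, all launched together.
        return ReviewPlan("parallel", [targets])
    # More than 5 targets: groups of 3, processed one batch after another.
    batches = [targets[i:i + 3] for i in range(0, len(targets), 3)]
    return ReviewPlan("batched", batches)
```

With seven targets this yields three sequential batches of sizes 3, 3, and 1.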
Subagent invocation:
IMPORTANT: When invoking subagents, you MUST instruct them to use the /test-reviewing skill explicitly.
Use Task tool with subagent_type="general-purpose" for each target:
Task(
  description="Review {ClassName} tests",
  prompt="Use the /test-reviewing skill to review tests for {ClassName}.
          Invoke the skill with: /test-reviewing {ClassName}
          Follow the skill workflow completely and return the review report.",
  subagent_type="general-purpose"
)
The subagent must invoke /test-reviewing - do NOT just describe the workflow in the prompt.
Parallel execution example:
When reviewing LiveViewRepository, AuthViewModel, and SecureRepository:
- Launch 3 subagents in parallel (single message with multiple Task tool calls)
- Each subagent runs Steps 2-7 independently
- Main agent aggregates results into combined report
Result aggregation:
## Combined Test Review Report
### Individual Results
| Class | Tests | Passed | Failed | Status |
|-------|-------|--------|--------|--------|
| LiveViewRepository | 135 | 130 | 5 | ⚠️ Issues |
| AuthViewModel | 24 | 24 | 0 | ✓ Pass |
| SecureRepository | 18 | 18 | 0 | ✓ Pass |
### Summary
- Total: 177 tests
- Passed: 172 (97.2%)
- Failed: 5 (2.8%)
### Next Actions
- LiveViewRepository: Run `/test-fixing` to resolve 5 issues
- AuthViewModel: Run `/spec-verify AuthViewModel`
- SecureRepository: Run `/spec-verify SecureRepository`
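The arithmetic behind the combined report can be sketched in a few lines of Python. This is an illustrative sketch only: `summarize` and its `(class_name, passed, failed)` input shape are assumptions, not an interface defined by the skill.

```python
def summarize(results):
    """Aggregate per-class results given as (class_name, passed, failed) tuples."""
    total = sum(passed + failed for _name, passed, failed in results)
    passed_total = sum(passed for _name, passed, _failed in results)
    failed_total = total - passed_total

    def pct(n):
        # Percentage of all tests, rounded to one decimal place.
        return round(100 * n / total, 1) if total else 0.0

    # Classes with failures go to /test-fixing; clean classes go to /spec-verify.
    next_actions = [
        f"{name}: Run `/test-fixing` to resolve {failed} issues"
        if failed
        else f"{name}: Run `/spec-verify {name}`"
        for name, _passed, failed in results
    ]
    return {
        "total": total,
        "passed": passed_total,
        "passed_pct": pct(passed_total),
        "failed": failed_total,
        "failed_pct": pct(failed_total),
        "next_actions": next_actions,
    }
```

Feeding in the three classes from the example report (130/5, 24/0, 18/0) reproduces its summary figures: 177 total, 172 passed (97.2%), 5 failed (2.8%).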
Step 2: Locate Test File
Based on class type, find test file:
| Class Type | Test Location |
|---|---|
| ViewModel | test/unit_tests/ui/{feature}/{class}_test.dart |
| Repository | test/unit_tests/domain/repository/{class}_test.dart |
| Service | test/unit_tests/service/{class}_test.dart |
If the test file is not found, report an error and suggest running /test-generate first.
Step 3: Run Tests
Execute tests and capture output:
flutter test <test_file> --no-pub 2>&1
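When the command is launched from a script rather than typed into a shell, the same invocation can be built as an argument list, which avoids quoting problems with unusual file names. A minimal Python sketch, with `build_test_command` and `run_tests` as hypothetical helper names and assuming `flutter` is on the `PATH`:

```python
import subprocess


def build_test_command(test_file: str) -> list[str]:
    # Same flags as the shell form; the file path is a separate argument.
    return ["flutter", "test", test_file, "--no-pub"]


def run_tests(test_file: str) -> tuple[int, str]:
    """Run the tests and return (exit_code, combined_output)."""
    proc = subprocess.run(
        build_test_command(test_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # equivalent of 2>&1
        text=True,
        check=False,               # failing tests are data here, not an exception
    )
    return proc.returncode, proc.stdout
```

The captured output is what Steps 4 to 6 analyze; the exit code alone does not say which tests failed.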
Step 4: Analyze Test Structure
Verify London School patterns:
| Check | Criteria | Pass |
|---|---|---|
| Mock Usage | All dependencies are mocked | ✓/✗ |
| No Real I/O | No network, file, or database calls | ✓/✗ |
| Isolation | Each test is independent | ✓/✗ |
| Naming | Follows method_GivenScenario_ShouldResult | ✓/✗ |
| Spec Tags | Tests have @Tags(['SPEC-XXX']) | ✓/✗ |
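The naming check lends itself to a mechanical test. The Python sketch below encodes one interpretation of the convention: a lowerCamelCase method name, a capitalized scenario segment, and a result segment beginning with `Should`, joined by underscores. The exact pattern is an assumption inferred from names such as `delete_WhenNotFound_ShouldThrow`; adjust it if the project defines the convention more strictly.

```python
import re

# method_Scenario_ShouldResult, e.g. fetchAccount_WhenSuccess_ShouldReturn
TEST_NAME_PATTERN = re.compile(
    r"[a-z][A-Za-z0-9]*"        # method under test, lowerCamelCase
    r"_[A-Z][A-Za-z0-9]*"       # scenario, e.g. GivenX or WhenX
    r"_Should[A-Z][A-Za-z0-9]*" # expected result
)


def follows_naming_convention(test_name: str) -> bool:
    # fullmatch requires the whole name to fit the pattern.
    return TEST_NAME_PATTERN.fullmatch(test_name) is not None
```

Names that fail the check would be reported under the Structural (S) category or flagged in the compliance table, depending on how strictly the project enforces the convention.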
Step 5: Validate Mock Configuration
Check mock setup quality:
| Check | Criteria |
|---|---|
| Complete Stubs | All used methods are stubbed |
| Correct Types | Return types match implementation |
| Named Parameters | Uses anyNamed() correctly |
| Dummy Values | provideDummy for complex types |
Step 6: Categorize Failures
If tests fail, categorize each failure:
| Category | Code | Description | Next Action |
|---|---|---|---|
| Structural | S | Test structure issues (missing setup, wrong patterns) | /test-fixing |
| Mock | M | Mock configuration problems | /test-fixing |
| Assertion | A | Assertion failures (expected vs actual) | /test-fixing |
| Runtime | R | Runtime errors (null, type, import) | /test-fixing |
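Once each failure has been assigned a code, the list can be held in a small data structure and rendered into the report. The Python sketch below is illustrative: `Category`, `Issue`, `issue_rows`, and `count_by_category` are hypothetical names, and assigning the category itself remains a judgment made during review.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    STRUCTURAL = "S"
    MOCK = "M"
    ASSERTION = "A"
    RUNTIME = "R"


@dataclass(frozen=True)
class Issue:
    category: Category
    test: str
    issue: str


def issue_rows(issues):
    """Render issues as numbered Markdown table rows."""
    return [
        f"| {n} | {it.category.value} | {it.test} | {it.issue} |"
        for n, it in enumerate(issues, start=1)
    ]


def count_by_category(issues):
    """Count issues per code, including codes with zero issues."""
    counts = Counter(it.category.value for it in issues)
    return {c.value: counts.get(c.value, 0) for c in Category}
```

The per-category counts give /test-fixing a quick view of where the failures are concentrated.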
Step 7: Generate Review Report
Output the following format:
## Test Review Report: {ClassName}
### Summary
| Metric | Value |
|--------|-------|
| Test File | `test/unit_tests/.../{class}_test.dart` |
| Total Tests | {count} |
| Passed | {count} |
| Failed | {count} |
| Skipped | {count} |
### Test Execution
{pass/fail status for each test group}
### London School Compliance
| Check | Status |
|-------|--------|
| Mock Usage | ✓ All dependencies mocked |
| No Real I/O | ✓ No external calls |
| Isolation | ✓ Tests are independent |
| Naming | ✓ Follows convention |
| Spec Tags | ✓ All tests tagged |
### Issues Found
{If any issues, list with category codes}
| # | Category | Test | Issue |
|---|----------|------|-------|
| 1 | M | fetchAccount_WhenSuccess_ShouldReturn | Wrong mock return type |
| 2 | A | delete_WhenNotFound_ShouldThrow | Expected exception not thrown |
### Next Action
{One of the following:}
- **All tests pass**: Run `/spec-verify {ClassName}` to check coverage
- **Issues found**: Run `/test-fixing` to resolve {count} issues
Issue Categories Reference
| Code | Category | Common Causes |
|---|---|---|
| S | Structural | Missing setUpAll, wrong test grouping, no tearDown |
| M | Mock | Wrong method stubbed, incorrect return type, missing stub |
| A | Assertion | Wrong expected value, matcher issue, async timing |
| R | Runtime | Null pointer, type cast error, missing import |
Integration Points
From /test-generate
- Receives: Newly generated test scaffolds
- Validates: Structure and mock configuration
To /test-fixing
- Sends: Categorized failure list
- Expects: Fixed tests for re-review
To /spec-verify
- Sends: Passing test file
- Enables: Coverage verification
Step 8: Cross-Agent Consensus (Codex Review)
After completing your review, request a parallel review from Codex CLI to validate findings and build consensus.
8.1 Invoke Codex Review
Run Codex review using a structured request, not shell string interpolation.
Do not run commands like codex exec "... {ClassName} ..." where placeholders are directly embedded in a shell string.
Instead:
- Validate dynamic inputs first (for example, class name with an allowlist such as ^[A-Za-z0-9_./-]+$).
- Build a structured request object with separate fields.
- Invoke Codex via tool/subagent interface that accepts arguments as fields (not a single concatenated command string).
Example structured request:
{
"skill": "/test-reviewing",
"class_name": "<validated_class_name>",
"focus": [
"test execution results",
"London School compliance",
"mock configuration",
"issues found"
],
"output": "detailed review report"
}
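Validation and request construction can be combined so that an unvalidated class name never reaches the request. The Python sketch below uses the allowlist pattern and field names shown above; `build_codex_request` is a hypothetical helper, and how the resulting object is handed to Codex depends on the tool interface available.

```python
import json
import re

# Allowlist from the instructions above.
CLASS_NAME_ALLOWLIST = re.compile(r"^[A-Za-z0-9_./-]+$")


def build_codex_request(class_name: str) -> str:
    """Validate the class name, then serialize a structured request."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    if not CLASS_NAME_ALLOWLIST.fullmatch(class_name):
        raise ValueError(f"invalid class name: {class_name!r}")
    request = {
        "skill": "/test-reviewing",
        "class_name": class_name,
        "focus": [
            "test execution results",
            "London School compliance",
            "mock configuration",
            "issues found",
        ],
        "output": "detailed review report",
    }
    # The class name travels as a JSON field, never as part of a shell string.
    return json.dumps(request, indent=2)
```

A name such as `Auth; rm -rf /` fails validation because spaces and semicolons are outside the allowlist.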
8.2 Analyze Codex Response
Compare Codex's review with your own:
| Aspect | Your Review | Codex Review | Agreement |
|---|---|---|---|
| Test Results | X passed, Y failed | ... | ✓/✗ |
| London School | ... | ... | ✓/✗ |
| Issues Found | ... | ... | ✓/✗ |
| Next Action | ... | ... | ✓/✗ |
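Filling in the comparison table can be partly mechanized when both reviews are reduced to one short summary per aspect. The Python sketch below uses exact string equality as the agreement test, which is a deliberate simplification: real reviews will usually need a judgment call about whether two differently worded findings agree. `compare_reviews` is a hypothetical helper.

```python
ASPECTS = ("Test Results", "London School", "Issues Found", "Next Action")


def compare_reviews(mine: dict, codex: dict):
    """Return (table_rows, disagreements) for the four aspects."""
    rows, disagreements = [], []
    for aspect in ASPECTS:
        a = mine.get(aspect, "")
        b = codex.get(aspect, "")
        # Simplification: findings agree only if the summaries are identical.
        agree = a == b
        rows.append(f"| {aspect} | {a} | {b} | {'✓' if agree else '✗'} |")
        if not agree:
            disagreements.append(aspect)
    return rows, disagreements
```

Any aspect returned in `disagreements` feeds into the resolution steps in 8.3.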
8.3 Resolve Disagreements
If there are disagreements:
- Identify the specific disagreement - What exactly differs?
- Present your reasoning - Why do you believe your assessment is correct?
- Request Codex clarification - Ask Codex to explain its reasoning
- Iterate if needed - Continue discussion until consensus is reached
Example dialogue:
{
"type": "consensus_followup",
"codex_finding": "<summary_from_codex>",
"reviewer_finding": "<summary_from_reviewer>",
"reasoning": "<concise_reasoning>",
"requested_action": [
"explain_codex_reasoning",
"or_acknowledge_correction"
]
}
Security note:
- Never interpolate {Codex's finding}, {Your finding}, or other dynamic text directly into shell commands.
- If CLI execution is unavoidable, pass data via safe argument lists or a prebuilt data file and sanitize inputs before use.
8.4 Document Consensus
Once consensus is reached, document the agreed findings:
### Cross-Agent Consensus
| Reviewer | Agreement |
|----------|-----------|
| Claude Code | ✓ |
| Codex CLI | ✓ |
**Consensus Points:**
- Test execution: {agreed result}
- London School compliance: {agreed assessment}
- Issues: {agreed list}
- Next action: {agreed recommendation}
**Resolved Disagreements:**
- {topic}: Claude said X, Codex said Y, agreed on Z because {reason}
Checklist
- Test file located
- Tests executed
- Structure analyzed (London School)
- Mock configuration validated
- Failures categorized (S/M/A/R)
- Review report generated
- Codex review requested
- Disagreements resolved
- Consensus documented
- Next action recommended
References
- Test Plan: docs/TEST_PLAN.md
- Spec Files: docs/specs/{feature}/{class}_spec.yaml
- Test Fixing: .claude/skills/test-fixing/SKILL.md
- Spec Verify: .claude/skills/spec-verifying/SKILL.md