name: evaluate-findings
description: Use when validating external feedback, review comments, AI findings, dead-code reports, or suspected false positives.
Evaluate Findings
Confidence-based framework for evaluating external feedback. Spawn a Devil's Advocate agent to challenge non-trivial claims. Triage and classify findings — do not apply fixes.
Responsibility
- Owns: Classifying findings by confidence, verifying claims against actual code, spawning adversarial verification.
- Does NOT own: Applying fixes (returns classified results for the caller to act on).
Step 1: Assess Each Finding
For each finding:
- Read the referenced code at the mentioned location — include the full function or logical block, not just the flagged line
- Verify the claim against the actual code — does the issue genuinely exist?
- Assign confidence:
| Level | Criteria | Verdict |
|---|---|---|
| High (>80%) | Clear bug, missing check, obvious improvement, style violation matching project conventions | Accept |
| Medium (50-80%) | Likely valid but involves judgment calls or unclear project intent | Accept with caveats |
| Low (<50%) | Subjective preference, requires domain knowledge, might break things, reviewer may be wrong | Skip |
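
The thresholds in the table above can be sketched as a small classifier. This is a minimal illustration, assuming confidence is expressed as a fraction between 0 and 1; the function name `classify_confidence` and the tuple it returns are hypothetical and not part of the skill itself.

```python
def classify_confidence(confidence: float) -> tuple[str, str]:
    """Map a confidence estimate (0.0 to 1.0) to a (level, verdict) pair."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.8:
        # High: clear bug, missing check, obvious improvement
        return ("High", "Accept")
    if confidence >= 0.5:
        # Medium: likely valid, but involves judgment calls
        return ("Medium", "Accept with caveats")
    # Low: subjective, risky, or the reviewer may be wrong
    return ("Low", "Skip")
```

Note that exactly 80% falls in the Medium band, because the table defines High as strictly greater than 80%.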
Step 2: Devil's Advocate
Spawn Condition
Spawn when there are 3 or more findings scored Medium or higher that involve non-trivial claims — API behavior, correctness arguments, performance assertions, or anything not verifiable by reading the code alone.
Skip when all findings are clear-cut (typos, missing nil checks, style issues) or when there are only one or two trivial items.
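
The spawn condition can be expressed as a simple count. The sketch below assumes each finding already carries its confidence level and a flag marking whether its claim is non-trivial; the `Finding` type, its field names, and `should_spawn` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    level: str         # "High", "Medium", or "Low"
    non_trivial: bool  # True for API-behavior, correctness, or performance claims


def should_spawn(findings: list[Finding], threshold: int = 3) -> bool:
    """Return True when enough challenge-worthy findings exist to justify an agent."""
    challenge_worthy = [
        f for f in findings
        if f.level in ("High", "Medium") and f.non_trivial
    ]
    return len(challenge_worthy) >= threshold
```

Clear-cut findings (typos, missing nil checks, style issues) would be marked `non_trivial=False` and so never count toward the threshold.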
Agent Instructions
Launch a single agent. Provide the challenge-worthy findings with file locations, claims, and initial verdicts. Instruct it to try to prove each finding wrong, or confirm it with evidence.
Research approach by claim type:
| Claim Type | Approach |
|---|---|
| API deprecated/changed | Check official API documentation via xcdocs skill, or WebSearch |
| Method doesn't exist / wrong signature | Grep codebase, check Apple docs |
| Code causes specific bug | Write minimal reproduction or trace the code path |
| Best practice claim | Check Apple documentation or WWDC sessions |
| Persistence or migration claim | Check the project's persistence or migration policy and its migration docs |
Budget: max 2 research actions per finding.
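
One way to enforce the budget is a per-finding counter. This is a minimal sketch; the `ResearchBudget` class and the finding identifiers are hypothetical, and how the agent actually tracks its actions is not specified by this skill.

```python
from collections import defaultdict

MAX_ACTIONS_PER_FINDING = 2  # the budget stated above


class ResearchBudget:
    """Track research actions spent per finding and refuse any beyond the limit."""

    def __init__(self, limit: int = MAX_ACTIONS_PER_FINDING) -> None:
        self.limit = limit
        self.used: dict[str, int] = defaultdict(int)

    def try_spend(self, finding_id: str) -> bool:
        """Record one research action; return False if the budget is exhausted."""
        if self.used[finding_id] >= self.limit:
            return False
        self.used[finding_id] += 1
        return True
```

A finding whose budget runs out without definitive evidence would naturally end up as Inconclusive.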
Agent Verdicts
Per finding:
- Confirmed — found evidence supporting the claim (with source)
- Disputed — found counter-evidence (with source and explanation)
- Inconclusive — no definitive evidence either way
Step 3: Reconciliation
Merge agent results with initial assessment:
- Confirmed: verdict stands, confidence may increase
- Disputed: if originally Accepted → downgrade to Skip or flag with both perspectives. Never silently override — show the disagreement.
- Inconclusive: verdict stands, note the uncertainty
Findings not investigated by the agent keep their original assessment.
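
The merge rules above can be sketched as a single function. Two assumptions are made here: "Accept with caveats" is treated as an accepted verdict, and a disputed accepted finding is downgraded to Skip while the disagreement is recorded in a note (the text also allows flagging instead of downgrading). The name `reconcile` and the dictionary shape are hypothetical.

```python
from typing import Optional


def reconcile(initial_verdict: str, agent_verdict: Optional[str]) -> dict:
    """Merge an initial verdict with the Devil's Advocate result."""
    result = {"verdict": initial_verdict, "note": ""}
    if agent_verdict is None:
        # Not investigated: the original assessment is kept unchanged.
        return result
    if agent_verdict == "Confirmed":
        result["note"] = "confirmed by agent; confidence may increase"
    elif agent_verdict == "Disputed":
        if initial_verdict.startswith("Accept"):
            result["verdict"] = "Skip"
        # Never silently override: the disagreement is always surfaced.
        result["note"] = "disputed by agent; show both perspectives"
    elif agent_verdict == "Inconclusive":
        result["note"] = "inconclusive; uncertainty noted"
    else:
        raise ValueError(f"unknown agent verdict: {agent_verdict!r}")
    return result
```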
Step 4: Format Output
| File | Issue | Confidence | Verdict | Investigated |
|---|---|---|---|---|
Where Investigated shows:
- (empty) — not investigated
- Confirmed (source) — supporting evidence found
- Disputed: [reason] — counter-evidence found
- Inconclusive (no definitive evidence either way)
For disputed findings, add a callout below the table showing both perspectives.
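
As an illustration of the output format, the sketch below renders classified findings into the five-column table. The function name `render_table` and the dictionary keys (which mirror the column headers) are assumptions made for this example.

```python
COLUMNS = ["File", "Issue", "Confidence", "Verdict", "Investigated"]


def render_table(rows: list[dict]) -> str:
    """Render findings as a markdown table; missing cells are left empty."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for row in rows:
        # Escape pipes so free-text cells cannot break the table layout.
        cells = [str(row.get(col, "")).replace("|", "\\|") for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

A finding that was not investigated simply omits the `Investigated` key, which leaves that cell empty as described above.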
Rules
- Never auto-dismiss findings about security defaults, permission escalation, or fail-open behavior — even if the plan specifies different behavior. Plans can have incorrect security assumptions.
- If a finding references code that no longer exists, skip it and note that.
- If two findings conflict, skip both and document the conflict.
- For each finding, clarify whether the issue was introduced by the current changeset or is pre-existing.
- Pre-existing issues in earlier commits on the same feature branch are in-scope — the entire branch is one unit of work.
- The caller determines what to do with the evaluated findings. This skill only triages.
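
Two of the rules above (stale code references and conflicting findings) are mechanical enough to sketch. The dictionary keys `code_exists` and `conflicts_with` are hypothetical inputs the evaluator would have to fill in after reading the code. The security rule is deliberately not modeled here, since it calls for judgment rather than a lookup.

```python
def apply_skip_rules(findings: list[dict]) -> list[dict]:
    """Skip findings that reference missing code or conflict with another finding."""
    results = []
    for finding in findings:
        out = dict(finding)  # copy so the caller's input is not mutated
        if not finding.get("code_exists", True):
            out["verdict"] = "Skip"
            out["note"] = "referenced code no longer exists"
        elif finding.get("conflicts_with"):
            out["verdict"] = "Skip"
            out["note"] = f"conflicts with finding {finding['conflicts_with']}"
        results.append(out)
    return results
```

For the conflict rule to skip both findings, each side of the pair needs its own `conflicts_with` entry pointing at the other.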