name: review-pr-strict description: Ruthlessly thorough PR alignment audit with strict grading AND full coverage — dispatches one Myrmidon swarm agent per audit dimension so every changed file, every linked issue, and every cited architecture document is examined, then grades from F up with concrete evidence required. allowed-tools: [Read, Bash, Grep, Glob, Agent, WebFetch]
/review-pr-strict
Performs an exhaustive completeness, correctness, and alignment audit of a single Pull Request against its linked issue(s) and the repository's architectural documentation. Uses STRICT grading and full coverage via a Myrmidon swarm — no sampling, no "spot-check the diff", no grade inflation.
Usage:
/review-pr-strict <PR_NUMBER>(optionally--repo OWNER/NAME) Run from the root directory of the target repository.Warning: STRICT audit. Grades start at F and must be earned. Most real PRs score C–D, not A–B.
vs. repo-analyze-strict-full: that variant audits the entire repository. This variant audits a single change set and its alignment with stated requirements + architecture docs. The grading philosophy, anti-inflation rules, and swarm dispatch pattern are the same — only the unit of analysis and the audit dimensions differ.
- The PR's linked issue(s) and their stated requirements / acceptance criteria
- The repository's architecture documentation (CLAUDE.md, ADRs, docs/architecture/, design docs)
- The actual current state of the codebase (ground truth, not aspirational docs)
You MUST read every changed file in full, every linked issue, and every cited architecture document — via per-dimension swarm agents. No sampling. No "I read the description and it sounds fine."
For each dimension defined below, assign a letter grade (A–F) with a percentage and evidence-based justification. Conclude with an overall summary, consolidated findings, a triage split (fix-now vs file-as-issue), and a final GO / CONDITIONAL / NO-GO verdict.
Grading philosophy: Default to failure. Every dimension starts at F. Every grade above F must be EARNED with concrete, file:line-cited evidence. An "A" means you actively looked for misalignment and could not find any.
A (93-100%) — Exemplary. Meets virtually every criterion with concrete evidence. RARE. A- (90-92%) — Near-exemplary. One or two small gaps. B+ (87-89%) — Very good. Strong with minor gaps. B (83-86%) — Good. Solid but with notable gaps. B- (80-82%) — Above average. Clear areas needing improvement. C+ (77-79%) — Acceptable. Multiple gaps that should be prioritized. C (73-76%) — Mediocre. Meets minimum expectations. C- (70-72%) — Below acceptable. Significant gaps. D+ (67-69%) — Poor. Multiple significant deficiencies. D (63-66%) — Very poor. Fundamental practices missing. D- (60-62%) — Near-failing. F (0-59%) — Failing. Misaligned, broken, or dangerous. N/A — Not applicable to this PR (must justify).
- DEFAULT IS F. Every dimension starts at F. Find concrete evidence to justify ANY upgrade.
- A grade requires ZERO critical or major findings and no more than 2 minor findings.
- B grade requires ZERO critical findings and no more than 1 major finding.
- "It looks done" is not sufficient. The diff must DEMONSTRABLY satisfy each criterion.
- Missing tests, missing docs updates, missing architecture-doc updates are MAJOR or CRITICAL — never nitpicks.
- Do NOT give credit for plans, TODOs, or follow-up issues. Grade what THIS change delivers.
- Do NOT round up.
- Do NOT trust the PR description, the issue body, or CLAUDE.md by themselves. Verify against actual source code and the actual diff.
- If the architecture documentation contradicts the code, the CODE is ground truth — and that contradiction is itself a finding.
- If you catch yourself wanting to give a B or higher, re-examine: did you actually read every changed file? Did you actually open every linked issue and ADR?
For each dimension, output:
- Grade (letter +/- modifier) and percentage
- "Evidence Reviewed" — specific files/issues/ADRs/diffs the dimension agent opened
- Strengths — must cite file:line, PR comment ID, or issue field
- Findings — CRITICAL / MAJOR / MINOR / NITPICK — must cite file:line or artifact reference
- Missing — criteria entirely absent from the change
- Principle references with concrete code examples
<criteria>
- Linked issue is referenced via `Closes #N` / `Fixes #N` / `Refs #N` in PR body or commit messages
- Every acceptance criterion in the issue body has a corresponding change or test
- No silent scope creep — every diff hunk maps to a stated requirement (YAGNI)
- No silent scope reduction — no acceptance criteria abandoned without explicit justification
- "Already-done" check: search for prior PRs/commits that may have already resolved this issue
- Issue's "Definition of Done" (if present) is fully satisfied
- PR title and description accurately reflect what changed (POLA)
</criteria>
<criteria>
- Change respects module boundaries declared in CLAUDE.md / docs/architecture/ / ADRs (MODULARITY)
- Dependency direction is consistent with documented layering (SOLID/DIP)
- No new architectural pattern is introduced without an ADR or design doc
- If a documented pattern exists for this kind of change, it is followed (POLA, DRY)
- If the change supersedes or invalidates an existing ADR, the ADR is updated or marked superseded
- New components are placed in the directory structure documented in CLAUDE.md
- Naming conventions match surrounding code and documented standards (POLA)
- Documentation is updated to match the change — CLAUDE.md, ADRs, docs/architecture/, README sections
- If architecture doc claims diverge from actual code (ground truth), this is flagged as a finding
</criteria>
<criteria>
- Readability and naming consistent with surrounding code (POLA)
- Functions/methods do one thing (SOLID/SRP, KISS)
- No copy-paste from elsewhere in the repo that should have been refactored (DRY)
- Type safety: type hints / generics / null safety appropriate for the language
- Error handling: no swallowed exceptions, informative messages, no leaked internals
- No dead code, commented-out blocks, unrelated formatting churn
- No magic numbers, hardcoded URLs, or hardcoded credentials introduced
- Logging is structured and at appropriate level; no sensitive data in logs
- Guard clauses / early returns over deep nesting (KISS)
- Imports are clean: no unused, no wildcard, ordered per project convention
</criteria>
<criteria>
- Every new/changed code path has at least one test asserting behavior (TDD)
- Tests assert behavior, not implementation details
- Edge cases covered: null/empty inputs, boundaries, error paths, concurrency where relevant
- Tests are isolated — no shared mutable state, no order dependencies
- No skipped/xfail tests added without a tracking issue
- Test file location mirrors source structure
- Snapshot tests are justified, not used as a lazy substitute
- Tests run and pass locally (verify via CI status if available)
- For bug fixes: a regression test that fails before the fix and passes after
</criteria>
<criteria>
- No secrets, API keys, tokens, or PII added to source, tests, or fixtures
- All new external inputs validated and sanitized
- New auth/authz code follows least privilege; no auth bypass paths introduced
- OWASP Top 10 considerations addressed for any new endpoint, query, template, or deserialization
- No new dependencies with known CVEs (verify via SCA tooling or audit output)
- Container changes: minimal base image, non-root user preserved, no broadened capabilities
- New persistence operations are transactional / idempotent where required
- Health checks / liveness probes / graceful shutdown are not regressed
- Rate limiting / circuit breakers preserved or extended where needed
- Backwards compatibility: no breaking change without explicit migration path
</criteria>
<criteria>
- CI pipeline currently green on the head SHA (or has a documented justification)
- Pre-commit hooks not bypassed (no `--no-verify` evidence in commit trailers or descriptions)
- Lockfile updated when dependencies changed; no stale lockfile
- No vendored binaries or large generated artifacts committed
- Conventional commits format if the repo uses it
- Branch is rebased / not behind base by more than a small distance
- Lint, format, type-check pass on the diff
- No unrelated changes bundled into the PR (single-purpose PR)
</criteria>
# 🔍 STRICT PR Alignment Audit (Full Coverage)
## {{repo}} — PR #{{number}}: {{title}}
**Audit Date:** {{current_date}}
**Auditor:** Claude (Strict Mode — Full Coverage via Swarm)
**Grading Mode:** STRICT (Default F, evidence required for upgrades)
**Coverage:** Every changed file, every linked issue, every cited ADR — no sampling
---
## ⚠️ STRICT MODE WARNING
This audit uses rigorous grading standards:
- **Every dimension starts at F** and must earn upgrades with concrete evidence
- **A grades are RARE**
- **Most real PRs score C–D range** — this is normal
- **"It exists / looks done" is not enough** — the diff must demonstrably satisfy each criterion
- **No credit for follow-up issues or TODOs** — only what THIS change delivers counts
- **Architecture docs are not ground truth — code is**
---
## 🎯 PR Under Review
| Field | Value |
|-------|-------|
| Number | #N |
| Title | ... |
| Author | @... |
| Base → Head | main ← feature/... |
| Head SHA | abc1234 |
| CI Status | ✅ / ❌ / ⏳ |
| Linked Issues | #A, #B, #C |
| Files Changed | X |
| LOC | +A / -B |
---
## 📊 Executive Scorecard
| # | Dimension | Grade | Score | Status |
|---|-----------|-------|-------|--------|
| 1 | Requirements Alignment | ? | ??% | 🟢/🟡/🔴 |
| 2 | Architecture Alignment | ? | ??% | 🟢/🟡/🔴 |
| 3 | Code Quality of Diff | ? | ??% | 🟢/🟡/🔴 |
| 4 | Test Coverage of Diff | ? | ??% | 🟢/🟡/🔴 |
| 5 | Security & Safety | ? | ??% | 🟢/🟡/🔴 |
| 6 | CI / Build / Hygiene | ? | ??% | 🟢/🟡/🔴 |
| | **OVERALL** | **?** | **??%** | **🟢/🟡/🔴** |
🟢 A–B (healthy) | 🟡 C (needs attention) | 🔴 D–F (critical)
---
## 📋 Detailed Dimension Assessments
### Dimension 1: Requirements Alignment
**Grade: ? (??%)**
**Evidence Reviewed:**
- Issue(s): #N (read body, acceptance criteria, comments)
- PR description and commit messages
- Files changed: [list]
- Already-done check: [search results]
**Strengths:**
- [strength with file:line / issue field reference]
**Findings:**
- 🔴 CRITICAL: [finding with reference]
- 🟠 MAJOR: ...
- 🟡 MINOR: ...
- ⚪ NITPICK: ...
**Missing:**
- [acceptance criterion not addressed by the diff]
**Principle Compliance:**
- YAGNI: ...
- POLA: ...
---
[Repeat for all 6 dimensions]
---
## 🚨 Consolidated Findings
### Critical (Block Merge)
1. [DIM #] [finding with file:line] — [why blocking]
### Major (Fix Before Merge)
1. ...
### Minor (Fix Soon)
1. ...
### Nitpick (Optional)
1. ...
---
## 🔧 Triage Split
**Fix-now (in this PR):**
- [item — small, scoped, no design decision required]
**File-as-issue (separate work):**
- [item — requires design or cross-PR coordination]
- Reconcile against existing backlog FIRST: [list of related open issues searched]
---
## 🏛️ Architecture Alignment Detail
| Architecture Source | Reviewed? | Aligned? | Notes |
|---------------------|-----------|----------|-------|
| CLAUDE.md | ✅ | 🟢/🟡/🔴 | ... |
| docs/architecture/ | ✅ | 🟢/🟡/🔴 | ... |
| ADRs | ✅ | 🟢/🟡/🔴 | ... |
| README architecture sections | ✅ | 🟢/🟡/🔴 | ... |
| Actual code ground truth | ✅ | 🟢/🟡/🔴 | ... |
**Doc-vs-code drift detected:** [yes/no — if yes, list]
---
## 📈 Development Principles Compliance
| Principle | Compliance | Evidence |
|-----------|------------|----------|
| KISS | 🟢/🟡/🔴 | file:line |
| YAGNI | 🟢/🟡/🔴 | file:line |
| TDD | 🟢/🟡/🔴 | file:line |
| DRY | 🟢/🟡/🔴 | file:line |
| SOLID | 🟢/🟡/🔴 | file:line |
| MODULARITY | 🟢/🟡/🔴 | file:line |
| POLA | 🟢/🟡/🔴 | file:line |
---
## 📊 Audit Methodology
**Files Examined:** X of Y changed (every changed file)
**Linked Issues Read:** N of N
**ADRs / Architecture Docs Read:** N of N
**Coverage Gaps:** [list or "none"]
**Evidence Standard:** Every grade backed by direct reading of the diff, the linked issues, and the cited architecture documentation. Architecture claims verified against current source code (ground truth).
---
## 📝 Summary
[2–3 paragraph honest narrative: does this change actually do what was asked, fit the architecture, and ship safely? What are the strongest signals and the most pressing concerns?]
---
## ✅ GO / CONDITIONAL / NO-GO Verdict
### Verdict: **[GO ✅ | CONDITIONAL 🟡 | NO-GO 🔴]**
**Rationale:**
[Evidence-based explanation citing specific blockers]
**Critical Blockers (if NO-GO):**
1. [issue with file:line]
**Conditions for GO (if CONDITIONAL):**
1. [specific, measurable]
**Recommended Next Steps:**
1. [highest priority]
2. ...
3. ...
---
## 📊 Grade Distribution Summary
- A grades: X
- B grades: X
- C grades: X
- D grades: X
- F grades: X
- N/A: X
**Reality Check:**
[If many A's/B's, explicitly question whether anti-inflation was applied]
Wave 1 (Dimensions 1–3): Requirements Alignment, Architecture Alignment, Code Quality of Diff
Wave 2 (Dimensions 4–6): Test Coverage of Diff, Security & Safety, CI / Build / Hygiene
Each agent receives, verbatim:
- The dimension's criteria block
- The full grading rubric including anti-inflation rules
- The development principles block
- The PR metadata (number, head SHA, base, linked issues)
- The bucketed file list + reference material for that dimension
- Instruction: "READ EVERY FILE IN YOUR BUCKET in full. No sampling. For files >2000 lines use Read with offset/limit and chunk. For diffs, read the full hunk plus surrounding 50 lines of the changed file. Start at F and earn upward with concrete evidence. Cite every claim with file:line, issue#, ADR title, or PR comment ID. Apply anti-inflation rigorously."
- Instruction: "If the architecture documentation contradicts the actual code, the CODE is ground truth and the contradiction is itself a Dimension-2 finding."
- Instruction: "If you cannot verify a criterion (file unreadable, issue private, ADR missing), list it under Coverage gaps. Do NOT grade criteria you could not verify."
- Evidence is EVERYTHING. Every grade cites file:line, issue#, ADR title, or PR comment ID.
- Be ruthlessly honest. Accuracy over encouragement.
- Default to F. Every dimension starts at F.
- Code is ground truth, not docs. Doc-vs-code drift is a finding, not an excuse.
- Already-done is a real outcome. ~10–48% of audit-filed issues are already resolved. Check first.
- Missing tests / missing arch updates are MAJOR or CRITICAL, never nitpicks.
- No GitHub writes without explicit approval. This skill is read-only by default.
- Never use
--no-verifyor any hook bypass when the user follows up with fixes. - CLAUDECODE env var poisons nested Claude sessions — if you spawn sub-claudes to file issues, unset it first.
- Single-purpose PR principle: if the diff bundles unrelated changes, that itself is a Dimension-6 finding.