name: deep-pr-review description: Use when the user provides a GitHub PR link with a "deep review" trigger inside a locally cloned repo, to perform a thorough multi-dimensional code review grounded in ARCS DAG context, AGENTS.md conventions, and optional codegraph coupling analysis, then post findings as inline GitHub review comments under explicit user gate
Skill: deep-pr-review
When
User gives a GitHub PR URL plus a "deep review" trigger inside the locally cloned repo of that PR. Reviewer-side workflow: read-only by default, posts to GitHub only after explicit user gate.
CLI:
arcs --commands --jsonfor discovery. Posting is user-gated. ARCS writes (optional knowledge harvest) run directly via the CLI.
Flow
flowchart TD
classDef bail fill:#ef4444,color:#fff
A[User: 'deep review' + PR URL] --> B[Gather phase — see Data Gathering section]
B --> C{cwd repo == PR repo?}
C -->|no| Z[Halt — wrong checkout]:::bail
C -->|yes| D{Label = wip/draft?}
D -->|yes| D1[Soften severity — flag as WIP]
D -->|no| D2[Standard severity]
D1 & D2 --> E[arcs context --audience=implementer --lean --json]
E -->|found| F[Load AGENTS.md + targeted arcs search]
E -->|missing| G[Degraded mode — heuristics only]
F --> H{Prior AI review exists?}
G --> H
H -->|yes| I[Diff against prior review commit_id only]
H -->|no| J[Use full PR diff]
I --> K[Pick adaptive rubric from diff context]
J --> K
K --> L{Diff size?}
L -->|huge >40 files OR >2000 lines| M[Force summary mode]
L -->|normal| N{codegraph available?}
N -->|yes| O[Run impact/query on changed symbols]
N -->|no| P[Skip coupling check — note in report]
M --> Q[Aggregate findings + cite each]
O --> Q
P --> Q
Q --> R[Present report + 5 posting modes]
R --> S{User choice}
S -->|don't post| END1[Show report only]
S -->|post| T[gh api: review + inline comments]
T --> U{Recurring pattern surfaced?}
U -->|yes + opt-in| V[Propose knowledge:create]
U -->|no| END2[Done]
V --> END2
Data Gathering (ONE PASS — no repeat gh reads)
Run these three commands once at the start. Cache the results. All downstream steps read from cache — never call gh repo view or gh pr view again.
1. gh repo view --json name,owner → REPO
2. gh pr view <number> --json number,title,body,author,labels,reviews,state,files,headRefName,baseRefName → PR_META
3. gh pr diff <number> → DIFF
| Downstream need | Read from |
|---|---|
| Repo-match check | REPO.name, REPO.owner |
| WIP / draft check | PR_META.labels, PR_META.state |
| Author context | PR_META.author |
| Prior review detection | PR_META.reviews |
| File list / LOC delta | PR_META.files |
| Diff text | DIFF |
Adaptive Rubric
Agent picks dimensions from diff context. Correctness is always evaluated. Other dimensions activate when the diff signals them:
| Dimension | Activates when |
|---|---|
| Correctness | Always — bugs, off-by-one, error handling, null safety |
| DRY | New code resembles existing patterns; cross-module grep finds duplicates |
| KISS | New abstraction layers, deep nesting, premature generalization |
| YAGNI | Code written "for later" with no current caller; abstractions with one concrete use; configurable hooks with one known value; generic machinery built for hypothetical consumers |
| SOLID | Module gains responsibilities, dependency direction shifts, large classes touched |
| Convention fit | AGENTS.md or DAG pattern/architecture knowledge applies to changed files |
| Architectural risk → handoff to the system-architect agent (structural audit) | Diff crosses module boundaries, touches god nodes, changes public API |
| Performance risk → handoff to the oncall-ops agent (performance investigation) | Hot paths, loops over external IO, new queries, allocations in render |
Skipped dimensions are reported as cleared (not applicable: <reason>). Never silently dropped.
Over-engineering / bloat pass
A focused pass that hunts ONLY over-engineering and complexity — correctness, security, and performance stay in the normal review pass above. Runs on a diff (delete-list for the changed lines) OR whole-repo (bloat audit). Lists findings only; applies nothing.
One finding per line, tagged:
| Tag | Catches | Replacement |
|---|---|---|
delete: |
Dead code, unused flexibility, speculative feature | nothing |
stdlib: |
Hand-rolled thing the standard library ships | name the function |
native: |
Dependency or code doing what the platform already does | name the feature |
yagni: |
Abstraction with one implementation, config nobody sets, layer with one caller | inline / remove |
shrink: |
Same logic in fewer lines | show the shorter form |
Format: L<line>: <tag> <what>. <replacement>. — use <file>:L<line>: ... for multi-file or whole-repo audits.
End with the only metric that matters: net: -<N> lines, -<M> deps possible. Nothing to cut → Lean already. Ship.
Boundary: never flag the single ONE-runnable-check that the-ladder requires for non-trivial logic as bloat.
Severity Prefixes
Inline findings are one line — <file>:L<line>: problem. fix. — prefixed by severity:
| Prefix | Meaning | Posting default |
|---|---|---|
🔴 bug: |
Broken behavior, will cause incident | Always post |
🟠 risk: |
Works but fragile, edge case unhandled | Always post |
🟡 suggestion: |
Concrete fix improving quality | Posted in modes 2/3 |
🔵 nit: |
Style / naming / minor consistency | Posted only in mode 3 |
❓ q: |
Genuine question for the author | Always post |
Posting Modes
User picks one before any gh write:
| # | Mode | What posts |
|---|---|---|
| 1 | Critical-only | 🔴 bug + 🟠 risk + ❓ q only |
| 2 | Critical + actionable | Above + 🟡 suggestion |
| 3 | All findings | Above + 🔵 nit |
| 4 | Summary only | Single top-level review body, no inline comments |
| 5 | Don't post | Show report only — no gh calls |
Iron Law
READ ONLY until user picks a posting mode. No gh writes, no ARCS writes, no auto-approve. Approval is only ever produced via explicit user override (approve it, lgtm post approve) — never inferred from finding count.
Citation Rule
Every finding cites a source. No uncited findings:
see knowledge/<id>: <title>— ARCS knowledge entryAGENTS.md §<section>— project conventioncodegraph: <observation>— coupling/impact resultprinciple: <KISS|DRY|YAGNI|SOLID|correctness>— first-principles label
If only first-principles applies, that is sufficient — but it must be stated.
Inline Suggestion Rule
GitHub ```suggestion blocks render an "Apply suggestion" button. Use only when the fix is a one-to-few-line replacement of existing lines on the diff. For larger fixes:
- Multi-line code restructure → inline review comment with a fenced code block (no
suggestiontag) - Missing block / new file content → top-level review body bullet
- Cross-file refactor → handoff finding recommending the system-architect agent (structural audit)
Posting Protocol (ONE gh api call — never per-finding)
All findings are batched into a single GitHub review submission. Never loop through findings and post each one individually.
gh api POST /repos/{owner}/{repo}/pulls/{number}/reviews \
--field commit_id="<PR head SHA from PR_META>" \
--field event="COMMENT" \
--field body="<top-level summary>" \
--field 'comments=[{"path":"...","position":N,"body":"..."},...]'
| Rule | Detail |
|---|---|
| One call per review session | Top-level body + all inline comments in the same comments[] array |
Never mix gh pr review and gh api |
Pick one entry point — use gh api for full control; gh pr review for body-only (mode 4) |
Never call gh pr comment after gh api reviews |
gh pr comment adds a stand-alone comment, not a review — it will duplicate the top-level body |
| Dry-run before sending | Print the full payload to the user for confirmation; only call gh api once user confirms |
Mode → command mapping
| Mode | Command |
|---|---|
| 1–3 (inline + summary) | gh api POST .../reviews with body + comments[] — one call |
| 4 (summary only) | gh pr review <number> --comment --body "..." — one call, no comments[] |
| 5 (don't post) | No gh writes |
Report Structure
# Deep PR Review: <repo>#<number> — <title>
## Pre-flight (repo match, PR state, prior reviews)
## Scope (files touched, LOC delta, modules affected)
## Rubric Selection (which dimensions activated, why)
## Findings (grouped by severity)
## Cleared Dimensions (with evidence)
## Architectural / Performance Handoffs (if any)
## Posting Plan (mode chosen → exact comments to be posted)
## Confidence & Gaps
Constraints
- Never repeat
gh repo vieworgh pr viewafter the initial gather pass — all data is cached upfront - ONE
gh apicall to post the review — batch all inline comments into thecomments[]array; never loop and post per-finding; never mixgh pr review+gh api+gh pr commentin the same session - Never auto-approve; approval only on explicit user override
- Never post to GitHub before user picks a posting mode
- Cite every finding — no uncited claims
```suggestionblocks only for small line-replacement fixes- Defer to the system-architect agent (structural audit) for full structural drift; surface as handoff flag, do not run inline
- Defer to the oncall-ops agent (performance investigation) for perf work; surface as risk flag
- Review dimensions are defined in this skill (Adaptive Rubric); inline findings use the one-line format
<file>:L<line>: problem. fix.— do not duplicate - Re-review detection: if AI has reviewed before, scope to diff since last review's commit_id
- Tag each posted suggestion with
<!-- arcs:deep-review:<finding-id> -->for re-review tracking - See
review-template.mdfor GitHub review body template - See
codegraph-diff.mdfor the changed-symbols-to-impact algorithm