name: argus-review description: 'When asked to review a PR or branch ("review this PR", "PR review", "argus", before merging), fan the diff out to parallel read-only reviewers and return a severity-ranked verdict; optionally post inline via gh.'
Argus — autonomous PR review
Argus Panoptes, the hundred-eyed watchman who never fully sleeps. This skill is the hundred eyes: it fans the diff out to several independent reviewers at once, then judges what they saw.
Self-contained by design. Unlike a dependency-coupled orchestrator, Argus assumes nothing else is installed — no companion skills, no custom pr-*-reviewer agent types. It dispatches the default general-purpose subagent and inlines each reviewer's complete methodology into its prompt. Drop the argus-review/ folder into any agent setup and it works.
Coordinator + read-only. Argus never edits code. It dispatches, validates structured replies, aggregates a verdict, and (only when asked) posts findings to a PR. It does not review code itself in the parent — that would defeat the fan-out.
When to use
- "Review this PR / branch", "review against
<base>", "PR review", "argus" - Self-review before opening a PR or before merging
- After a refactor or feature completion
When NOT to use
- Quick pass on uncommitted scratch work — review inline, no orchestration needed
- Debugging a specific failure — that is a debugging task, not a review
- Applying fixes — Argus is read-only; fix in a separate step
- Repo-wide audits — every reviewer is diff-scoped
Invocation & scope
| Invocation | Meaning |
|---|---|
argus |
Current branch vs base — light (4 reviewers: quality, conventions, regression, logic) |
argus --full (or "complet") |
Current branch vs base — full (6 reviewers) |
argus <base> |
Current branch vs an explicit base (e.g. a stacked feature branch like feat/basket) |
argus <branch> --base <base> |
A colleague's branch — git fetch origin <branch> first |
argus --staged |
Staged changes only (git diff --cached) |
argus --post[=<PR#>] |
After reviewing, post findings inline on the PR via gh (see references/posting.md). Combine with any scope. |
French prose aliases (the user invokes in French — map, don't ask): "commente/commentaire direct(ement) sur la PR", "met les commentaires sur la PR", "poste sur la PR" → --post; "complet"/"full" → --full; "mode light" → light; "PR --post=<N> (a bare "review la PR 113" is a read-only review of that PR's branch — never auto-post without posting intent). When --post has no number, auto-resolve it from the branch (gh pr view <branch> --json number -q .number) and announce the resolved PR# — only ask when no open PR exists or resolution is ambiguous.
Depth default is light. Use full for large refactors, new modules, payment/auth/security-sensitive changes, or when the user says "complet"/"full".
Base resolution (in order): explicit arg → git symbolic-ref refs/remotes/origin/HEAD (strip origin/) → main → master → dev. If none exists, ask the user. Stacked feature branches are common — when the user names a base, trust it over the default.
Reviewer set per depth
| Depth | Reviewers dispatched |
|---|---|
| light (default) | quality · conventions · regression · logic |
full (--full) |
quality · architecture · regression · security · conventions · logic |
logic is the business-logic reviewer — it derives the intent of the change (PR description, linked PRD/Notion card, commit messages) and verifies the diff actually implements it: state transitions, calculations, role-based rules, front/back contract coherence. No other dimension owns "does the code do what the feature wants?".
Gaps the classic 5-dimension split misses are folded in, not added as new agents (keeps the fan-out cheap). Each folded check is capability-gated — it only runs when the project actually has that capability (detected in step 3), so a non-i18n project is never flagged for "hardcoded text":
| Folded check | Owner | Runs only if |
|---|---|---|
| i18n / hardcoded user-facing text / locale-unaware formatting | conventions | i18n detected (i18n lib or messages/·locales/ dir) |
| Accessibility (alt, aria, headings) | quality | frontend detected (JSX/markup in changed files) |
React anti-patterns (useEffect misuse, stale closures) |
quality | react detected |
| Payment/financial security (Stripe order-state, webhook idempotency) | security | payments detected (payment SDK / webhook code) |
| Cross-package boundary checks | architecture | monorepo detected |
| Error-handling & edge-cases, performance | quality | always (language-agnostic) |
| File size (>~400 lines → split) & inline-props-type extraction | quality | always (size); typed code (inline-type) |
| Library API best-practices (Zod, TanStack Query, Drizzle, …) | quality | lib detected in the stack manifest (step 3) |
Full per-reviewer methodology: references/dimensions.md. Capability detection: references/mechanical-checks.md. Per-library checklists: references/library-practices.md.
Procedure
- Resolve scope — depth, base, branch/staged,
--post. Verify the branch exists;git fetch origin <branch>for remote branches. - Pre-compute the diff once and reuse it for every reviewer (avoids N× redundant
git diff/re-reads):- File list:
git diff --stat <base>...<branch>(orgit diff --cached --stat) - Full patch:
git diff <base>...<branch>(orgit diff --cached). Save it to/tmp/argus_diff.patchso--postline-validation reuses it (never re-rungit difffor posting). - Strip ignored paths before passing the diff (generated/lock/snapshot — see
references/mechanical-checks.mdignore list). State what was dropped. - If the patch exceeds ~60k chars, trim per-file context to ±50 lines and flag the truncation to reviewers.
- File list:
- Detect project capabilities + stack manifest (
references/mechanical-checks.md§Capability detection) — deterministically resolve which folded checks even apply:i18n,frontend/a11y,react,monorepo,payments, plus the stack manifest (libs=zod@4,tanstack-query,drizzle,…frompackage.jsondeps). Compute once in the parent; pass the flags into every dispatch. For each detected lib, paste its checklist fromreferences/library-practices.mdinto the quality reviewer's envelope. A reviewer must skip a sub-check whose capability is off — flagging "hardcoded text" on a project with no i18n is a false positive, not a finding. Also gather the intent sources for thelogicreviewer: PR title+description (gh pr view --json title,bodywhen a PR exists), the linked PRD/Notion card if the PR body references one, andgit log <base>..<branch> --format='%s%n%b'. Pass them verbatim in the logic envelope — the reviewer must not invent intent. - Run the deterministic mechanical pre-pass (
references/mechanical-checks.md) — cheap grep/ripgrep over the diff for high-frequency, mechanically-detectable bans. This produces a seed list of candidate convention/security hits withfile:line. Pass the seed list into the conventions reviewer's prompt as "pre-found candidates — verify each against the rule source, then extend." Deterministic recall on the highest-frequency category, zero extra agents. - Dispatch reviewers in a single message (true parallelism). For each, send a
general-purposesubagent (the default — never a custompr-*-reviewertype) whose prompt = the shared envelope + that dimension's full methodology block + the pre-computed diff + the active capability flags + the compact contract. Build prompts perreferences/dimensions.md§Dispatch. - Wait for every reply before aggregating.
- Validate each reply against
references/contract.md— parse JSON, check required fields + enums, recount findings vssummary. Mismatch → failure handling below. - Aggregate — merge sections, apply dedupe, compute per-section + global counts.
- Compute verdict + confidence (rules below).
- Render the report per
references/report-format.md. - If
--post— post one PR review with each finding anchored inline atfile:lineperreferences/posting.md: default all severities — critical, warning, and nits, each anchored inline, concise verdict + counts in the review body. Never post the full report as a singlegh pr commentissue blob (and never both). Auto-resolve the target PR# from the branch and announce it ("posting to PR #113"); ask only if no open PR or ambiguous.
The parent never re-reads files or re-reviews code. Trust the structured replies.
Verdict rules
First match wins:
blocking— anycriticalfinding in any sectionneeds-attention— no critical, ≥ 1warningin any sectionpass— no critical, no warning
Verdict goes in the report header, before the summary table.
Confidence rules
high— every dispatched reviewer returnedstatus: "ok"andcoverage: "complete"medium— exactly 1 reviewer ispartialorerrorlow— 2+ reviewers arepartialorerror
Dedupe rules
Conservative. Collapse two findings iff all match: same section, same file, same line (or both omitted), same severity, normalized title equal (case-insensitive, trimmed). Never dedupe across sections. Security findings never collapse, even on a shared line.
Failure handling
- Error reply → retry once, same prompt. Still failing → mark failure, list under "Subagent failures".
- Malformed/prose output → retry once with: "Previous reply was not valid JSON — emit the JSON object only, no prose, no fences." Still bad → coerce to
status: "error"+ failure note. - Counts mismatch (
summary≠ recount offindings) → unreliable → mark failure. - Timeout on a huge diff → re-dispatch the failed reviewer only with a narrower path scope; flag
coverage: partial. - Never silently drop a reviewer. A missing section is a bug; an empty section is signal — keep it with "No findings."
Anti-noise
The mechanical pass and reviewers must not re-flag what the team has repeatedly judged non-issues: defensive type-guard / .filter narrowing, routine pagination edge-cases, autogenerated components/ui/* import order, generated files. Down-weight to nit or skip. Findings need a concrete file:line + evidence — no intuition-only flags.
Gotchas
- Reviewers share no state. Each prompt must carry base, branch, mode, absolute repo path, the diff, and its full methodology block. Generic agents have no skill loaded — inline everything.
- One message for all dispatches. Sequential dispatch kills parallelism.
- Never re-run
git diffin a reviewer. The inlined patch is authoritative; reviewers open files only for context outside the patch window. - Strip generated/snapshot/lock files before dispatch — they are noise and inflate the diff. The user always asks to ignore
snapshot.json& friends. - Default base may be
dev, notmain. Detect viaorigin/HEAD. Stacked feature-branch bases are common here — honor an explicit base arg. --postis inline-only. One PR review (reviews API) with findings anchored atfile:line; the review body is a short verdict+counts summary, not the rendered report. Nevergh pr commentthe full report as a single conversation blob, and never post inline and a full-report comment — the lone blob is "not visual" and the user rejects it.gh pr commentis allowed only for a zero-findingpass. Validate each finding's line against the diff hunks before posting (see posting.md) so they actually anchor instead of folding into the body.--postposts all severities — nits included. The user wants nits in the comments too. Anchor nits inline atfile:linelike any other finding; mark them[nit]in the comment body so they read as low-priority. Only drop nits if the user explicitly asks for critical+warning only on a given run.- JSON-only replies. Prose before/after the object is a failure — retry once, then mark failure.
- No new dependencies. If a reviewer recommends adding a library, surface it as a flag, do not act on it.
logicfindings need a cited intent source. Every logic finding quotes the PR description / PRD / commit line it contradicts, or names the violated invariant with the siblingfile:linethat establishes it. "The feature probably should…" without a source is a hallucination — drop it. When no intent source exists (no PR body, no PRD), the reviewer limits itself to internal-consistency checks and notes the missing intent innotes[].- Lib-behavior claims need a doc citation. A critical/warning whose proof lives in a third-party lib's semantics (not in repo code) must cite the lib's docs (context7 lookup, passage quoted in the finding) or be downgraded to a question/nit. Empirical basis: the only 2 dismissed criticals across ~113 posted findings were both unsourced better-auth claims (sofrapa #101 "slop", #117 "CF la doc de better-auth").
- Library checks are version-aware. The checklist in
library-practices.mdis the baseline; when unsure whether an API is current for the detected version, the reviewer queries the context7 MCP rather than flagging from memory — a stale-version flag is a false positive. - The user's next message after a report is often "corrige ce qui est pertinent". Argus stays read-only — the fix is a separate step, offered by the report footer. Don't pre-emptively edit.
What this skill deliberately does NOT depend on
- No external skills.
- No external contract file outside
argus-review/references/.
Everything a reviewer needs is inlined at dispatch time. That is the whole point.