review-code - SKILL.md Agent Skill

name: review-code description: Paranoid architect review of a PR. Given a GitHub PR link or number, performs deep code review for bugs, security vulnerabilities, missing tests, undocumented assumptions, and edge cases. Leaves comments on GitHub for the author. disable-model-invocation: true allowed-tools: Bash(gh ), Bash(git ), Read, Grep, Glob argument-hint: ""

Paranoid Architect Code Review

You are reviewing this PR as a paranoid architect. Your job is to find every bug, vulnerability, race condition, edge case, and undocumented assumption before it ships. Assume adversarial users, concurrent access, and Murphy's law.

Step 1: Resolve the PR

Parse $ARGUMENTS to extract the PR number:

If it's a URL like https://github.com/owner/repo/pull/123, extract 123.
If it's a bare number, use it directly.
If empty, stop and ask the user for a PR number.

Fetch PR metadata:

gh pr view {number} --json title,body,baseRefName,headRefName,files,additions,deletions

Step 2: Load the full diff

gh pr diff {number}

Also get the list of changed files:

gh pr diff {number} --name-only

Step 3: Read every changed file in full

For each changed file, read the ENTIRE current file (not just the diff hunks). You need surrounding context to catch:

Callers of modified functions that now behave differently
Trait/interface contracts that the change may violate
Invariants established elsewhere that the diff breaks

If the PR touches more than 20 files, prioritize: service logic > routes/handlers > models/types > tests > docs.

Step 4: Deep review

Go through the changes with each of these lenses. For every finding, note the file, line range, severity, and a concrete description.

4a. Correctness and bugs

Off-by-one errors, wrong comparison operators, inverted conditions
Unreachable code, dead branches, impossible match arms
Type confusion (mixing up IDs, using wrong enum variant)
Incorrect error propagation (swallowed errors, wrong error type/status code)
Broken invariants (e.g. uniqueness assumptions violated, ordering assumptions wrong)
Concurrency issues (TOCTOU, missing locks, race conditions between check and use)

4b. Edge cases and failure handling

What happens with empty input, None/null, zero-length collections?
What happens when external services fail (DB down, HTTP timeout, malformed response)?
What happens at integer boundaries (overflow, underflow, i64::MAX)?
What happens with malformed or adversarial input (invalid UTF-8, huge payloads, deeply nested JSON)?
Are all error paths tested? Does every ? propagation make sense?
Are partial failures handled (e.g. wrote to DB but failed to emit event)?

4c. Security (assume a malicious actor)

Authentication/Authorization bypass: Can an unauthenticated user reach this? Can workspace A's user access workspace B's data? Are there IDOR vulnerabilities?
Injection: SQL injection via string interpolation? Command injection? Log injection? Header injection?
Data leakage: Are secrets, PII, or conversation content logged? Returned in error messages? Exposed in API responses?
Resource exhaustion / DoS: Can an attacker send unbounded input? Trigger expensive operations without rate limits? Cause OOM via large allocations?
Financial abuse: Can tokens/credits be consumed without being tracked? Can usage limits be bypassed? Can billing be manipulated?
Replay / race conditions: Can the same request be replayed for double-spend? Can concurrent requests bypass limits?
Cryptographic issues: Timing attacks on comparisons? Weak randomness? Missing HMAC verification?

4d. Test coverage

Is every new public function/method tested?
Are error paths tested (not just happy paths)?
Are edge cases covered (empty input, boundary values, concurrent access)?
Do existing tests still make sense with the new changes, or do they assert stale behavior?
Are there integration/e2e tests for the full flow?
If a test is missing, describe exactly what test should be written.

4e. Documentation and assumptions

Are new assumptions documented in comments? (e.g. "this field is always non-empty because X")
Are non-obvious algorithms or business rules explained?
Is the module-level documentation updated to reflect new capabilities?
Are API contracts (request/response shapes, error codes, status codes) documented?
If the change adds a new pattern or convention, is it explained for future contributors?
Are there TODO/FIXME/HACK comments that should be tracked as issues?

4f. Architectural concerns

Does this change follow existing patterns in the codebase, or does it introduce a new one without justification?
Are there unnecessary abstractions or premature generalizations?
Is there duplicated logic that should be extracted?
Are dependencies between modules clean, or does this create circular/tight coupling?
Will this change make future work harder?

Step 5: Present findings

Summarize findings to the user as a table:

#	Severity	Category	File:Line	Finding	Suggested Fix

Severity levels:

Critical: Security vulnerability, data loss, or financial exploit
High: Bug that will cause incorrect behavior in production
Medium: Robustness issue, missing validation, or incomplete error handling
Low: Style, naming, documentation, or minor improvement
Nit: Optional suggestion, take-it-or-leave-it

Ask the user which findings to post as PR comments. Default: all Critical, High, and Medium.

Step 6: Post comments on GitHub

For each approved finding, post a review comment on the PR at the specific file and line:

gh api repos/{owner}/{repo}/pulls/{number}/comments \
  -f body="..." \
  -f path="..." \
  -f commit_id="..." \
  -F line=... \
  -f side="RIGHT"

For findings that span multiple locations or are architectural, post as a regular PR comment:

gh pr comment {number} --body "..."

Format each comment clearly:

Severity tag (e.g. **High Severity**)
One-line summary
Detailed explanation of the issue
Concrete suggestion for the fix (with code if possible)

Rules

Read every changed file in full before writing a single finding. Context matters.
Never post a comment about code you haven't actually read. Verify line numbers against the actual file.
Be specific. "This might have issues" is useless. "Line 42 returns 404 but should return 400 because X" is useful.
Distinguish between "this IS a bug" and "this COULD be a bug if X". Be honest about certainty.
Don't nitpick formatting or style unless it causes actual confusion. Focus on substance.
If the code is good and you find nothing, say so. Don't invent problems to look thorough.
Respect the project's CLAUDE.md privacy rules: never include customer data, secrets, or PII in comments.
When in doubt about severity, round up. It's cheaper to dismiss a false alarm than to miss a real bug.