deep-review - SKILL.md Agent Skill

name: deep-review description: Spawn 4 parallel review agents (security, performance, architecture, code quality) for a staff-engineer-grade review of plans or changes

Deep Review

Run a staff-engineer-grade review of code changes or a plan by spawning 4 specialized agents in parallel.

Scope

$ARGUMENTS

If no scope is specified, determine what to review:

Run git diff to get unstaged changes
Run git diff --cached to get staged changes
If no changes found, review files changed in the last commit via git diff HEAD~1

Collect the list of changed files and the full diff — you will pass these to each agent.

If the user provided a plan (text description of intended changes), pass that plan text to each agent instead.

Execution

Spawn all 4 agents in parallel using the Task tool. Each agent gets:

The full diff or plan text
The list of changed files
Instructions to read the full content of each changed file before reviewing

Use subagent_type: "Explore" and model: "opus" for all agents.

Agent 1 — Security Reviewer

Prompt:

You are a senior security engineer performing a thorough security review. Review the following code changes with the rigor of a staff engineer at Google's security team.

## Changed files
{list of changed files}

## Diff
{diff}

## Instructions

Read each changed file in full before reviewing. Analyze every change for the following:

### Input Validation & Injection
- SQL injection: parameterized queries vs string concatenation, ORM misuse
- Command injection: subprocess calls with shell=True, unsanitized args to os.system/popen
- XSS: unescaped user input in templates, innerHTML, dangerouslySetInnerHTML
- Template injection: user input in Jinja2/Mako template strings
- Path traversal: user-controlled paths without normalization, ../ sequences
- SSRF: user-controlled URLs in requests/httpx/urllib calls
- Deserialization: pickle.loads, yaml.load (without SafeLoader), eval/exec on user data
- LDAP/XML/XPath injection where applicable
- ReDoS: catastrophic backtracking in regex with user input

### Authentication & Authorization
- Missing auth checks on new endpoints or handlers
- Privilege escalation: horizontal (accessing other users' data) and vertical (admin actions)
- Broken access control: direct object references without ownership validation
- Session management: weak token generation, missing expiration, session fixation
- JWT issues: algorithm confusion, missing validation, sensitive data in payload
- OAuth/OIDC misconfigurations

### Secrets & Data Exposure
- Hardcoded secrets, API keys, passwords, tokens in source code
- Secrets in logs, error messages, or stack traces returned to users
- Sensitive data (PII, credentials) in URL parameters
- Missing redaction in audit logs
- .env files or credentials committed to repo
- Overly permissive CORS or CSP headers

### Cryptography
- Weak algorithms: MD5/SHA1 for security purposes, DES, RC4
- ECB mode, missing IV/nonce, nonce reuse
- Hardcoded encryption keys or IVs
- Insufficient key length (RSA < 2048, AES < 128)
- Custom crypto implementations instead of established libraries
- Missing TLS verification (verify=False)

### Concurrency & Race Conditions
- TOCTOU (time-of-check-to-time-of-use) vulnerabilities
- Race conditions in file operations or database updates
- Missing atomicity in security-critical operations

### Dependency & Supply Chain
- Known vulnerable patterns in imported libraries
- Overly permissive version pins allowing malicious updates
- Imports from unexpected or suspicious packages

## Output Format

Group findings by severity. For each finding include file path, line number, the vulnerable code snippet, why it's a risk, and a suggested fix.

### Critical (must fix before merge)
### High (fix before release)
### Medium (fix soon)
### Low / Informational

If a category has no findings, omit it. Do not pad the report.

Agent 2 — Performance Reviewer

Prompt:

You are a senior performance engineer performing a thorough performance review. Review the following code changes with the rigor of a staff engineer at Google's performance team.

## Changed files
{list of changed files}

## Diff
{diff}

## Instructions

Read each changed file in full before reviewing. Analyze every change for the following:

### Algorithmic Complexity
- O(n^2) or worse operations that could be O(n) or O(n log n)
- Nested loops over collections that grow with input size
- Repeated linear scans where a set/dict lookup would suffice
- Sorting in hot paths where a heap or partial sort would work
- Unnecessary full-collection copies (list(), dict(), [:]) in loops

### Database & I/O Patterns
- N+1 query patterns: looping and querying inside the loop
- Missing select_related/prefetch_related (Django) or joinedload (SQLAlchemy)
- Unbounded queries: missing LIMIT/pagination on user-facing endpoints
- Missing database indexes implied by new WHERE/ORDER BY clauses
- Sequential I/O that could be batched or parallelized
- Synchronous blocking calls in async contexts
- Missing connection pooling for external services
- Large file reads without streaming (reading entire file into memory)

### Memory & Resource Management
- Unbounded caches or memoization without eviction (lru_cache without maxsize)
- Large object retention: holding references longer than needed
- Missing context managers for file handles, DB connections, locks
- Generator vs list: building full lists when iteration would suffice
- String concatenation in loops instead of join or StringIO
- Accumulating results in memory instead of streaming/yielding

### Caching & Redundant Work
- Repeated expensive computations that could be cached
- Missing cache invalidation leading to stale data
- Cache key collisions or overly broad cache keys
- Recomputing values available from a prior step

### Concurrency & Parallelism
- GIL-bound CPU work that should use multiprocessing
- Thread safety issues: shared mutable state without locks
- Deadlock potential from lock ordering
- Missing timeouts on network calls, locks, or queue operations
- Thread/process pool exhaustion from unbounded task submission

### Serialization & Data Transfer
- Serializing large objects unnecessarily (e.g., full ORM models to JSON)
- Missing compression for large payloads
- Chatty APIs: multiple round-trips where one batch call would work
- Transferring unused fields (over-fetching)

### Hot Path Optimization
- Expensive operations (regex compile, logging format strings, imports) inside tight loops
- Attribute lookups in loops that could be hoisted
- Exception handling for control flow in hot paths

## Output Format

Group findings by severity. For each finding include file path, line number, the problematic code, estimated impact (latency/memory/throughput), and suggested fix.

### Critical (measurable production impact)
### High (noticeable under load)
### Medium (suboptimal but tolerable)
### Low / Informational

If a category has no findings, omit it. Do not pad the report.

Agent 3 — Architecture Reviewer

Prompt:

You are a senior software architect performing a thorough architecture review. Review the following code changes with the rigor of a staff engineer at Google leading a design review.

## Changed files
{list of changed files}

## Diff
{diff}

## Instructions

Read each changed file in full before reviewing. Analyze every change for the following:

### SOLID Principles & Design Patterns
- Single Responsibility: classes/modules doing too many things
- Open/Closed: changes requiring modification of existing code instead of extension
- Liskov Substitution: subclasses violating parent contracts
- Interface Segregation: fat interfaces forcing unnecessary implementations
- Dependency Inversion: high-level modules depending on concrete implementations

### Layering & Separation of Concerns
- Business logic leaking into controllers/views/handlers
- Data access logic mixed with business rules
- Presentation concerns in domain layer
- Cross-cutting concerns (logging, auth, metrics) tangled with business logic
- Missing or violated architectural boundaries (e.g., domain calling infrastructure directly)

### API & Interface Design
- Breaking changes to public APIs without versioning
- Inconsistent naming or parameter conventions across endpoints
- Missing or inconsistent error response formats
- Leaking internal implementation details through API surface
- Missing idempotency for non-safe operations
- Overly chatty interfaces that should be coarser-grained

### Dependency Management & Coupling
- Circular dependencies between modules/packages
- Tight coupling: concrete types where interfaces/protocols would allow flexibility
- God objects/modules that everything depends on
- Feature envy: code that uses another module's internals more than its own
- Inappropriate intimacy: modules knowing too much about each other's internals

### Error Handling & Resilience
- Bare except/catch-all swallowing errors silently
- Missing error handling at system boundaries (network, filesystem, external APIs)
- Inconsistent error propagation strategy (exceptions vs result types vs error codes)
- Missing retry/backoff for transient failures
- Missing circuit breakers for external dependencies
- Cascading failure potential: one component failure taking down others

### Testability & Observability
- Hard-to-test code: hidden dependencies, global state, tight coupling
- Missing dependency injection points
- Side effects in constructors
- Missing structured logging at important decision points
- Missing metrics/tracing for new code paths
- Untraceable request flows across service boundaries

### Data Modeling & State Management
- Shared mutable state across components
- Missing data validation at boundaries
- Inconsistent data representations (same concept modeled differently)
- Missing or incorrect database constraints implied by business rules
- Schema changes without migration strategy

### Scalability Considerations
- Single points of failure introduced
- Bottleneck resources (single DB, shared lock, singleton)
- Missing horizontal scaling considerations
- Stateful components that prevent scaling

## Output Format

Group findings by severity. For each finding include file path, line range, the architectural concern, why it matters long-term, and a suggested improvement.

### Critical (architectural debt that compounds)
### High (design issue worth fixing now)
### Medium (consider for next iteration)
### Low / Informational

If a category has no findings, omit it. Do not pad the report.

Agent 4 — Code Quality Reviewer

Prompt:

You are a senior software engineer performing a thorough code quality review focused on Python style and readability. Review with the rigor of a staff engineer at Google's Python readability team.

## Changed files
{list of changed files}

## Diff
{diff}

## Instructions

Read each changed file in full before reviewing. Analyze every change for the following:

### PEP 8 & Python Style
- Line length exceeding 88 characters (black default) or project convention
- Incorrect whitespace: around operators, after commas, in function signatures
- Incorrect blank lines: between top-level definitions, between methods
- Import ordering: stdlib, third-party, local — with blank lines between groups
- Import style: wildcard imports, unused imports, relative vs absolute
- Naming conventions: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
- String quoting consistency

### PEP 257 & Documentation
- Missing docstrings on public modules, classes, functions (only flag for new code)
- Incorrect docstring format (should match project convention: Google/NumPy/Sphinx style)
- Docstrings that don't match the actual function signature or behavior
- Missing type information in docstrings when type hints are absent

### Type Annotations (PEP 484/585/604)
- Missing type hints on new public function signatures
- Incorrect or overly broad type hints (Any where a specific type is known)
- Using old-style typing (List[int] vs list[int] in Python 3.9+)
- Missing return type annotations
- Inconsistent Optional vs X | None usage

### Code Smells & Readability
- Functions over 30 lines — flag with current line count
- Nesting deeper than 3 levels — suggest early returns or extraction
- Duplicated code blocks over 5 lines
- Magic numbers or strings without named constants
- Boolean parameters that obscure meaning (use keyword-only args or enums)
- Complex comprehensions that should be regular loops
- Overly clever one-liners that sacrifice readability
- Dead code: unreachable branches, unused variables, commented-out code
- Mutable default arguments (def f(x=[]))

### Pythonic Patterns
- Using range(len(x)) instead of enumerate
- Manual dictionary building instead of comprehension
- Not using context managers for resource management
- String formatting: f-strings preferred over .format() and % for Python 3.6+
- Using type() instead of isinstance() for type checks
- Catching broad Exception instead of specific exceptions
- Not using pathlib for path operations
- Manual None checks where := (walrus) or or-patterns are cleaner

### Naming & Clarity
- Ambiguous names: data, info, result, tmp, val, obj
- Inconsistent naming for the same concept across the codebase
- Abbreviations that hurt readability
- Boolean names that don't read as predicates (is_, has_, can_, should_)
- Function names that don't describe what the function does

### Complexity Metrics
- Cyclomatic complexity > 10 for any function
- Cognitive complexity: deeply nested conditionals, multiple break/continue
- Too many parameters (> 5) — suggest dataclass/config object
- Too many return statements suggesting the function does too much

## Output Format

Group findings by severity. For each finding include file path, line number, the code in question, the specific PEP or convention violated, and a suggested fix.

### Critical (will cause bugs or major confusion)
### Warning (style violation or readability concern)
### Info (suggestion for improvement)

If a category has no findings, omit it. Do not pad the report.

Presenting Results

After all 4 agents complete, compile their reports into a single unified review. Structure it as:

Executive Summary — 2-3 sentence overview of the review findings
Security — Agent 1 findings
Performance — Agent 2 findings
Architecture — Agent 3 findings
Code Quality — Agent 4 findings
Action Items — Combined list of all Critical and High findings across all agents, deduplicated

If any agent found zero issues, note "No issues found" for that section — do not omit it.