code-review

star 0

Multi-agent code review for Python projects. Use when reviewing local uncommitted changes, pull request diffs, or when asked to do a code review. Dispatches specialist agents (bug-hunter, security-auditor, test-coverage, code-quality, contracts, historical context) and aggregates results with confidence scoring.

scylladb By scylladb schedule Updated 3/7/2026

name: code-review description: >- Multi-agent code review for Python projects. Use when reviewing local uncommitted changes, pull request diffs, or when asked to do a code review. Dispatches specialist agents (bug-hunter, security-auditor, test-coverage, code-quality, contracts, historical context) and aggregates results with confidence scoring.

Code Review

Comprehensive multi-agent code review adapted for Python codebases. Six specialist agents examine code from different angles — bugs, security, test coverage, code quality, contracts, and historical context — then a confidence-scoring phase filters out false positives before reporting.

Essential Principles

**Run each specialist agent independently.**

Each agent examines a different dimension. Agents run one at a time via the Task tool (Copilot does not support parallel sub-agents). Keep agents independent so their findings are not influenced by each other's results.

**Filter by confidence × impact, not by count.**

A review that reports 50 low-confidence nitpicks is worse than one that reports 3 high-confidence critical issues. Every finding goes through a confidence-scoring phase; only issues above threshold survive.

**Apply Python-specific knowledge, not generic rules.**

Check for Python-specific pitfalls: mutable default arguments, late-binding closures, except Exception swallowing, unsafe pickle/yaml.load, missing async/await, Pydantic validation gaps, and Django/FastAPI framework patterns. Skip checks that don't apply to Python.

**Read AGENTS.md, pyproject.toml, and ruff config before flagging style issues.**

This project uses ruff for linting/formatting (line-length 120), pytest for tests, uv for dependency management, and Conventional Commits. Don't flag issues the existing toolchain already catches.

**Every finding must include file path, line number, and a concrete fix.**

Vague advice ("consider improving error handling") wastes reviewer time. Cite the exact location, explain the consequence, and show a code fix.

When to Use

  • Reviewing local uncommitted changes before committing
  • Reviewing a pull request for bugs, security, and quality
  • Running a pre-merge quality gate
  • Investigating a specific file or module for issues
  • When asked to "review", "audit", or "check" code

Example Prompts

These prompts will trigger the code-review skill:

Review a PR:

  • Review PR #42
  • Do a code review of this PR
  • Run code-review on PR #184
  • Audit the changes in PR #42 for bugs and security issues

Review local changes:

  • Review my local changes
  • Code review the uncommitted changes
  • Check my changes before I commit

Review specific files:

  • Review src/coodie/cql_builder.py for bugs
  • Security audit the driver module

When NOT to Use

  • Running linters or formatters — use uv run pre-commit run --all-files directly
  • Writing new tests — use the test-refactoring skill or write them manually
  • Performance benchmarking — use the benchmarks skill
  • Planning new features — use the writing-plans skill

Review Mode Selection

What are you reviewing?
│
├─ Local uncommitted changes (git diff)
│  └─ Follow: workflows/review-local-changes.md
│
├─ A pull request (PR number given)
│  └─ Follow: workflows/review-pr.md
│
└─ Specific files (paths given)
   └─ Treat as local changes review scoped to those files

Agent Architecture

Six specialist agents run independently during Phase 2 of each workflow:

Agent Focus Reference
Bug Hunter Runtime errors, None/null issues, race conditions, resource leaks bug-hunter.md
Security Auditor Injection, auth bypass, secrets exposure, unsafe deserialization security-auditor.md
Test Coverage Missing tests, weak assertions, untested error paths test-coverage-reviewer.md
Code Quality PEP 8, complexity, naming, SOLID, DRY, project conventions code-quality-reviewer.md
Contracts Pydantic models, type hints, API schemas, breaking changes contracts-reviewer.md
Historical Context Git blame, past bugs, hotspot files, architectural drift historical-context-reviewer.md

Agent Applicability

Not every agent needs to run on every review. Select based on what changed:

Condition Agents to Run
Any code changes bug-hunter, security-auditor, code-quality
Test files changed + test-coverage
Types/models/API changed + contracts
Complex or high-risk changes + historical-context
All (default) All six agents

Confidence Scoring

After agents report findings, each issue is scored on two axes:

Confidence (0–100): How certain is this a real issue?

Score Meaning
0 False positive — doesn't survive scrutiny
25 Might be real, might be false positive
50 Real issue, but minor or unlikely in practice
75 Verified real issue, will be hit in practice
100 Certain — evidence directly confirms it

Impact (0–100): How bad is it if left unfixed?

Score Meaning
0–20 Minor code smell, no functional impact
21–40 Hurts maintainability or readability
41–60 Causes errors in edge cases or degrades performance
61–80 Breaks core features or corrupts data
81–100 Runtime crash, data loss, or security breach

Filtering Thresholds

Higher-impact issues require less confidence to pass:

Impact Min Confidence Rationale
81–100 (Critical) 50 Critical issues warrant investigation even with moderate confidence
61–80 (High) 65 High-impact issues need good confidence
41–60 (Medium) 75 Medium issues need high confidence
21–40 (Low-Medium) 85 Low-medium issues need very high confidence
0–20 (Low) 95 Minor issues only if nearly certain

False Positive Examples

These should be filtered out:

  • Pre-existing issues in unchanged code
  • Issues that ruff, ty, or pytest would catch (CI handles those)
  • Pedantic nitpicks a senior engineer wouldn't flag
  • Intentional functionality changes related to the broader change
  • Issues silenced by # noqa, type: ignore, or similar comments
  • General suggestions without specific evidence ("consider adding more tests")

Quick Reference: Python-Specific Checks

Category What to Look For
Mutable defaults def f(items=[]) — use None + conditional
Broad except except Exception / bare except: swallowing errors
Async pitfalls Missing await, blocking calls in async functions
Pydantic Missing validators, wrong model_config, mutable defaults in models
Type hints Missing return types on public functions, Any overuse
Imports Circular imports, unused imports, inline imports
Resource leaks Files/connections not using context managers
Security pickle.load, yaml.load (not safe_load), eval(), f-string SQL
Testing Assertions without messages, assert True, missing parametrize IDs

Reference Index

File Content
bug-hunter.md Bug detection agent instructions — root cause analysis, Python-specific patterns
security-auditor.md Security audit agent — OWASP, Python-specific vulnerabilities
test-coverage-reviewer.md Test coverage agent — pytest patterns, behavioral coverage
code-quality-reviewer.md Code quality agent — PEP 8, SOLID, complexity, project conventions
contracts-reviewer.md Contracts agent — Pydantic, typing, API design, breaking changes
historical-context-reviewer.md Historical context agent — git archaeology, hotspots, past patterns
Workflow Purpose
review-local-changes.md 3-phase workflow for reviewing uncommitted local changes
review-pr.md 3-phase workflow for reviewing a pull request

Success Criteria

A well-executed code review:

  • Ran all applicable specialist agents
  • Every finding includes file path, line number, and concrete fix
  • Confidence scoring filtered out false positives (threshold applied)
  • No issues flagged that ruff/ty/pytest would already catch
  • Python-specific patterns checked (mutable defaults, async, Pydantic)
  • Project conventions (AGENTS.md, pyproject.toml) respected
  • Critical/High issues clearly separated from suggestions
  • Output follows the structured report template
Install via CLI
npx skills add https://github.com/scylladb/coodie --skill code-review
Repository Details
star Stars 0
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator