code-review-chunk-dispatch

name: code-review-chunk-dispatch description: |- Use when the code-review-backlog is large (>30 commits OR >5,000 LoC changed since last review) AND the caller has Agent/Task-tool dispatch capability (.e., top-level orchestrator or main Claude session). Trigger on phrases like "code-review backlog is big", "no review for weeks", "many commits without review", "review of 100+ commits", "how do I review these 200 commits", "chunk review", "parallel code-reviewer subagents". Do NOT load for single-PR review (use superpowers:requesting-code-review directly), for <30-commit backlog (single subagent suffices), for non-Git codebases (chunking-by-SHA-range presumes git workflow), or when running as a subagent yourself without Agent-tool access (the chunk-dispatch pattern is then unexecutable — use risk-based-triage instead, documented in the skill's Fallback section). Complements (sub-of) superpowers:requesting-code-review and superpowers:dispatching-parallel-agents.

✅ PROMOTED: TDD Cycle 1 (caller-context refactor) + TDD Cycle 2 (Mini-Verify-Fallback pass, S3 Pattern-Mode value-prop confirmed) passed. Auto-discoverable. Polish items from Mini-Verify subagent feedback incorporated: sample definition, counter-thesis depth hierarchy (Critical=mandatory / Important=recommended / Minor=optional), tabular trust-verdict format in fallback, TDD progression as Background label.

STOP — Caller-Context-Check (before anything else)

This skill recommends parallel dispatch via sub-subagents. That presumes: YOU (the caller) have an Agent/Task tool.

Before doing anything else, check:

Caller check	Action
Do I have access to the `Agent` tool (top-level orchestrator, main Claude session)?	✓ → continue with `## Pattern (short form)` and dispatch
I do NOT have the `Agent` tool (e.g., I'm a subagent myself)?	→ jump to `## Fallback: Sequential-Triage-Mode` BELOW. Do not try to force the chunk pattern sequentially.

Anti-pattern: blindly following the skill and chunking sequentially even though you can't parallel-dispatch — that's worse than natural risk-based triage. Proven by TDD-Test (see ## TDD progression below): GREEN subagent without Agent-tool produced WORSE result (1 instead of 4 Critical findings, +70% wallclock) through forced chunking than a baseline subagent without skill that applied natural risk-based triage.

Rationalization trap:

Rationalization	Reality
"The skill says chunk, so I chunk sequentially"	The skill presumes PARALLEL dispatch. Sequential chunking does NOT deliver the skill's promise, but has the overhead.
"My caller surely has Agent-tool, I'm a code-reviewer after all"	Check it. If `Agent`-tool not in your tool-set → you are a subagent.
"A bit of chunking is better than none"	Wrong. Risk-based triage without artificial chunks is better documented (see fallback).
"The caller wants to see chunks"	The caller wants good bugs. Coverage-disclosure in the report is more transparent than chunk-theater.

Pattern (short form)

When the code-review-backlog is large (heuristic: >30 commits OR >5,000 LoC OR >4 weeks without review):

Measure scope reality — git log --since, git diff --stat, commit count
Identify thematic chunks — ≤15 commits per chunk, ≤6 chunks total covering high-stake-recent-work, tail (old + less time-critical) as a separate backlog block
Determine per-chunk a BASE_SHA / HEAD_SHA range with git rev-parse <sha>~1 for inclusive starts
Parallel dispatch in a single message block with multiple Agent-tool invocations (NOT sequentially — wastes wallclock)
Aggregate findings by Critical / Important / Minor + chunk-trust assessment (Acceptable / Acceptable-with-fixes / Rollback-suggested)

Fallback: Sequential-Triage-Mode (when you have NO Agent-tool)

You landed here because the caller-context check above showed: you can't parallel-dispatch. DON'T chunk, do risk-based triage:

Measure scope as above (git log --since, git diff --stat, commit count) — still required for reporting
Identify risk domains instead of thematic chunks:

Domain	Content	Read depth
HIGHEST	Money-path (trading logic, payments, auth, live persistence, order execution)	Full-text read of all touched files
HIGH	Adjacent to money-path (notifications, schedulers, background jobs, ML inference, DB migrations)	Full-text read of diff-relevant files
MEDIUM	Operations / UI / cockpit	Sample = at least core methods (entry points + public API) of each diff-relevant file, plus targeted grep passes for suspected anti-patterns. Files <200 LoC completely.
LOW	Tests, docs, templates, legacy moves, config reshuffles	Skim via commit messages + `git diff --stat`, not line-by-line. Sample only when test logic backstops a Critical fix.

Targeted grep passes for domain-typical anti-patterns (e.g. _load_* 2x in the same statement, direction.*==.*"LONG" for casing drift, cur.execute.*UPDATE.*WHERE without rowcount check)
Findings by severity as in the main pattern (Critical / Important / Minor)
Counter-thesis check:
- Critical → MANDATORY: can the bug have another explanation? Is the diagnosis backed by code/schema/log?
- Important → RECOMMENDED: for non-obvious findings explicitly write out counter-thesis
- Minor → OPTIONAL: only when reviewer is uncertain
EXPLICIT-COVERAGE-DISCLOSURE in the report — mandatory:
- Which files / modules did I read COMPLETELY?
- Which did I sample?
- Which did I NOT read?
- That way the caller knows what the review does NOT cover → can specifically request more.

Why not artificial chunking? Without parallel dispatch, chunking is pure-overhead — it forces you to allow less depth per chunk, without delivering the throughput advantage. Risk-based triage scales better on single-threaded and delivered more Critical findings in the RED-S1-Test than the forced sequential chunking in the GREEN-S1-Test.

Output format for fallback mode (same as main pattern, plus coverage-disclosure + tabular verdict):

## Approach
[Risk-based triage, which domain tiers were read at what depth]

## Coverage-Disclosure
Fully read: [list of files]
Sample (core methods + grep): [list]
Not read: [list or domain description]

## Findings
### Critical (each with counter-thesis check)
### Important (with counter-thesis check for non-obvious)
### Minor

## Trust-Verdict (tabular, one row per domain)
| Domain | Read depth | Verdict |
|---|---|---|
| HIGHEST | Full text | Acceptable / Acceptable-with-fixes / Patch-immediately |
| HIGH | Full text (diff-relevant) | ... |
| MEDIUM | Sample | ... |
| LOW | Skim | (usually: Acceptable) |

## Recommendation
IMMEDIATELY / IN backlog / TRIVIAL — what when?

Concrete example

Scope snapshot: 246 commits over 4 weeks, 287 files, +44,872 / -1,492 LoC.

Single-subagent attempt would have failed: 44k lines diff exceed context limit. Even if not: output would be "code looks OK, some TODOs present" — generic, no finding-value.

Chunk split:

Chunk	Range	Commits	Theme	Risk
A	`5e5dd37..9a2cecb`	4	Today's dashboard bug + charts + race-fix	medium
B	`1b32427..5e5dd37`	13	Yesterday's cockpit Phase B + Phase A refactor	high (production-deployed)
C	`344671a..44c2e09`	9	Setup detector + ML / strategic	very high (live production)
D	`44c2e09..1b32427`	4	Telegram bot + dashboard SSD	high
E	`<base>..04fb03e`	~216	Tail (older work)	unknown, low (old)

→ Chunks A-D = 30 commits = ~12% of backlog, but 100% of high-stake-recent work. Chunk E documented as a separate backlog item.

Parallel dispatch in a single message block:

<function_calls>
  <Agent description="Review Chunk A" prompt="...">...</Agent>
  <Agent description="Review Chunk B" prompt="...">...</Agent>
  <Agent description="Review Chunk C" prompt="...">...</Agent>
  <Agent description="Review Chunk D" prompt="...">...</Agent>
</function_calls>

Wallclock: 10-15 min for all 4. If sequential: 40-60 min for the same output quality.

Aggregation:

3 Critical (all in Chunk C — production logic)
11 Important across all 4 chunks
Various Minor per chunk
Chunk trust assessment: A=Acceptable, B-D=Acceptable-with-fixes, C additionally "patch-immediately" due to Critical

Chunk strategies (which chunks?)

Strategy	When
Thematic (Cockpit, ML, Bot)	When commits are clearly grouped by feature themes
Chronological (per week, per sprint)	When no clear themes — fallback
Path-based (per subdir or module)	For very large monorepos
Critical-files-first (`security/`, `payments/`, Auth)	For stake differentiation

Heuristic: theme clusters are better than purely chronological chunks, because the reviewer subagent can then assess coherent logic (e.g. "this refactor + this test + this migration") instead of scattered commits.

Subagent prompt template per chunk

Each subagent receives:

Repo path (absolute path, because subagent has no conversation context)
Chunk description: what was built (1 paragraph)
Plan/requirements pointer: Daily-Note + spec files (subagent can read them)
SHA range: BASE_SHA + HEAD_SHA with example git diff commands
Review checklist from superpowers:requesting-code-review/code-reviewer.md:
- Plan alignment
- Code quality (separation-of-concerns, error-handling, type-safety, edge-cases)
- Architecture (security, scalability, integration)
- Testing (real-behavior vs mocks, edge-cases, integration)
- Production-readiness (migrations, backward-compat, docs)
Domain-specific lenses (e.g. "check display-name maxim", "check numeric-value verification")
Output format: Strengths + Issues by Severity + Recommendations + Assessment Verdict

Aggregation pattern

After parallel dispatch you have 4 structures. Aggregation:

Critical collection list across all chunks (sorted by risk impact)
Important collection list grouped by chunk
Minor collection list compact (1-liner per item)
Trust table per chunk with verdict
Top recommendation: which items IMMEDIATELY (Critical), which IN backlog (Important), which TRIVIAL (Minor)

Anti-patterns

❌ Single subagent for 200+ commits — output is generic
❌ Sequential subagent dispatches (40 min instead of 10) — no wallclock savings
❌ Chunks with overlapping SHA ranges — same commits get double-reviewed, findings duplicated
❌ Ignore tail-chunk without backlog entry — falls through the cracks tomorrow
❌ Don't aggregate findings, show 4 separate reports — user has to synthesize themselves, loses overview
❌ Subagent prompts without repo path (subagent has no conversation context, doesn't know where the code is)

Background: TDD progression (Bulletproofing log)

This section is history + design-decision rationale, NOT instruction for the executing caller. Caller follows the sections above (STOP, Pattern, Fallback).

Cycle 1 — Insight: Caller-Context-Mismatch

RED-S1 (general-purpose subagent, repo: production app, range: 200 commits / 231 files / +33,009 LoC, skill explicitly forbidden):

Subagent spontaneously chose risk-based triage (domain tiers HIGHEST/HIGH/MEDIUM/LOW)
Result: 4 Critical / 9 Important / 6 Minor with file:line precision
Wallclock: 334 s | Tokens: 213k | Tool-uses: 43
Including counter-thesis check per Critical, honest coverage-disclosure
Note at the end: "skill exists per CLAUDE.md, but was forbidden to me" → CLAUDE.md mention is a confounder, should be temporarily masked for clean tests

GREEN-S1 (general-purpose subagent, same scope, skill directive: "load & use"):

Subagent loaded the skill, but could NOT execute parallel dispatch (subagent has no Agent tool)
Fallback to forced sequential chunking (6 chunks A-F)
Result: 1 Critical (+re-verification of today's fixes) / 7 Important / 5 Minor
Wallclock: 566 s (+70%) | Tokens: 134k (-37%) | Tool-uses: 54
Meta-awareness bonus: recognized "today's commits are themselves code-review fixes, I'm reviewing re-reviews"
Output objectively WORSE than RED for bug discovery (1 vs 4 Critical) despite consuming more wallclock time

Skill design bug discovered: caller-context-mismatch. Skill implicitly presumes the caller has Agent tool. With subagent-caller without Agent tool, chunking is pure overhead and blocks natural risk-based triage.

Refactor applied (R1+R2+R3):

R1 (caller-context guard): new STOP section at the very top, subagent check before everything else
R2 (Fallback mode): new section "Sequential-Triage-Mode" with risk-based-triage template for non-dispatch callers
R3 (description filter): description extended with "AND caller has Agent/Task-tool" + Do-NOT-load for subagent-caller (removes at the same time the description-trap of the workflow-summary)

Cycle 2 — Mini-Verify-Fallback pass + S3 Pattern-Mode value-prop confirmed

Mini-Verify RED (general-purpose subagent, 35 commits, skill directive "load & use"):

Subagent read STOP section FIRST, recognized missing Agent tool, jumped to fallback mode (explicitly documented in skill self-reflection)
Risk-based triage applied, 2 Critical (real bugs) / 7 Important / 7 Minor in 380s / 255k tokens
Coverage disclosure as own section, counter-thesis check per Critical
R1-guard works. R2-fallback-mode works. R3-description filters correctly.
Subagent returned 4 constructive polish hints (incorporated: sample definition, counter-thesis depth hierarchy, tabular trust-verdict in fallback, TDD progression as Background label)

S3 Pattern-Mode value-prop test (top-level caller with Agent tool, 80 commits, 5 chunks A-E dispatched in parallel):

Wallclock: 488s parallel vs ~1500s sequential-estimated = 3x speedup
Tokens: 581k total (5 chunks). Per finding: 10.4k vs 16k single-subagent = 38% efficiency win
Findings: 10 Critical / 20 Important / 27 Minor
- of which 6 truly-NEW Critical (4 IMMEDIATE items in production app)
- 4 Critical confirmed-already-fixed via cross-chunk triangulation (Chunk C + E confirm each other)
Coverage depth: each chunk read touched files FULLY (Chunk A: 632 LoC v3_live_monitor.py + 456 LoC tests)
Trust verdicts per chunk: A=Acceptable-with-fixes, B=Acceptable-with-fixes, C=Confirmed-fixed-in-follow-ups, D=Acceptable-with-C1-fix, E=HIGH-Trust for Critical fixes

Test limitation documented: S2 (Grayzone threshold loophole) + S3-Time-Pressure-Loophole-Test are not cleanly executable with the current test environment — subagents have no Agent tool (Pattern Mode not executable), top-level caller (main Claude) is biased in the test setup by skill-design knowledge. Pending for future tests in "naive" sessions without skill-design context.

Status: PROMOTED. Skill is auto-discoverable and productively usable.

Cross-references

superpowers:requesting-code-review — base skill for single review (chunk-dispatch is its multi-skill counterpart)
superpowers:dispatching-parallel-agents — pattern base for parallel subagent dispatch
core-memories.md "Code review must become standard" — meta-maxim that demands regular reviews; chunk-dispatch is the remedy when the maxim has been broken for a while
post-session-skill-review — what to do after a successful chunk review (extract skill candidates from findings)

Real-world impact

User push-back at 09:30: "complete code-review, hasn't been done for weeks."

Scope measurement: 246 commits / 287 files / +44k LoC
Single-subagent attempt would fail (context + generic output)
Chunk strategy: 4 chunks (A-D) parallel, ~10 min wallclock
Output: 4 detailed reports with 3 Critical (live production), 11 Important, various Minor
Follow-up session: 8 of 9 follow-up tasks completed, all Critical bugs eliminated, all user-visible bugs fixed
Time-to-value: without chunk strategy this would not have been doable in one session, Critical bugs would have stayed under the radar