solidity-function-audit-team - SKILL.md Agent Skill

name: solidity-function-audit-team description: Agent team variant of solidity-function-audit with human-in-the-loop review. Uses agent teams for inter-agent messaging, shared task list with dependencies, plus interactive design decision capture (Stage 0), findings review (Stage 4), and dispute re-evaluation (Stage 5). Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. disable-model-invocation: true

Function Audit (Agent Team)

Purpose

Perform a comprehensive per-function audit using an agent team with human-in-the-loop review. Stage 0 captures design decisions interactively. Stages 1-3 use agent teams with inter-agent messaging and shared task list. Stage 4 presents findings for developer classification. Stage 5 re-evaluates disputed findings. The lead handles pre-flight, Stage 0, synthesis, Stage 4, and Stage 5 directly — only Stages 1-3 are delegated to teammates.

Prerequisites

CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 must be set in .claude/settings.json env or shell environment

Pre-Flight Discovery (Lead Only)

0. Check for Previous Run

Check if docs/audit/function-audit/ exists. If it does, ask the user:

Archive: Rename to docs/audit/function-audit-{YYYY-MM-DD-HHMMSS}/ (using current timestamp) and proceed fresh
Overwrite: Delete contents and proceed
Cancel: Stop execution

If the directory does not exist, proceed normally.

1. Identify Project Path

Use $ARGUMENTS as the project path if provided, otherwise use the current working directory.
Store as PROJECT_PATH for all subsequent steps.

2. Discover Contracts (lean — Grep only)

Use Glob for src/**/*.sol (excluding src/artifacts/) to find all source files. Then use Grep for contract \w+ and library \w+ to identify contract and library declarations. Do NOT Read entire source files — only Read a specific file when domain grouping is ambiguous (e.g., a function straddles two contracts). The goal is to know file paths + contract names, not to understand the code.

3. Discover Functions (lean — Grep only)

Use Grep for function \w+\( in each discovered .sol file to find all function declarations. Use Grep context flags (-A 1 or -B 1) to determine visibility from the surrounding lines — do NOT Read full files:

Collect all external and public functions
Also collect internal functions (they are often where the real logic lives)
Skip auto-generated getters and pure view helpers that just return a constant

4. Group Into Domains

Group functions into logical domains using these heuristics (in priority order):

Shared modifiers: Functions sharing the same access control modifier belong together
Shared state writes: Functions that write to the same state variables belong together
Lifecycle stages: Functions that form a sequence (request -> process -> claim) belong together
Name prefixes: Functions with common prefixes (deposit/withdraw, add/remove, register/deregister)

Target 4-10 domains of 3-15 functions each. If the contract has fewer than 15 functions total, use a single domain. If natural grouping exceeds 10 domains, merge the smallest related domains.

5. Create Output Directory

mkdir -p docs/audit/function-audit/{stage0,stage1,stage2,stage3,verification,review}

6. Preview Domains

Display to the user:

Number of contracts found
Number of functions found
Domain groupings with function lists
Proceed to Stage 0 for design decision capture before confirming

7. Collect Source File Paths

Build the list of all .sol source file paths (absolute paths) that teammates will need to read. Format as one absolute path per line when substituting into {source_file_list} placeholders in teammate prompts.

8. Detect Project Characteristics

Scan source files for DeFi-relevant patterns to condition Stage 2/3 prompts:

Token interfaces: Grep for ERC20, ERC721, ERC1155, ERC4626, IERC20, SafeERC20 → set {has_tokens} true/false
Proxy/upgrade patterns: Grep for UUPSUpgradeable, TransparentProxy, Initializable → set {has_proxies} true/false
Oracle imports: Grep for AggregatorV3Interface, IOracle, TWAP → set {has_oracles} true/false

Session State Checkpoint

After each completed stage, write {output_root}/stage-checkpoint.md using the Write tool (full overwrite). Include:

PROJECT_PATH, OUTPUT_ROOT
STAGE_STATUS: key=value pairs for each stage (e.g., preflight=complete stage0=complete stage1=pending). Write as a standalone line starting with STAGE_STATUS: — this line is machine-parsed by the PreCompact hook.
DOMAINS: one line per domain with slug, name, and function list
FLAGS: has_tokens, has_proxies, has_oracles
PATHS: design_decisions_file, slither_file, and all stage output file paths known so far
After Synthesis: FINDING_TOTALS with severity counts
After Stage 4: REVIEW_RESPONSES_FILE and DISPUTED_COUNT
TEAM_NAME: function-audit

Before each stage, read the checkpoint file to confirm all paths and domain groupings. If state has been lost (e.g., after auto-compaction), recover via:

Glob(pattern: "**/docs/audit/function-audit/stage-checkpoint.md")
Read the file to restore all session state
Resume from the last completed stage

Stage 0: Design Decisions (Lead Only — interactive)

Capture developer intent before the automated audit. Read resources/REVIEW_PROMPTS.md (Stage 0 section) for extraction patterns, confirmation script, and output format. Execute the three phases:

Automated extraction: Grep source files for NatSpec @dev comments, static analysis annotations, intent keywords, and code-level patterns (rounding, access control, upgrades, reentrancy, pausability)
Interactive confirmation: Present detections grouped by category, ask user to confirm/correct/add context per category
Write output: Write docs/audit/function-audit/stage0/design-decisions.md. Store path as {design_decisions_file} for Stage 2 and Stage 3 agent prompts.

8. Confirm with User

Display to the user:

Number of contracts found, functions found, domain groupings
Design decisions summary (categories and counts)
Ask for confirmation before proceeding to Stage 1

Slither Integration (Lead Only — between Stage 0 and Stage 1)

Run Slither static analysis if available. This is NOT a teammate — the lead does this directly.

Run which slither via Bash
If not found → display "Slither not detected. Install with pip install slither-analyzer for automated static analysis. Continuing without it." → set {slither_file} to empty → proceed
If found → run slither . --json /tmp/slither-output.json --exclude-informational --filter-paths "test|script|lib|node_modules" 2>&1 || true
Check if /tmp/slither-output.json exists and is non-empty (use Bash: test -s /tmp/slither-output.json)
If the file doesn't exist or is empty → display "Slither failed to analyze the project (likely a compilation or solc version issue). Continuing without it." → set {slither_file} to empty → proceed to Stage 1
If the file exists → Read it, map findings (High→HIGH, Medium→MEDIUM, Low→LOW, Informational→INFO), write to docs/audit/function-audit/stage0/slither-findings.md
Display summary: "Slither found N findings (H high, M medium, L low, I info)"
Store path as {slither_file} for teammate prompts

Write the initial session state checkpoint. Inform the user: "Checkpoint saved. You may run /compact preserve audit stage status, domain groupings, and file paths to free context before agent launch, or say proceed."

9. Plan Task IDs and Dependency Graph

Before creating tasks, plan out all task IDs and their dependencies:

Stage 1 (no deps — start immediately):
  T1: State Variable Map
  T2: Access Control Map
  T3: External Call Map

Stage 2 (blocked by all Stage 1 tasks):
  T4: Domain "{domain_1_name}"      | blockedBy: [T1, T2, T3]
  T5: Domain "{domain_2_name}"      | blockedBy: [T1, T2, T3]
  ...
  T(3+N): Domain "{domain_N_name}"  | blockedBy: [T1, T2, T3]

Stage 3 (blocked by all Stage 2 tasks):
  T(4+N): State Consistency         | blockedBy: [T4 .. T(3+N)]
  T(5+N): Math & Rounding           | blockedBy: [T4 .. T(3+N)]
  T(6+N): Reentrancy & Trust        | blockedBy: [T4 .. T(3+N)]
  T(7+N): Adversarial Sequences     | blockedBy: [T4 .. T(3+N)]

Team Creation (Lead Only — use exact tools specified)

Step 1: Create the team using TeamCreate tool

Call the TeamCreate tool:

TeamCreate(team_name: "function-audit", description: "Function audit for {PROJECT_PATH}")

This creates the shared task list and team config at ~/.claude/teams/function-audit/.

Step 2: Create ALL tasks using TaskCreate tool

Create every task upfront BEFORE spawning any teammates. Read the prompt templates from resources/STAGE_PROMPTS.md and fill in all placeholders. For Stage 2 and Stage 3 tasks, fill in {design_decisions_file} (absolute path to stage0/design-decisions.md) and {slither_file} (absolute path to stage0/slither-findings.md, or empty string if Slither was not run). Stage 1 prompts do not use these placeholders. Each task's description field must contain the FULL analysis prompt (the filled-in template from STAGE_PROMPTS.md), not a summary. Build task descriptions mechanically from templates — substitute placeholders without analyzing the code content. The task description is for the teammate, not the lead.

Stage 1 tasks (no dependencies — created first):

Call TaskCreate 3 times:

T1: TaskCreate(subject: "State Variable Map", description: "<filled Stage 1a prompt>", activeForm: "Mapping state variables")
T2: TaskCreate(subject: "Access Control Map", description: "<filled Stage 1b prompt>", activeForm: "Mapping access control")
T3: TaskCreate(subject: "External Call Map", description: "<filled Stage 1c prompt>", activeForm: "Mapping external calls")

For {teammate_roles} in Stage 1 prompts, substitute:

- state-vars: State Variable Map — analyzing all storage variables, their readers/writers, and invariants
- access-ctrl: Access Control Map — analyzing all access control modifiers, roles, and gaps
- ext-calls: External Call Map — analyzing all external calls, reentrancy risks, and trust levels

Stage 2 tasks (one per domain, blocked by Stage 1):

For each domain, call TaskCreate:

TaskCreate(subject: "Domain: {domain_name}", description: "<filled Stage 2 prompt>", activeForm: "Analyzing {domain_name} domain")

Then call TaskUpdate to set dependencies:

TaskUpdate(taskId: "{stage2_task_id}", addBlockedBy: ["{T1_id}", "{T2_id}", "{T3_id}"])

For {source_file_list} in Stage 2 prompts, scope to domain-relevant files only: include files containing the domain's functions, files containing contracts called by those functions (from Stage 1c external call map), and files containing inherited contracts or imported libraries. Do NOT include all project source files.

For {teammate_roles} in Stage 2 prompts, list ALL teammates across all stages so domain analysts can message anyone:

Stage 1 (completed by now — reference their output files):
- state-vars: State Variable Map → {stage1_state_var_file}
- access-ctrl: Access Control Map → {stage1_access_control_file}
- ext-calls: External Call Map → {stage1_external_call_file}

Stage 2 (your peers — message them about cross-domain findings):
- domain-{slug1}: {domain_1_name} — functions: {function_list_summary_1}
- domain-{slug2}: {domain_2_name} — functions: {function_list_summary_2}
- ... (all domains)

Stage 3 (not started yet):
- state-consistency: State Consistency Audit
- math-rounding: Math & Rounding Audit
- reentrancy-trust: Reentrancy & Trust Boundaries Audit
- adversarial-sequences: Adversarial Sequence Modeling Audit

Stage 3 tasks (blocked by ALL Stage 2 tasks):

Call TaskCreate 4 times:

TaskCreate(subject: "State Consistency Audit", description: "<filled Stage 3a prompt>", activeForm: "Auditing state consistency")
TaskCreate(subject: "Math & Rounding Audit", description: "<filled Stage 3b prompt>", activeForm: "Auditing math and rounding")
TaskCreate(subject: "Reentrancy & Trust Audit", description: "<filled Stage 3c prompt>", activeForm: "Auditing reentrancy and trust")
TaskCreate(subject: "Adversarial Sequence Modeling Audit", description: "<filled Stage 3d prompt>", activeForm: "Modeling adversarial sequences")

Then call TaskUpdate to set dependencies on ALL Stage 2 task IDs:

TaskUpdate(taskId: "{stage3_task_id}", addBlockedBy: ["{T4_id}", "{T5_id}", ..., "{T3+N_id}"])

For {teammate_roles} in Stage 3 prompts:

- state-consistency: State Consistency — analyzing accounting invariants, divergent tracking, stale state, transition completeness
- math-rounding: Math & Rounding — analyzing overflow, rounding direction, precision loss, exchange rate manipulation, fee arithmetic
- reentrancy-trust: Reentrancy & Trust — analyzing CEI compliance, delegatecall safety, trust boundaries, external dependencies, callback vectors
- adversarial-sequences: Adversarial Sequences — modeling cross-contract attacker flows, multi-transaction exploit ordering, state delta traces, and violated invariants

Step 3: Spawn teammates using Task tool with team_name

After ALL tasks are created, spawn teammates using the Task tool with team_name and name parameters. Each teammate is a separate Claude Code session.

Each teammate's prompt parameter includes a role-specific opening followed by shared operational instructions. The role opening gives the agent identity and specialty context.

Shared operational block (appended to every role opening):

## How You Work
1. Check the task list (TaskList) for available tasks (status: pending, no owner, no unresolved blockedBy)
2. Claim a task by calling TaskUpdate(taskId: "...", owner: "<your name>", status: "in_progress")
3. Read the task description (TaskGet) for your full analysis prompt
4. Execute the analysis: read source files, write findings to the output file
5. Mark the task as completed: TaskUpdate(taskId: "...", status: "completed")
6. Call TaskList again to check for more available tasks — claim the next one if available

## Communication
- Use SendMessage(type: "message", recipient: "<teammate-name>", content: "...", summary: "...") to message other teammates
- Follow the Communication Guidelines in each task description
- When you receive a message, incorporate the finding into your analysis if relevant

## Important Rules
- Write ALL analysis to the output file specified in the task using the Write tool. Do NOT return analysis in messages.
- Always use absolute paths when reading files.
- Mark tasks completed only after writing the output file.

Stage 1 teammates (3) — each with a specialized role opening:

Task(subagent_type: "general-purpose", name: "state-vars", team_name: "function-audit", prompt: "You are **state-vars**, a Solidity security auditor specializing in **state variable analysis** — mapping storage variables, readers/writers, invariants, and duplication risks.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 15)
Task(subagent_type: "general-purpose", name: "access-ctrl", team_name: "function-audit", prompt: "You are **access-ctrl**, a Solidity security auditor specializing in **access control analysis** — mapping roles, modifiers, privilege levels, and authorization gaps.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 15)
Task(subagent_type: "general-purpose", name: "ext-calls", team_name: "function-audit", prompt: "You are **ext-calls**, a Solidity security auditor specializing in **external call analysis** — mapping cross-contract calls, reentrancy vectors, and trust boundaries.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 15)

Stage 2 teammates (one per domain):

Task(subagent_type: "general-purpose", name: "domain-{slug}", team_name: "function-audit", prompt: "You are **domain-{slug}**, a Solidity security auditor specializing in the **{domain_name}** domain — per-function audit of {N} functions covering {brief description}.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 25)

Note: Stage 2 teammates send their analysis plan as a message to the lead before executing (see STAGE_PROMPTS.md). This is informational — teammates proceed without waiting for approval.

Stage 3 teammates (4):

Task(subagent_type: "general-purpose", name: "state-consistency", team_name: "function-audit", prompt: "You are **state-consistency**, a Solidity security auditor specializing in **cross-domain state consistency** — accounting invariants, divergent tracking, stale state, transition completeness.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 25)
Task(subagent_type: "general-purpose", name: "math-rounding", team_name: "function-audit", prompt: "You are **math-rounding**, a Solidity security auditor specializing in **mathematical analysis** — overflow, rounding, precision loss, exchange rate manipulation, fee arithmetic.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 25)
Task(subagent_type: "general-purpose", name: "reentrancy-trust", team_name: "function-audit", prompt: "You are **reentrancy-trust**, a Solidity security auditor specializing in **reentrancy and trust boundary analysis** — CEI compliance, delegatecall safety, trust boundaries, callback vectors.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 25)
Task(subagent_type: "general-purpose", name: "adversarial-sequences", team_name: "function-audit", prompt: "You are **adversarial-sequences**, a Solidity security auditor specializing in **end-to-end exploit sequencing** — cross-contract attacker paths, multi-transaction ordering, and invariant-breaking state transition traces.\n\n<shared operational block>", mode: "bypassPermissions", max_turns: 25)

Spawn ALL teammates at once (all stages). Teammates blocked by dependencies will wait automatically — the task list handles stage ordering.

Delegation (Lead Only)

After spawning all teammates:

The lead MUST NOT do any analysis work itself — only coordinate
Teammates self-claim tasks from the shared list as they become unblocked
Monitor progress via TaskList tool
Stage 2 plan messages: Stage 2 teammates send their analysis plan as a message before executing. Review these for awareness — if a plan has obvious gaps (missing functions, wrong focus areas), message the teammate with corrections. Teammates do not wait for approval, so respond promptly if you see issues.
When ALL tasks show status "completed" in TaskList, quick-validate all output files before proceeding:
- Use Glob to verify files exist in stage1/*.md, stage2/*.md, and stage3/*.md
- Read the first 5 and last 5 lines of each file. Verify each is non-empty and contains ## headings
- For Stage 2 files: verify they contain ## Summary of Findings or ## Cross-Cutting Analysis and at least one severity tag (**CRITICAL -- , **HIGH -- , **MEDIUM -- , **LOW -- , or **INFO -- )
- For Stage 3 files: verify they contain at least one severity tag
- If validation fails for any file, note it as INCOMPLETE in synthesis
Update the session state checkpoint (Stages 1-3 complete, add all stage1/stage2/stage3 file paths).
Proceed to Synthesis
Send shutdown requests to all teammates: SendMessage(type: "shutdown_request", recipient: "<name>", content: "All tasks complete")
After all teammates shut down, call TeamDelete to clean up

Concurrency

Stage 1: 3 teammates active simultaneously (tasks have no blockedBy)
Stage 2: All N domain teammates active simultaneously (tasks unblock together when Stage 1 completes)
Stage 3: 4 teammates active simultaneously (tasks unblock together when Stage 2 completes)

Synthesis (Lead Only)

After all tasks are completed, the lead performs synthesis directly (not as a teammate task — the lead's context is clean from delegate mode).

1. Gather Findings via Grep (do NOT read full files)

Use Grep to extract all findings and verdicts in one pass — do NOT read each output file in full:

Grep(pattern: "\\*\\*(CRITICAL|HIGH|MEDIUM|LOW|INFO) -- ", path: "docs/audit/function-audit/", output_mode: "content") — returns all findings with file paths and line numbers
Grep(pattern: "\\*\\*Verdict\\*\\*: \\*\\*", path: "docs/audit/function-audit/", output_mode: "content") — returns all verdicts

2. Tally Findings from Grep Results

From the Grep output, count findings per severity per file:

Count **CRITICAL --, **HIGH --, **MEDIUM --, **LOW --, **INFO -- matches per file

Count per-function verdicts:

Count **Verdict**: **SOUND**, **Verdict**: **NEEDS_REVIEW**, **Verdict**: **ISSUE_FOUND** matches

Do NOT count domain-level overall verdicts in the per-function tally — track those separately.

3. Extract Finding Details for Master Table

From the Grep results, extract each finding's severity, title, source file path, and location. Only Read individual files briefly (first/last 10 lines) for executive summary context. Build task descriptions mechanically from templates — substitute placeholders without analyzing the code content.

4. Write INDEX.md

Write docs/audit/function-audit/INDEX.md with sections for Stage 0, Stage 1, Stage 2 (per-domain with verdicts), Stage 3, an "All Findings" master table (sorted by severity: CRITICAL first), and Totals. Include **Method**: Agent Team (inter-agent communication enabled) in the header. Link every output file with finding counts using 5-level format ({C}C / {H}H / {M}M / {L}L / {I}I).

The "All Findings" table format:

## All Findings

| # | Severity | Finding | Location | Source File |
|---|----------|---------|----------|-------------|
| 1 | CRITICAL | {title} | `Contract.function()` L{line} | [domain-{slug}.md](stage2/domain-{slug}.md) |
| ... | ... | ... | ... | ... |

5. Write SUMMARY.md

Write docs/audit/function-audit/SUMMARY.md containing:

Executive summary (2-3 paragraphs)
Top CRITICAL and HIGH findings (if any) with file links
Notable MEDIUM findings with file links
Cross-cutting themes observed
Inter-agent findings (findings that emerged from teammate communication)
Recommended action items (prioritized)

6. Report to User

Display finding stats and ask if the user wants to proceed to human review:

Total files generated
Finding breakdown by severity (5 levels)
Verdict breakdown
Any CRITICAL or HIGH findings highlighted

Update the session state checkpoint (Synthesis complete, add finding tallies). Inform the user: "Checkpoint updated. You may run /compact preserve audit stage status, domain groupings, and finding tallies to free context before interactive review, or proceed directly."

"Proceed to findings review? [yes/no]"

If the user declines, output links to INDEX.md and SUMMARY.md and stop.

Verification (sequential background agents — between Synthesis and Stage 4)

Using the "All Findings" master table from INDEX.md, collect all CRITICAL, HIGH, and MEDIUM findings.

Ask the user: "Run automated verification? This spawns one sequential agent per CRITICAL/HIGH finding (Foundry test each) plus one batch agent for MEDIUMs (anti-pattern check only). Estimated: {N} sequential agents + 1 batch. [yes/skip]"

If skip → proceed directly to Stage 4.

Setup

mkdir -p docs/audit/function-audit/verification
mkdir -p test/audit-verification

Per-Finding Agents (CRITICAL and HIGH — sequential)

For each CRITICAL/HIGH finding (CRITICALs first, then HIGHs):

Read the per-finding prompt from resources/VERIFICATION_PROMPTS.md and fill in placeholders:
- {finding_number} — zero-padded 3-digit from master table (e.g., 001)
- {finding_severity}, {finding_title}, {finding_source_file}, {finding_function}
- {output_file} — docs/audit/function-audit/verification/finding-{NNN}.md
- {test_dir} — absolute path to {PROJECT_PATH}/test/audit-verification/
- {test_name} — AuditVerify_{NNN}_{ContractName} (PascalCase)
- {source_file_list}, {stage1_file_list}, {design_decisions_file}
Launch ONE Task agent: subagent_type: "solidity-function-audit-team:solidity-verifier", max_turns: 20
Wait: TaskOutput(block: true, timeout: 480000) — do NOT launch next agent until complete
Quick-validate: verify finding-{NNN}.md is non-empty and contains one of [CONFIRMED], [REFUTED], [LIKELY-FP], [INCONCLUSIVE]

MEDIUM Batch Agent

If MEDIUM findings exist:

Read the batch prompt from resources/VERIFICATION_PROMPTS.md and fill in {medium_findings_list} (number, title, source file, function, full finding text for each), {output_file} (verification/medium-findings.md), {source_file_list}, {stage1_file_list}, {design_decisions_file}
Launch ONE Task agent: subagent_type: "solidity-function-audit-team:solidity-verifier", max_turns: 25
Wait: TaskOutput(block: true, timeout: 600000)
Quick-validate: verify medium-findings.md is non-empty and contains ## heading

Write Verification Summary

Orchestrator writes docs/audit/function-audit/verification/verification-summary.md with:

Results table: | # | Severity | Finding | Verdict | Test File |
Verdict counts: CONFIRMED: N | REFUTED: N | LIKELY-FP: N | INCONCLUSIVE: N

Update INDEX.md: add "Verification" section with file links + add "Verified" column to "All Findings" table. Update SUMMARY.md: add "Verification" section with verdict counts and notable confirmed issues. Update session state checkpoint (Verification complete, add verdict tallies).

Report: "Verification complete. {N} CONFIRMED, {N} REFUTED, {N} LIKELY-FP, {N} INCONCLUSIVE. Proceed to findings review? [yes/no]"

Stage 4: Human Review (Lead Only — interactive)

Read the review flow from resources/REVIEW_PROMPTS.md (Stage 4 section). Execute interactively:

Extract findings: Parse all stage2/ and stage3/ files for **CRITICAL -- , **HIGH -- , **MEDIUM -- , **LOW -- , **INFO -- patterns. Note DESIGN_DECISION -- tagged findings separately.
Present CRITICALs (if any): Numbered list, ask user to classify each: BUG, DESIGN, DISPUTED, or DISCUSS.
Present HIGHs (if any): Same format.
Present MEDIUMs (if any): Same format.
Present LOWs and INFOs: Summary count, opt-in to review.
Follow-up on DISPUTED/DISCUSS: Show full finding + source code, record developer reasoning.
Write output: docs/audit/function-audit/review/review-responses.md

Update the session state checkpoint (Stage 4 complete, add review file path and disputed count).

Stage 5: Re-Evaluation (Lead Only — conditional, 1 agent)

Skip this stage if no DISPUTED or DISCUSS items exist from Stage 4.

If items exist: read the Stage 5 agent prompt from resources/REVIEW_PROMPTS.md, fill in placeholders ({output_file}, {design_decisions_file}, {review_responses_file}, {source_file_list}, {disputed_findings}), and launch ONE Task agent with run_in_background: true, subagent_type: "general-purpose", mode: "bypassPermissions", max_turns: 15. Wait with TaskOutput(block: true, timeout: 600000). Quick-validate the re-evaluation output: verify review/re-evaluation.md is non-empty and contains at least one ## heading. If validation fails, report the issue to the user.

Final Synthesis Update

Update SUMMARY.md with a "Human Review" section (classification table + before/after status). Update INDEX.md with review file links. Display final summary to the user.

Guardrails

All teammates write to files — never return analysis in responses. Returns fill the main context window and cause overflow in later stages.
Use blockedBy dependencies — Stage 2 needs Stage 1 output, Stage 3 needs Stage 2 output. Only teammates within each stage run in parallel.
Always complete Stage 3 — cross-cutting issues emerge from combining individually-sound functions, even when no CRITICALs are found in Stage 2.
Include communication guidelines — Stage 2 agents miss cross-domain issues that only emerge when domains communicate in real-time. Every task must have communication guidelines.
Lead runs synthesis directly — lead's context is lean from delegate mode; synthesis benefits from this clean context.
Stage 0 is best-effort — if no design signals are detected, write an empty design-decisions.md and proceed. Agents evaluate independently.
Stage 4 requires patience — present findings in severity batches. Let the user classify at their own pace. Never auto-classify.
Stage 5 is conditional — only runs if DISPUTED or DISCUSS items exist. Do not spawn the agent otherwise.
Verification is sequential — never run more than one verification agent at a time (forge compilation lock conflicts).

Notes

Agent teams: Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in environment.
Teammate count: Spawn at least N teammates where N = max(3, number of domains). Stage 1 uses 3, Stage 2 uses N, Stage 3 uses 4. Teammates self-schedule via the task list.
Stage 2 planning: Stage 2 teammates design and share their analysis plan via messaging before executing. This is prompt-enforced (not permission-enforced) to avoid deadlocks with the plan approval protocol.
File paths: Always use absolute paths in task descriptions so teammates can Read files without ambiguity.
Error handling: If a teammate fails or a task gets stuck, the lead should investigate via TaskList and reassign if needed. In synthesis, note missing files in INDEX.md with status INCOMPLETE — agent failed and proceed using available outputs.
Previous runs: Step 0 of Pre-Flight checks for existing output and offers archive, overwrite, or cancel options.
Verification tests: Written to test/audit-verification/ in the project root. Tests are persistent artifacts the developer can re-run.
Long sessions: For large projects, set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70 in settings for earlier, higher-quality compaction summaries.