lodestar-review - SKILL.md Agent Skill

name: lodestar-review description: Run multi-persona code reviews on Lodestar PRs. Use when reviewing a PR, diff, or code change in ChainSafe/lodestar. Spawns specialized reviewer agents (bug hunter, security engineer, architect, etc.) with Lodestar-specific context, collects findings, and synthesizes a consolidated report. Covers PR review, code quality assessment, and security analysis for Ethereum consensus client code.

Lodestar Code Review

Multi-persona review system for ChainSafe/lodestar PRs. Each reviewer has a narrow scope with explicit rejection criteria, enhanced with Lodestar-specific knowledge.

Reviewers

Agent ID (use as `agentId`!)	Role	Model	Focus
`review-bugs`	Bug Hunter	GPT-5.3-Codex	Functional errors, logic flaws, off-by-one. Only ACTUAL broken behavior.
`review-defender`	Defender	GPT-5.3-Codex	Malicious code, backdoors, supply chain threats.
`review-linter`	Style Enforcer	Gemini 2.5 Pro	Style consistency vs Lodestar conventions.
`review-security`	Security Engineer	GPT-5.3-Codex	DoS vectors, peer manipulation, validation bypasses, crypto misuse.
`review-wisdom`	Wise Senior	Claude Opus 4.6	Clean code principles, maintainability, readability.
`reviewer-architect`	Architect	GPT-5.3-Codex (thinking: xhigh)	Package boundaries, consensus spec alignment, module coupling.
`review-devils-advocate`	Devil's Advocate	Claude Opus 4.6 (thinking: high)	Challenges premise, simpler alternatives, spec interpretation, necessity.

⚠️ Always pass the Agent ID as agentId in sessions_spawn. Omitting it routes to your default model instead of the reviewer's configured model. This is a mandatory field, not optional.

Reviewer Selection

Pick reviewers based on PR type:

PR Type	Reviewers
Any PR	`review-bugs` (always)
>50 lines changed	+ `review-wisdom`
Feature / new functionality	+ `reviewer-architect`, + `review-devils-advocate`
Consensus / spec-related	+ `review-devils-advocate`
Networking / API / p2p changes	+ `review-security`
External contributor	+ `review-defender`
Style-heavy / refactor	+ `review-linter`

Cost is not a concern — use all relevant reviewers.

Workflow

1. Get the diff

For open PRs:

gh pr diff <PR_NUMBER> --repo ChainSafe/lodestar

For local changes (pre-PR, dev workflow Phase 4):

cd ~/lodestar-<feature>
git diff unstable...HEAD    # diff against base branch
# or for staged changes:
git diff --cached

For large diffs (>3000 lines), focus on the most critical files or split into chunks.

1.5 Get the changed-file list (mandatory — prevents false positives)

Before spawning any reviewer, capture the exact set of files changed in this PR/branch. This list is used:

In the reviewer task prompt (so reviewers know exact scope)
After reviews complete (to discard findings about files NOT in this list)

# For open PRs:
gh pr diff <PR_NUMBER> --repo ChainSafe/lodestar --name-only

# For local changes:
cd ~/lodestar-<feature>
git diff --name-only origin/unstable...HEAD

Save the output as CHANGED_FILES. Add it to each reviewer task as:

## Files Changed in This PR
<CHANGED_FILES list, one per line>

IMPORTANT: Only flag issues in the files listed above. Do NOT comment on files not in this list, even if they appear in the broader codebase context.

⚠️ FALSE POSITIVE GUARD (mandatory post-review step): Before acting on any finding, verify the flagged file is in CHANGED_FILES. Discard any finding that references a file not in this list — reviewers sometimes hallucinate scope from broader context. This has burned real time (PR #8993: two reviewers flagged dataColumns.ts and gloas.ts, neither was in the diff).

2. Read persona prompts

Each reviewer's persona is in references/<agent-id>.md (relative to this skill directory). Read the persona file before spawning.

3. Spawn reviewers

For each selected reviewer, spawn with the persona prepended to the diff. Always include the ## Files Changed block (from Step 1.5) between the persona and the diff — this is the primary mechanism that prevents false positives:

sessions_spawn(
  agentId: "<agent-id>",   // ⚠️ MANDATORY — always pass agentId to route to the correct model!
  task: "<persona prompt>\n\n---\n\n## Files Changed in This PR\n<CHANGED_FILES, one per line>\n\nIMPORTANT: Only flag issues in the files listed above. Do NOT comment on files not in this list, even if they appear in the broader codebase context.\n\n---\n\nReview this diff for ChainSafe/lodestar PR #<number> (<title>):\n\n```diff\n<diff>\n```",
  label: "pr<number>-<reviewer-short-name>",
  runTimeoutSeconds: <timeout>   // Scale by diff size (see below)
)

Spawn all selected reviewers in parallel (no dependencies between them).

Durable-output requirement (mandatory): include this block at the end of every reviewer task so findings survive announce/session transport issues:

Context metadata for this run:
- Reviewer: <agent-id>
- Reviewed commit: <HEAD_SHA>

After finishing the review:
1) Write your full findings markdown to `~/.openclaw/workspace/notes/review-reports/pr-<PR>-<agent-id>.md`.
   - Preferred helper (avoids marker drift):
     `bash ~/.openclaw/workspace/scripts/review/write-review-artifact.sh --pr <PR> --agent <agent-id> --head-repo /absolute/path/to/repo <<'EOF'\n<findings markdown>\nEOF`
   - Include the exact metadata line `Reviewer: <agent-id>` near the top of the artifact.
   - Include the exact metadata line `Reviewed commit: <HEAD_SHA>` near the top of the artifact.
   - If there are no findings, write a short "No findings" report anyway.
2) In your final chat response, include the exact file path you wrote.

(For pre-PR local review, replace <PR> with a stable branch/task slug, e.g. local-<branch>.)

⚠️ agentId is MANDATORY for every reviewer spawn. Without it, the subagent inherits the parent's default model instead of the reviewer's configured model. This caused bugs+security reviewers to run on Claude Opus instead of GPT-5.3-Codex, leading to timeouts on PR #8962 (105KB diff).

⚠️ Timeout scaling by diff size:

Diff size runTimeoutSeconds

< 30KB (~800 lines) 180

30–80KB (~800–2000 lines) 300

> 80KB (~2000+ lines) 420

The default 300s is too short for large diffs. PR #8962 (105KB) timed out 2 reviewers at 300s; retries succeeded at 420s (bugs: 6m7s, security: 4m12s).

Diff size	`runTimeoutSeconds`
< 30KB (~800 lines)	180
30–80KB (~800–2000 lines)	300
> 80KB (~2000+ lines)	420

Note: For reviewer-architect, always pass thinking: "xhigh" in the spawn call for deep architectural reasoning.

4. Wait for results

All spawned reviewers will announce their findings back to the main session. Wait for ALL to complete before synthesizing.

4.1 Transport-failure fallback (mandatory)

If a reviewer announces completion but the actual findings are missing/truncated, do not proceed with partial context.

Read the reviewer artifact directly from notes/review-reports/pr-<PR>-<agent-id>.md (from Step 3 durable-output requirement).
If the file is missing for any reviewer, re-run only that reviewer.
Only synthesize once every selected reviewer has either:
- a complete announce payload, or
- a complete on-disk artifact file.

Quick verifier command (recommended before synthesis):

bash ~/.openclaw/workspace/scripts/review/check-review-artifacts.sh \
  --pr <PR> \
  --agents <agent-id-1> <agent-id-2> ... \
  --allow-empty-no-findings \
  --max-age-minutes 180 \
  --require-agent-marker \
  --require-reviewed-head \
  --head-repo /absolute/path/to/repo

Exit 0: every expected reviewer artifact exists and is usable.
Exit 2: at least one expected artifact is missing/invalid/stale — re-run only the missing reviewer(s), then re-check.

--max-age-minutes prevents stale artifacts from a prior review round from being mistaken as fresh output for the current diff. --require-agent-marker prevents cross-agent artifact mix-ups by requiring each file to contain Reviewer: <agent-id>. --require-reviewed-head prevents fresh-but-wrong artifacts (written for a different head commit) from being accepted. --head-repo should point at the exact PR worktree so HEAD resolution is deterministic.

This avoids losing findings when sub-agent message transport is flaky.

5. Act on findings

⛔ MANDATORY: Post findings as INLINE review comments on the diff. Never post a single large summary comment with all findings. This has been explicitly flagged multiple times.

5.1 Output hygiene (Nico requirement)

For any PR review content you post (inline comments or review body):

Include only reviewer-useful technical feedback.
Do not mention review process internals (multi-persona setup, AI/sub-agents, model names, orchestration steps).
Do not include process narration like "I asked X reviewers" or "agent Y found...".
If explicitly asked for methodology transparency, provide it separately in chat — not in the PR review text.

Step 5a: Compute line numbers

Before posting anything, map each finding to its exact file + line number in the new code. Use this helper to find new-file line numbers from the PR diff:

# Save as /tmp/find-review-lines.py and run with: python3 /tmp/find-review-lines.py
import json, subprocess, re

result = subprocess.run(
    ["gh", "api", f"repos/ChainSafe/lodestar/pulls/<PR>/files?per_page=100",
     "--jq", ".[] | {filename, patch}"],
    capture_output=True, text=True
)

def find_line(patch, search_text):
    lines = patch.split("\n")
    new_line = None
    for line in lines:
        m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
        if m:
            new_line = int(m.group(1))
            continue
        if new_line is None:
            continue
        if line.startswith('-'):
            continue
        if search_text in line:
            return new_line
        new_line += 1
    return None

Step 5b: Post as a single GitHub review with inline comments

Use the review API to batch all comments into ONE review submission. Use JSON input (not -f flags) so line is a proper integer:

cat > /tmp/review-payload.json << 'EOF'
{
  "commit_id": "<head_sha>",
  "event": "COMMENT",
  "body": "Brief summary of overall assessment + any general/cross-cutting findings not tied to specific lines. DO NOT mention review process details (multi-persona, AI reviewers, sub-agents). Only include info useful to the PR author.",
  "comments": [
    {
      "path": "packages/reqresp/src/file.ts",
      "line": 42,
      "side": "RIGHT",
      "body": "🔴 **Finding title**\n\nExplanation.\n\n```suggestion\nfixed code here\n```"
    }
  ]
}
EOF

gh api repos/ChainSafe/lodestar/pulls/<PR>/reviews \
  --method POST --input /tmp/review-payload.json

Comment format rules:

One inline comment per finding, on the exact line it refers to
Use 🔴 for must-fix, 🟡 for should-fix, 🟢 for suggestions
Use suggestion blocks for concrete code changes (GitHub renders these as committable)
General/cross-cutting findings that aren't tied to a specific line go in the review body — that's what it's for
The review body can contain both a brief summary AND general findings (architecture observations, cross-cutting patterns, etc.)
Deduplicate: if multiple reviewers flag the same line, merge into one comment noting convergence
Use -F line=N or JSON input (not -f line=N) — line must be an integer, not a string

Convergence signals quality: When multiple reviewers independently flag the same issue from different angles, highlight it — it's likely a real problem.

❌ DO NOT:

Post a single large comment with all findings listed

Use gh pr comment for review findings

Skip the line-number computation step

Use -f line=N (string) — always use -F line=N (integer) or JSON input

Mention review process details in the PR comment (multi-persona, AI reviewers, sub-agents, model names) — only include info useful to the PR author

For local changes (pre-PR, dev workflow Phase 4):

Don't post to GitHub — there's no PR yet. Instead:

Collect all reviewer findings
Fix issues directly in the worktree (small fixes → do yourself, large → back to coding agent)
Re-run reviewers on the updated diff if changes were significant
Only open the PR after the review cycle is clean

Lodestar Context for Reviewers

When constructing the task for each reviewer, append this Lodestar context block after the persona prompt (before the diff). This gives reviewers domain knowledge:

## Lodestar Codebase Context

Lodestar is a TypeScript Ethereum consensus client (beacon node + validator client + light client).

### Package Structure
- `beacon-node/` — core beacon chain logic, networking, sync, API server
- `validator/` — validator client (separate process, talks to beacon via API)
- `light-client/` — light client (runs in browsers too)
- `state-transition/` — pure state transition functions (spec implementation)
- `fork-choice/` — proto-array fork choice
- `types/` — SSZ type definitions for all forks
- `params/` — consensus constants and presets
- `config/` — runtime chain configuration
- `api/` — REST API client/server (shared between beacon-node and validator)
- `reqresp/` — libp2p request/response protocol
- `db/` — LevelDB abstraction
- `utils/` — shared utilities

### Fork Progression
phase0 → altair → bellatrix → capella → deneb → electra → fulu → gloas

Fork-aware code uses guards: `isForkPostElectra(fork)`, `isForkPostFulu(fork)`, etc.

### Key Conventions
- ES modules with `.js` extensions on relative imports (even for .ts files)
- Biome for linting/formatting, double quotes, no default exports
- `camelCase` functions/vars, `PascalCase` classes, `UPPER_SNAKE_CASE` constants
- Explicit parameter and return types, no `any`
- Prometheus metrics: always suffix with units (`_seconds`, `_bytes`, `_total`)
- Structured logging: `this.logger.debug("msg", {slot, root: toRootHex(root)})`

### Common Pitfalls to Watch For
- **Stale fork choice head:** `getHead()` returns cached ProtoBlock. After modifying proto-array state, must call `recomputeForkChoiceHead()`
- **Holding state references:** BeaconState objects are large tree-backed structures. Don't store beyond immediate use
- **Missing .js extension:** Relative imports must use `.js` for ESM resolution
- **Force push after review:** Never — use incremental commits
- **SSZ value vs view:** Value types (plain JS) vs ViewDU (tree-backed). State uses ViewDU — mutations need `.commit()`
- **Config vs params:** `@lodestar/params` = compile-time constants, `@lodestar/config` = runtime chain config

### Architecture Rules
- Beacon node, validator client, and light client are separate packages with clear boundaries
- Cross-package deps flow downward: beacon-node → state-transition → types → params
- Validator talks to beacon node only via REST API (never import beacon-node internals)
- State transition functions must be pure (no side effects, no network calls)
- Fork choice is its own package — beacon-node consumes it, doesn't extend it

Review Patterns Reference

Read references/review-patterns.md for patterns mined from ~2000 real Lodestar review comments. Key insights:

nflaig (lead): Spec citations, forward-compatible naming (avoid fork codenames), type safety, comment-code consistency
twoeths: ProtoBlock variant correctness, state cache keys, function signatures
wemeetagain: Metrics coverage, code simplification, future TODOs
ensi321: Edge case analysis, spec divergence, test correctness, scope enforcement

Use these patterns to calibrate reviewer expectations — e.g., the architect reviewer should flag missing metrics (wemeetagain pattern), and the wisdom reviewer should flag stale comments (nflaig pattern).

Finding Resolution Tracking

Use scripts/review/track-findings.py to track which findings get addressed in follow-up commits.

Workflow:

After synthesizing reviewer outputs, store key findings:

python3 ~/.openclaw/workspace/scripts/review/track-findings.py add <PR> \
  --file src/sync/range/chain.ts --line 142 --severity major \
  --reviewer review-bugs --body "Race condition in batch completion..."

Follow-up rounds (mandatory when revisiting a PR with new review comments): run the one-command guard wrapper so GitHub delta sync + metadata drift + stale-finding checks happen together.
```
bash ~/.openclaw/workspace/scripts/review/run-followup-guards.sh --check-only --json
```
Run this local preflight first in autonomous wrappers when you need a machine-readable check that the helper scripts, GitHub guard, CLI, and report directory are ready without fetching GitHub data or writing artifacts.
```
bash ~/.openclaw/workspace/scripts/review/run-followup-guards.sh <PR> \
  --repo ChainSafe/lodestar
```
The wrapper also writes a full PR discussion coverage report before syncing findings. This report fetches all three GitHub surfaces:
- Issue-level PR comments: repos/<repo>/issues/<PR>/comments
- Inline review comments: repos/<repo>/pulls/<PR>/comments
- Review bodies: repos/<repo>/pulls/<PR>/reviews
Before saying "no one replied", "lodekeeper is not in any thread", or "only bot comments exist", check this coverage report. Inline-only scans miss issue comments and review bodies.

Optional flags:
- --include-replies when maintainer discussion in reply threads matters for re-verification
- --skip-discussion-scan only when you have already fetched all three surfaces separately
- --fail-on-stale to make stale unresolved critical/major findings block the loop
- --stale-days <n> to tune the stale threshold (default 7)
Exit behavior:
- Exit 0: guards passed (or stale findings detected but non-fatal).
- Exit 2: metadata drift detected; wrapper prints the exact gh pr edit reminder command.
- Exit 3: stale findings detected when --fail-on-stale is set.
Default artifacts:
- ~/.openclaw/workspace/notes/review-reports/pr-<PR>-discussion.md
- ~/.openclaw/workspace/notes/review-reports/pr-<PR>-metadata-drift.md
- ~/.openclaw/workspace/notes/review-reports/pr-<PR>-stale-findings.md
Manual equivalent (if needed):
```
mkdir -p ~/.openclaw/workspace/notes/review-reports
python3 ~/.openclaw/workspace/scripts/review/fetch-pr-discussion.py <PR> \
  --repo ChainSafe/lodestar \
  > ~/.openclaw/workspace/notes/review-reports/pr-<PR>-discussion.md
python3 ~/.openclaw/workspace/scripts/review/track-findings.py sync-gh <PR> --repo ChainSafe/lodestar
python3 ~/.openclaw/workspace/scripts/github/check-pr-metadata-drift.py \
  --pr <PR> --repo ChainSafe/lodestar \
  > ~/.openclaw/workspace/notes/review-reports/pr-<PR>-metadata-drift.md
python3 ~/.openclaw/workspace/scripts/review/track-findings.py stale <PR> \
  --days 7 --severity critical major --fail-on-match \
  > ~/.openclaw/workspace/notes/review-reports/pr-<PR>-stale-findings.md
```
Keep artifact paths in your review notes / tracker entry for traceability.

When the author pushes a new commit, check coverage:

# Get changed files from the new commit
gh pr diff <PR> --repo ChainSafe/lodestar --name-only > /tmp/changed-files.txt
python3 ~/.openclaw/workspace/scripts/review/track-findings.py check <PR> \
  --changed-files $(cat /tmp/changed-files.txt | tr '\n' ' ')

Output: which findings are on changed files (→ verify!) vs. still-untouched files (→ still open).

Mark resolved findings:

python3 ~/.openclaw/workspace/scripts/review/track-findings.py resolve <PR> <id> --commit <sha>

Generate a markdown summary for a GitHub follow-up comment:

python3 ~/.openclaw/workspace/scripts/review/track-findings.py dump <PR>

Escalate stale unresolved findings (default: open critical/major older than 7 days by updated timestamp):

python3 ~/.openclaw/workspace/scripts/review/track-findings.py stale <PR>
# Use --fail-on-match in automation wrappers to raise non-zero exit when stale items exist

Also available: import --markdown <file> (parse free-form reviewer output), import-gh (one-shot bootstrap from GitHub review comments), dedup (group by file+proximity), stale (age/severity stale-finding report).

Tips

For consensus-spec-related changes, cross-reference ~/consensus-specs for correctness
For API changes, cross-reference ~/beacon-APIs (ethereum/beacon-APIs)
The security reviewer is especially valuable for networking/p2p/reqresp changes — consensus clients are adversarial environments
The architect reviewer catches cross-package boundary violations that other reviewers miss
When reviewing ePBS/Gloas code, pay extra attention to ProtoBlock variant handling (twoeths's top concern)
For any new functionality, check if Prometheus metrics are needed (wemeetagain pattern)

Self-Maintenance

If any commands, file paths, URLs, or configurations in this skill are outdated or no longer work, update this SKILL.md with the correct information after completing your current task. Skills should stay accurate and self-healing — fix what you find broken.