name: lodestar-review description: Run multi-persona code reviews on Lodestar PRs. Use when reviewing a PR, diff, or code change in ChainSafe/lodestar. Spawns specialized reviewer agents (bug hunter, security engineer, architect, etc.) with Lodestar-specific context, collects findings, and synthesizes a consolidated report. Covers PR review, code quality assessment, and security analysis for Ethereum consensus client code.
Lodestar Code Review
Multi-persona review system for ChainSafe/lodestar PRs. Each reviewer has a narrow scope with explicit rejection criteria, enhanced with Lodestar-specific knowledge.
Reviewers
Agent ID (use as agentId!) |
Role | Model | Focus |
|---|---|---|---|
review-bugs |
Bug Hunter | GPT-5.3-Codex | Functional errors, logic flaws, off-by-one. Only ACTUAL broken behavior. |
review-defender |
Defender | GPT-5.3-Codex | Malicious code, backdoors, supply chain threats. |
review-linter |
Style Enforcer | Gemini 2.5 Pro | Style consistency vs Lodestar conventions. |
review-security |
Security Engineer | GPT-5.3-Codex | DoS vectors, peer manipulation, validation bypasses, crypto misuse. |
review-wisdom |
Wise Senior | Claude Opus 4.6 | Clean code principles, maintainability, readability. |
reviewer-architect |
Architect | GPT-5.3-Codex (thinking: xhigh) | Package boundaries, consensus spec alignment, module coupling. |
review-devils-advocate |
Devil's Advocate | Claude Opus 4.6 (thinking: high) | Challenges premise, simpler alternatives, spec interpretation, necessity. |
⚠️ Always pass the Agent ID as
agentIdinsessions_spawn. Omitting it routes to your default model instead of the reviewer's configured model. This is a mandatory field, not optional.
Reviewer Selection
Pick reviewers based on PR type:
| PR Type | Reviewers |
|---|---|
| Any PR | review-bugs (always) |
| >50 lines changed | + review-wisdom |
| Feature / new functionality | + reviewer-architect, + review-devils-advocate |
| Consensus / spec-related | + review-devils-advocate |
| Networking / API / p2p changes | + review-security |
| External contributor | + review-defender |
| Style-heavy / refactor | + review-linter |
Cost is not a concern — use all relevant reviewers.
Workflow
1. Get the diff
For open PRs:
gh pr diff <PR_NUMBER> --repo ChainSafe/lodestar
For local changes (pre-PR, dev workflow Phase 4):
cd ~/lodestar-<feature>
git diff unstable...HEAD # diff against base branch
# or for staged changes:
git diff --cached
For large diffs (>3000 lines), focus on the most critical files or split into chunks.
1.5 Get the changed-file list (mandatory — prevents false positives)
Before spawning any reviewer, capture the exact set of files changed in this PR/branch. This list is used:
- In the reviewer task prompt (so reviewers know exact scope)
- After reviews complete (to discard findings about files NOT in this list)
# For open PRs:
gh pr diff <PR_NUMBER> --repo ChainSafe/lodestar --name-only
# For local changes:
cd ~/lodestar-<feature>
git diff --name-only origin/unstable...HEAD
Save the output as CHANGED_FILES. Add it to each reviewer task as:
## Files Changed in This PR
<CHANGED_FILES list, one per line>
IMPORTANT: Only flag issues in the files listed above. Do NOT comment on files not in this list, even if they appear in the broader codebase context.
⚠️ FALSE POSITIVE GUARD (mandatory post-review step): Before acting on any finding, verify the flagged file is in
CHANGED_FILES. Discard any finding that references a file not in this list — reviewers sometimes hallucinate scope from broader context. This has burned real time (PR #8993: two reviewers flaggeddataColumns.tsandgloas.ts, neither was in the diff).
2. Read persona prompts
Each reviewer's persona is in references/<agent-id>.md (relative to this skill directory). Read the persona file before spawning.
3. Spawn reviewers
For each selected reviewer, spawn with the persona prepended to the diff. Always include the ## Files Changed block (from Step 1.5) between the persona and the diff — this is the primary mechanism that prevents false positives:
sessions_spawn(
agentId: "<agent-id>", // ⚠️ MANDATORY — always pass agentId to route to the correct model!
task: "<persona prompt>\n\n---\n\n## Files Changed in This PR\n<CHANGED_FILES, one per line>\n\nIMPORTANT: Only flag issues in the files listed above. Do NOT comment on files not in this list, even if they appear in the broader codebase context.\n\n---\n\nReview this diff for ChainSafe/lodestar PR #<number> (<title>):\n\n```diff\n<diff>\n```",
label: "pr<number>-<reviewer-short-name>",
runTimeoutSeconds: <timeout> // Scale by diff size (see below)
)
Spawn all selected reviewers in parallel (no dependencies between them).
Durable-output requirement (mandatory): include this block at the end of every reviewer task so findings survive announce/session transport issues:
Context metadata for this run:
- Reviewer: <agent-id>
- Reviewed commit: <HEAD_SHA>
After finishing the review:
1) Write your full findings markdown to `~/.openclaw/workspace/notes/review-reports/pr-<PR>-<agent-id>.md`.
- Preferred helper (avoids marker drift):
`bash ~/.openclaw/workspace/scripts/review/write-review-artifact.sh --pr <PR> --agent <agent-id> --head-repo /absolute/path/to/repo <<'EOF'\n<findings markdown>\nEOF`
- Include the exact metadata line `Reviewer: <agent-id>` near the top of the artifact.
- Include the exact metadata line `Reviewed commit: <HEAD_SHA>` near the top of the artifact.
- If there are no findings, write a short "No findings" report anyway.
2) In your final chat response, include the exact file path you wrote.
(For pre-PR local review, replace <PR> with a stable branch/task slug, e.g. local-<branch>.)
⚠️
agentIdis MANDATORY for every reviewer spawn. Without it, the subagent inherits the parent's default model instead of the reviewer's configured model. This caused bugs+security reviewers to run on Claude Opus instead of GPT-5.3-Codex, leading to timeouts on PR #8962 (105KB diff).
⚠️ Timeout scaling by diff size:
Diff size runTimeoutSeconds< 30KB (~800 lines) 180 30–80KB (~800–2000 lines) 300 > 80KB (~2000+ lines) 420 The default 300s is too short for large diffs. PR #8962 (105KB) timed out 2 reviewers at 300s; retries succeeded at 420s (bugs: 6m7s, security: 4m12s).
Note: For reviewer-architect, always pass thinking: "xhigh" in the spawn call for deep architectural reasoning.
4. Wait for results
All spawned reviewers will announce their findings back to the main session. Wait for ALL to complete before synthesizing.
4.1 Transport-failure fallback (mandatory)
If a reviewer announces completion but the actual findings are missing/truncated, do not proceed with partial context.
- Read the reviewer artifact directly from
notes/review-reports/pr-<PR>-<agent-id>.md(from Step 3 durable-output requirement). - If the file is missing for any reviewer, re-run only that reviewer.
- Only synthesize once every selected reviewer has either:
- a complete announce payload, or
- a complete on-disk artifact file.
Quick verifier command (recommended before synthesis):
bash ~/.openclaw/workspace/scripts/review/check-review-artifacts.sh \
--pr <PR> \
--agents <agent-id-1> <agent-id-2> ... \
--allow-empty-no-findings \
--max-age-minutes 180 \
--require-agent-marker \
--require-reviewed-head \
--head-repo /absolute/path/to/repo
- Exit
0: every expected reviewer artifact exists and is usable. - Exit
2: at least one expected artifact is missing/invalid/stale — re-run only the missing reviewer(s), then re-check.
--max-age-minutes prevents stale artifacts from a prior review round from being mistaken as fresh output for the current diff.
--require-agent-marker prevents cross-agent artifact mix-ups by requiring each file to contain Reviewer: <agent-id>.
--require-reviewed-head prevents fresh-but-wrong artifacts (written for a different head commit) from being accepted.
--head-repo should point at the exact PR worktree so HEAD resolution is deterministic.
This avoids losing findings when sub-agent message transport is flaky.
5. Act on findings
⛔ MANDATORY: Post findings as INLINE review comments on the diff. Never post a single large summary comment with all findings. This has been explicitly flagged multiple times.
5.1 Output hygiene (Nico requirement)
For any PR review content you post (inline comments or review body):
- Include only reviewer-useful technical feedback.
- Do not mention review process internals (multi-persona setup, AI/sub-agents, model names, orchestration steps).
- Do not include process narration like "I asked X reviewers" or "agent Y found...".
- If explicitly asked for methodology transparency, provide it separately in chat — not in the PR review text.
Step 5a: Compute line numbers
Before posting anything, map each finding to its exact file + line number in the new code. Use this helper to find new-file line numbers from the PR diff:
# Save as /tmp/find-review-lines.py and run with: python3 /tmp/find-review-lines.py
import json, subprocess, re
result = subprocess.run(
["gh", "api", f"repos/ChainSafe/lodestar/pulls/<PR>/files?per_page=100",
"--jq", ".[] | {filename, patch}"],
capture_output=True, text=True
)
def find_line(patch, search_text):
lines = patch.split("\n")
new_line = None
for line in lines:
m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
new_line = int(m.group(1))
continue
if new_line is None:
continue
if line.startswith('-'):
continue
if search_text in line:
return new_line
new_line += 1
return None
Step 5b: Post as a single GitHub review with inline comments
Use the review API to batch all comments into ONE review submission. Use JSON input (not -f flags) so line is a proper integer:
cat > /tmp/review-payload.json << 'EOF'
{
"commit_id": "<head_sha>",
"event": "COMMENT",
"body": "Brief summary of overall assessment + any general/cross-cutting findings not tied to specific lines. DO NOT mention review process details (multi-persona, AI reviewers, sub-agents). Only include info useful to the PR author.",
"comments": [
{
"path": "packages/reqresp/src/file.ts",
"line": 42,
"side": "RIGHT",
"body": "🔴 **Finding title**\n\nExplanation.\n\n```suggestion\nfixed code here\n```"
}
]
}
EOF
gh api repos/ChainSafe/lodestar/pulls/<PR>/reviews \
--method POST --input /tmp/review-payload.json
Comment format rules:
- One inline comment per finding, on the exact line it refers to
- Use
🔴for must-fix,🟡for should-fix,🟢for suggestions - Use
suggestionblocks for concrete code changes (GitHub renders these as committable) - General/cross-cutting findings that aren't tied to a specific line go in the review
body— that's what it's for - The review
bodycan contain both a brief summary AND general findings (architecture observations, cross-cutting patterns, etc.) - Deduplicate: if multiple reviewers flag the same line, merge into one comment noting convergence
- Use
-F line=Nor JSON input (not-f line=N) — line must be an integer, not a string
Convergence signals quality: When multiple reviewers independently flag the same issue from different angles, highlight it — it's likely a real problem.
❌ DO NOT:
- Post a single large comment with all findings listed
- Use
gh pr commentfor review findings- Skip the line-number computation step
- Use
-f line=N(string) — always use-F line=N(integer) or JSON input- Mention review process details in the PR comment (multi-persona, AI reviewers, sub-agents, model names) — only include info useful to the PR author
For local changes (pre-PR, dev workflow Phase 4):
Don't post to GitHub — there's no PR yet. Instead:
- Collect all reviewer findings
- Fix issues directly in the worktree (small fixes → do yourself, large → back to coding agent)
- Re-run reviewers on the updated diff if changes were significant
- Only open the PR after the review cycle is clean
Lodestar Context for Reviewers
When constructing the task for each reviewer, append this Lodestar context block after the persona prompt (before the diff). This gives reviewers domain knowledge:
## Lodestar Codebase Context
Lodestar is a TypeScript Ethereum consensus client (beacon node + validator client + light client).
### Package Structure
- `beacon-node/` — core beacon chain logic, networking, sync, API server
- `validator/` — validator client (separate process, talks to beacon via API)
- `light-client/` — light client (runs in browsers too)
- `state-transition/` — pure state transition functions (spec implementation)
- `fork-choice/` — proto-array fork choice
- `types/` — SSZ type definitions for all forks
- `params/` — consensus constants and presets
- `config/` — runtime chain configuration
- `api/` — REST API client/server (shared between beacon-node and validator)
- `reqresp/` — libp2p request/response protocol
- `db/` — LevelDB abstraction
- `utils/` — shared utilities
### Fork Progression
phase0 → altair → bellatrix → capella → deneb → electra → fulu → gloas
Fork-aware code uses guards: `isForkPostElectra(fork)`, `isForkPostFulu(fork)`, etc.
### Key Conventions
- ES modules with `.js` extensions on relative imports (even for .ts files)
- Biome for linting/formatting, double quotes, no default exports
- `camelCase` functions/vars, `PascalCase` classes, `UPPER_SNAKE_CASE` constants
- Explicit parameter and return types, no `any`
- Prometheus metrics: always suffix with units (`_seconds`, `_bytes`, `_total`)
- Structured logging: `this.logger.debug("msg", {slot, root: toRootHex(root)})`
### Common Pitfalls to Watch For
- **Stale fork choice head:** `getHead()` returns cached ProtoBlock. After modifying proto-array state, must call `recomputeForkChoiceHead()`
- **Holding state references:** BeaconState objects are large tree-backed structures. Don't store beyond immediate use
- **Missing .js extension:** Relative imports must use `.js` for ESM resolution
- **Force push after review:** Never — use incremental commits
- **SSZ value vs view:** Value types (plain JS) vs ViewDU (tree-backed). State uses ViewDU — mutations need `.commit()`
- **Config vs params:** `@lodestar/params` = compile-time constants, `@lodestar/config` = runtime chain config
### Architecture Rules
- Beacon node, validator client, and light client are separate packages with clear boundaries
- Cross-package deps flow downward: beacon-node → state-transition → types → params
- Validator talks to beacon node only via REST API (never import beacon-node internals)
- State transition functions must be pure (no side effects, no network calls)
- Fork choice is its own package — beacon-node consumes it, doesn't extend it
Review Patterns Reference
Read references/review-patterns.md for patterns mined from ~2000 real Lodestar review comments. Key insights:
- nflaig (lead): Spec citations, forward-compatible naming (avoid fork codenames), type safety, comment-code consistency
- twoeths: ProtoBlock variant correctness, state cache keys, function signatures
- wemeetagain: Metrics coverage, code simplification, future TODOs
- ensi321: Edge case analysis, spec divergence, test correctness, scope enforcement
Use these patterns to calibrate reviewer expectations — e.g., the architect reviewer should flag missing metrics (wemeetagain pattern), and the wisdom reviewer should flag stale comments (nflaig pattern).
Finding Resolution Tracking
Use scripts/review/track-findings.py to track which findings get addressed in follow-up commits.
Workflow:
After synthesizing reviewer outputs, store key findings:
python3 ~/.openclaw/workspace/scripts/review/track-findings.py add <PR> \ --file src/sync/range/chain.ts --line 142 --severity major \ --reviewer review-bugs --body "Race condition in batch completion..."Follow-up rounds (mandatory when revisiting a PR with new review comments): run the one-command guard wrapper so GitHub delta sync + metadata drift + stale-finding checks happen together.
bash ~/.openclaw/workspace/scripts/review/run-followup-guards.sh --check-only --jsonRun this local preflight first in autonomous wrappers when you need a machine-readable check that the helper scripts, GitHub guard, CLI, and report directory are ready without fetching GitHub data or writing artifacts.
bash ~/.openclaw/workspace/scripts/review/run-followup-guards.sh <PR> \ --repo ChainSafe/lodestarThe wrapper also writes a full PR discussion coverage report before syncing findings. This report fetches all three GitHub surfaces:
- Issue-level PR comments:
repos/<repo>/issues/<PR>/comments - Inline review comments:
repos/<repo>/pulls/<PR>/comments - Review bodies:
repos/<repo>/pulls/<PR>/reviews
Before saying "no one replied", "lodekeeper is not in any thread", or "only bot comments exist", check this coverage report. Inline-only scans miss issue comments and review bodies.
Optional flags:
--include-replieswhen maintainer discussion in reply threads matters for re-verification--skip-discussion-scanonly when you have already fetched all three surfaces separately--fail-on-staleto make stale unresolved critical/major findings block the loop--stale-days <n>to tune the stale threshold (default7)
Exit behavior:
- Exit
0: guards passed (or stale findings detected but non-fatal). - Exit
2: metadata drift detected; wrapper prints the exactgh pr editreminder command. - Exit
3: stale findings detected when--fail-on-staleis set.
Default artifacts:
~/.openclaw/workspace/notes/review-reports/pr-<PR>-discussion.md~/.openclaw/workspace/notes/review-reports/pr-<PR>-metadata-drift.md~/.openclaw/workspace/notes/review-reports/pr-<PR>-stale-findings.md
Manual equivalent (if needed):
mkdir -p ~/.openclaw/workspace/notes/review-reports python3 ~/.openclaw/workspace/scripts/review/fetch-pr-discussion.py <PR> \ --repo ChainSafe/lodestar \ > ~/.openclaw/workspace/notes/review-reports/pr-<PR>-discussion.md python3 ~/.openclaw/workspace/scripts/review/track-findings.py sync-gh <PR> --repo ChainSafe/lodestar python3 ~/.openclaw/workspace/scripts/github/check-pr-metadata-drift.py \ --pr <PR> --repo ChainSafe/lodestar \ > ~/.openclaw/workspace/notes/review-reports/pr-<PR>-metadata-drift.md python3 ~/.openclaw/workspace/scripts/review/track-findings.py stale <PR> \ --days 7 --severity critical major --fail-on-match \ > ~/.openclaw/workspace/notes/review-reports/pr-<PR>-stale-findings.mdKeep artifact paths in your review notes / tracker entry for traceability.
- Issue-level PR comments:
When the author pushes a new commit, check coverage:
# Get changed files from the new commit gh pr diff <PR> --repo ChainSafe/lodestar --name-only > /tmp/changed-files.txt python3 ~/.openclaw/workspace/scripts/review/track-findings.py check <PR> \ --changed-files $(cat /tmp/changed-files.txt | tr '\n' ' ')Output: which findings are on changed files (→ verify!) vs. still-untouched files (→ still open).
Mark resolved findings:
python3 ~/.openclaw/workspace/scripts/review/track-findings.py resolve <PR> <id> --commit <sha>Generate a markdown summary for a GitHub follow-up comment:
python3 ~/.openclaw/workspace/scripts/review/track-findings.py dump <PR>Escalate stale unresolved findings (default: open critical/major older than 7 days by
updatedtimestamp):python3 ~/.openclaw/workspace/scripts/review/track-findings.py stale <PR> # Use --fail-on-match in automation wrappers to raise non-zero exit when stale items exist
Also available: import --markdown <file> (parse free-form reviewer output), import-gh (one-shot bootstrap from GitHub review comments), dedup (group by file+proximity), stale (age/severity stale-finding report).
Tips
- For consensus-spec-related changes, cross-reference
~/consensus-specsfor correctness - For API changes, cross-reference
~/beacon-APIs(ethereum/beacon-APIs) - The security reviewer is especially valuable for networking/p2p/reqresp changes — consensus clients are adversarial environments
- The architect reviewer catches cross-package boundary violations that other reviewers miss
- When reviewing ePBS/Gloas code, pay extra attention to ProtoBlock variant handling (twoeths's top concern)
- For any new functionality, check if Prometheus metrics are needed (wemeetagain pattern)
Self-Maintenance
If any commands, file paths, URLs, or configurations in this skill are outdated or no longer work, update this SKILL.md with the correct information after completing your current task. Skills should stay accurate and self-healing — fix what you find broken.