paper-review-sim - SKILL.md Agent Skill

name: paper-review-sim
description: Simulate a NeurIPS/SC/ICSE-style peer review with 5 reviewer personas (HPC, ML, Stats, Reproducibility, Devil's Advocate). Use before paper submission, after major methodology changes, or when stress-testing a draft against expected objections. Each reviewer verifies claims against actual result data.
auto-activate: false

Paper Review Simulation

Use when preparing a paper draft for submission, stress-testing claims before writing, or wanting structured feedback on methodology, presentation, or rigor. Simulates a 5-reviewer conference panel.

Trigger: when the user types /paper-review-sim, with optional arguments.

Arguments

$ARGUMENTS --- path to paper draft file(s) or specific section to review. If omitted, searches for docs/paper_draft.md or asks the user.
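
The fallback described above can be sketched as a small shell helper. This is a minimal sketch, not the skill's actual implementation: `locate_draft` is a hypothetical name, and it assumes a single file path is passed (a section name or multiple paths would need extra handling).

```shell
# locate_draft [PATH]: resolve the draft path (hypothetical helper).
# Uses the explicit argument if given, otherwise the documented default.
locate_draft() {
  # ${1:-default} falls back when the argument is missing or empty.
  set -- "${1:-docs/paper_draft.md}"
  if [ -f "$1" ]; then
    printf '%s\n' "$1"
  else
    # Nothing found: the caller should ask the user for a path.
    echo "draft not found: $1" >&2
    return 1
  fi
}
```

A non-zero return is the signal to stop and ask the user rather than guess at another location.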

Iron Law

NO REVIEWER MAY ACCEPT A CLAIM WITHOUT TRACING IT TO DATA.
Every number, percentage, comparison, or performance claim in the paper must be
verified against actual files in results/evaluation/ or results/augmentation/.
"The paper says X" is not evidence. The data file says X is evidence.

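A first pass at tracing a number to data might look like the sketch below. `trace_value` is a hypothetical helper, and a plain text match is only a starting point: `34` also matches `134`, and a percentage derived from counts will not appear literally in any file, so an empty result means "recompute from the raw results", not "claim refuted".

```shell
# trace_value VALUE [ROOT]: list result files containing VALUE as a literal string.
# Hypothetical helper; ROOT defaults to the current directory.
trace_value() {
  # -r recurse, -l print matching file names only, -F treat VALUE as a fixed string
  grep -rlF -- "$1" "${2:-.}/results/evaluation" "${2:-.}/results/augmentation" 2>/dev/null
}
```

For example, `trace_value 0.34` prints every file under the two result directories that contains the text `0.34`. Each hit still has to be opened and read to confirm the number means what the paper says it means.
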
Anti-Rationalization Table

| Excuse | Reality |
|--------|---------|
| "The numbers are from a previous session" | Previous sessions can have bugs. Verify against current disk state. |
| "This is just a rough draft" | Rough drafts with wrong numbers become final drafts with wrong numbers. |
| "I'll verify during camera-ready" | Camera-ready is too late. Conference reviewers see the submitted version. |
| "The reviewer personas are too harsh" | Conference acceptance rate is ~25%. Real reviewers ARE this harsh. |
| "One reviewer disagreeing is fine" | One dissenting reviewer can sink a paper. Address all concerns. |

Red Flags --- STOP

If any of these occur, halt the review and flag immediately:

A number in the paper cannot be traced to any file on disk
A claim about model A vs model B contradicts the actual result JSONs
"All models" or "consistently" used without checking every model's data
Performance claims based on wall-clock time (unreliable --- see Timing Caveat below)
Missing error bars, confidence intervals, or sample sizes for quantitative claims
Related work section missing key papers (LASSI, CodeRosetta, HPC-Coder-v2, OMPify, HPCorpus)
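
Two of the red flags above lend themselves to a mechanical pre-check. The helpers below are a sketch with hypothetical names; they scan the whole draft rather than only the related work section, and a hit is a prompt for manual inspection, not a verdict.

```shell
# flag_universal_claims DRAFT: print line-numbered uses of "all models" or "consistently".
flag_universal_claims() {
  # -n show line numbers, -i ignore case, -E extended regex for alternation
  grep -niE 'all models|consistently' "$1"
}

# missing_related_work DRAFT: print each key paper name the draft never mentions.
missing_related_work() {
  for name in LASSI CodeRosetta HPC-Coder-v2 OMPify HPCorpus; do
    # -q suppress output, -F match the name as a fixed string
    grep -qF -- "$name" "$1" || printf '%s\n' "$name"
  done
}
```

An empty result from `missing_related_work` only shows that each name appears somewhere in the draft. Whether the citation sits in the related work section and is discussed fairly still needs a reviewer's read.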

Workflow

Phase 1: Locate and Read the Draft

Find the paper draft at the path in $ARGUMENTS or docs/paper_draft.md.
Read the full draft to understand structure, claims, and data references.
Identify every quantitative claim (pass rates, failure counts, comparisons).

Verification gate: Draft located and read. All quantitative claims catalogued.
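
Cataloguing can start from a mechanical scan for numeric tokens. The sketch below uses a hypothetical helper name and is deliberately over-inclusive: it also picks up years, section numbers, and citation indices, so the output is a candidate list to prune by hand, and it will miss claims stated in words ("a third of runs").

```shell
# list_quant_claims DRAFT: print "line:token" for every number or percentage in the draft.
list_quant_claims() {
  # -n prefix each match with its line number, -o print only the matched token
  grep -noE '[0-9]+(\.[0-9]+)?%?' "$1"
}
```

Each surviving token becomes one row of the verification table built in the next phase.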

Phase 2: Data Verification Sweep

Before spawning reviewers, verify every number in the paper against actual data:

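# Count result files per model
for model_dir in {{PROJECT_ROOT}}/results/evaluation/*/; do
  # Quote the path so directory names with spaces survive word splitting
  count=$(find "$model_dir" -type f -name '*.json' | wc -l | tr -d ' ')
  echo "$(basename "$model_dir"): $count files"
done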

Read specific result JSONs to verify claimed pass rates. Use the overall_status field (not the top-level run_status) as the authoritative verdict.
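
A per-model tally of overall_status values can be sketched with standard tools. This assumes each result file contains exactly one `"overall_status": "<value>"` string pair; the real schema is not shown here, so treat the pattern as an assumption and prefer a proper JSON parser if the files are nested or repeat the field. `status_tally` is a hypothetical helper name.

```shell
# status_tally MODEL_DIR: count result files per overall_status value.
status_tally() {
  find "$1" -type f -name '*.json' | while IFS= read -r f; do
    # Extract the quoted value after "overall_status"; empty if the field is absent.
    s=$(sed -nE 's/.*"overall_status"[[:space:]]*:[[:space:]]*"([^"]*)".*/\1/p' "$f" | head -n 1)
    # Files without the field are counted as MISSING instead of being silently dropped.
    printf '%s\n' "${s:-MISSING}"
  done | sort | uniq -c
}
```

Dividing one status count by the total number of files for the model gives the rate to compare against the paper. Any MISSING count above zero is worth investigating before trusting that rate.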

Build a verification table:

=== DATA VERIFICATION ===
| Claim in paper | Source file(s) | Verified value | Match? |
|----------------|---------------|----------------|--------|
| "34% overall pass rate" | eval_summary.json | <actual> | YES/NO |
| "BUILD_FAIL is 36%" | <files> | <actual> | YES/NO |

Verification gate: Every number traced to a file. Any mismatches flagged BEFORE the review proceeds.
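
Rows of the verification table can be generated rather than typed, which keeps the Match? column honest. The helpers below are a sketch with hypothetical names. `pct` rounds to a whole percent to match the style of the example claims, and `verify_row` compares the claimed and recomputed values as exact strings, so a draft that reports more decimals would need a different rounding rule.

```shell
# pct NUMERATOR DENOMINATOR: print a whole-number percentage, or n/a for an empty denominator.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN {
    if (d == 0) { print "n/a"; exit }
    printf "%.0f%%\n", 100 * n / d
  }'
}

# verify_row CLAIM SOURCE CLAIMED ACTUAL: print one table row with a YES/NO verdict.
verify_row() {
  if [ "$3" = "$4" ]; then match=YES; else match=NO; fi
  printf '| "%s" | %s | %s | %s |\n' "$1" "$2" "$4" "$match"
}
```

For instance, `verify_row "34% overall pass rate" eval_summary.json "34%" "$(pct 41 120)"` fills in the first example row. The counts 41 and 120 are illustrative only and do not come from any result file.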

Phase 3: Spawn Review Panel

Launch 5 subagents in parallel. Each reviewer gets:

The full paper draft (or relevant section)
The data verification table from Phase 2
Their specific review focus (below)
Access to read files in results/evaluation/ and results/augmentation/