name: nemoclaw-maintainer-pr-comparator
description: Compares competing PRs that target the same issue and recommends which one to merge. Runs gate, correctness, and quality checks; outputs a deterministic scorecard with reasoning trace. Use when an issue has two or more open PRs and a maintainer needs to decide which to merge.
user_invocable: true
PR Comparator
Picks the merge winner among competing PRs for a single issue. Tier 0 gates eliminate plumbing failures; Tiers 1-2 score correctness and quality; Tier 3 applies deterministic tiebreakers. Degraded mode handles the case where no PR passes gates.
Prerequisites
- gh CLI installed and authenticated
- A target repository with an issue that has 2+ open PRs
Repo policy
Defaults assume NemoClaw conventions (security CODEOWNERS, DCO, CodeRabbit, docs/ directory). For other repos, edit repo-policy.md to override.
Workflow
Copy this checklist into your response and check off each step:
PR Comparison Progress:
- [ ] Step 1: Parse issue (body + comments) for acceptance criteria
- [ ] Step 2: Discover candidate PRs (default-order search with stop conditions)
- [ ] Step 3: Detect supersession (parse PR bodies)
- [ ] Step 4: Run Tier 0 gates per PR
- [ ] Step 5: Run Tier 1 correctness checks per PR
- [ ] Step 6: Run Tier 2 quality checks per PR
- [ ] Step 7: Compute weighted scores
- [ ] Step 8: Apply Tier 3 ranking (happy path or degraded mode)
- [ ] Step 9: Emit verdict using templates/verdict.md
Step 1: Parse issue
Extract acceptance criteria from issue body and all comments:
gh issue view <issue-number> --json title,body,comments
Read every comment — commenters often add asks the body doesn't capture.
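As a minimal sketch of this step, the JSON returned by the command above can be flattened into one list of text blocks so that asks made only in comments are not missed. The helper name `collect_issue_text` and the sample data are hypothetical; the sketch assumes each entry in `comments` carries a `body` field.

```python
import json

def collect_issue_text(issue_json: str) -> list[str]:
    """Return the issue body followed by every non-empty comment body, in order."""
    issue = json.loads(issue_json)
    texts = [issue.get("body") or ""]
    # Comments are scanned too, since they can add asks the body lacks.
    for comment in issue.get("comments", []):
        texts.append(comment.get("body") or "")
    # Drop empty entries so later parsing only sees real text.
    return [t for t in texts if t.strip()]

# Hypothetical sample shaped like the --json title,body,comments output.
sample = json.dumps({
    "title": "Example issue",
    "body": "Must handle empty input.",
    "comments": [{"body": "Also needs a docs update."}, {"body": ""}],
})
print(collect_issue_text(sample))
```

Each returned string is then a candidate source of acceptance criteria for the later tiers.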
Step 2: Discover candidate PRs
scripts/find-candidates.sh <issue-number>
Applies a single default order with stop conditions.
Step 3: Detect supersession
scripts/parse-supersession.sh <pr-number-1> <pr-number-2> ...
Parses each PR body for the phrases "supersedes #N", "replaces #N", "closes in favor of #N", and "folds in #N". A PR that supersedes another wins ties immediately.
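The phrase matching described here can be illustrated with a short regular-expression sketch. This is not the implementation of scripts/parse-supersession.sh; the function name `supersession_refs` is hypothetical, and case-insensitive matching is an assumption.

```python
import re

# The four phrases listed in Step 3. Case-insensitive matching is an assumption.
SUPERSESSION_RE = re.compile(
    r"\b(?:supersedes|replaces|closes in favor of|folds in)\s+#(\d+)",
    re.IGNORECASE,
)

def supersession_refs(pr_body: str) -> set[int]:
    """Return the PR numbers that follow a supersession phrase in the body."""
    return {int(n) for n in SUPERSESSION_RE.findall(pr_body)}

body = "Supersedes #12 and folds in #15. Unrelated mention of #99."
print(sorted(supersession_refs(body)))  # [12, 15]
```

A bare `#99` mention is ignored because it does not follow one of the listed phrases.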
Step 4: Tier 0 gates
scripts/collect-gates.sh <pr-number>
scripts/check-coderabbit-threads.sh <pr-number>
Five gates, all mandatory. See checks/tier-0-gates.md for the full list and interpretation.
Step 5: Tier 1 correctness
Six checks, all LLM judgments. See checks/tier-1-correctness.md for evidence requirements per check.
Step 6: Tier 2 quality
Three checks, all LLM judgments. See checks/tier-2-quality.md.
Step 7: Weighted score
- Each pass = full points
- Each yellow = half points
- Each fail = zero
- Tier 1 weight: 2.0× per check
- Tier 2 weight: 1.0× per check
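The weighting rules above can be worked through in a few lines. This is a sketch under one assumption: "full points" is taken to be 1.0 per check, so a yellow is 0.5. The function name and the sample outcomes are made up for illustration.

```python
# Points per outcome. Treating "full points" as 1.0 is an assumption.
POINTS = {"pass": 1.0, "yellow": 0.5, "fail": 0.0}
# Per-check weights from Step 7.
TIER_WEIGHTS = {1: 2.0, 2: 1.0}

def weighted_score(tier1: list[str], tier2: list[str]) -> float:
    """Sum weighted points over Tier 1 and Tier 2 check outcomes."""
    score = sum(POINTS[outcome] * TIER_WEIGHTS[1] for outcome in tier1)
    score += sum(POINTS[outcome] * TIER_WEIGHTS[2] for outcome in tier2)
    return score

# Six Tier 1 checks and three Tier 2 checks, with made-up outcomes.
tier1 = ["pass", "pass", "yellow", "pass", "fail", "pass"]
tier2 = ["pass", "yellow", "pass"]
print(weighted_score(tier1, tier2))  # 9.0 from Tier 1 + 2.5 from Tier 2 = 11.5
```

Under the same assumption, the maximum is 6 × 2.0 + 3 × 1.0 = 15.0, so a single Tier 1 failure costs as much as two Tier 2 failures.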
Step 8: Tier 3 ranking
Branch on whether any PR passes all Tier 0 gates. See tiebreakers.md for happy-path tiebreakers, degraded-mode distance-to-ready ranking, and the behavior-coverage matrix.
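The branch itself can be sketched as follows. The real tiebreakers and distance-to-ready rules live in tiebreakers.md and are not reproduced here: the `Candidate` type, ranking by fewest failed gates in degraded mode, and breaking remaining ties by PR number are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    pr: int
    gates_failed: int  # number of Tier 0 gates failed (0 means all passed)
    score: float       # weighted score from Step 7

def rank(candidates: list[Candidate]) -> tuple[str, list[int]]:
    """Return the mode taken and the PR numbers in ranked order."""
    passing = [c for c in candidates if c.gates_failed == 0]
    if passing:
        # Happy path: only gate-passing PRs compete, higher score first.
        ordered = sorted(passing, key=lambda c: (-c.score, c.pr))
        return "happy", [c.pr for c in ordered]
    # Degraded mode: fewest failed gates first (an assumed proxy for
    # distance-to-ready), then higher score.
    ordered = sorted(candidates, key=lambda c: (c.gates_failed, -c.score, c.pr))
    return "degraded", [c.pr for c in ordered]

print(rank([Candidate(101, 0, 11.5), Candidate(102, 0, 13.0), Candidate(103, 2, 14.0)]))
print(rank([Candidate(201, 2, 10.0), Candidate(202, 1, 8.0)]))
```

In the first call, PR 103 is excluded despite the highest score because it fails gates; in the second, no PR passes, so every candidate is ranked in degraded mode.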
Step 9: Emit verdict
Use templates/verdict.md. Every judgment must carry evidence (file:line refs, diff snippets), reasoning chain, and the score it contributed.
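One way to enforce the evidence requirement before rendering is a small completeness check per judgment. The `Judgment` fields below are hypothetical and do not reflect the actual schema of templates/verdict.md; the example check name and file reference are made up.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    check: str            # name of the check being judged
    evidence: list[str]   # file:line refs or diff snippets
    reasoning: str        # reasoning chain behind the outcome
    score: float          # points this judgment contributed

def missing_fields(judgment: Judgment) -> list[str]:
    """List the required parts that are absent from a judgment."""
    missing = []
    if not judgment.evidence:
        missing.append("evidence")
    if not judgment.reasoning.strip():
        missing.append("reasoning")
    return missing

complete = Judgment("example-check", ["src/app.py:42"], "Guard added before use.", 2.0)
bare = Judgment("example-check", [], "", 0.0)
print(missing_fields(complete))  # []
print(missing_fields(bare))      # ['evidence', 'reasoning']
```

A verdict would only be emitted once every judgment returns an empty list.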
Reference files
- repo-policy.md: configurable defaults per target repo
- checks/tier-0-gates.md: plumbing gates
- checks/tier-1-correctness.md: six correctness checks
- checks/tier-2-quality.md: three quality checks
- tiebreakers.md: Tier 3 ranking and degraded mode
- templates/verdict.md: output template
- validation/backtest.md: backtest the skill against historical cases
Scripts (execute, do not read)
- scripts/find-candidates.sh: PR discovery
- scripts/collect-gates.sh: Tier 0 gate evaluation
- scripts/check-coderabbit-threads.sh: GraphQL thread resolution
- scripts/parse-supersession.sh: body parsing for supersession refs
- scripts/render-verdict.py: verdict scorecard renderer
What this skill does NOT do
These require infrastructure beyond GitHub API + LLM and are deferred to v2 modules:
- Running each PR's code against adversarial inputs (sandboxed execution)
- Cross-issue regression sweep (separate skill)
- Revert simulation against neighbor PRs
- Static analyzer integration (CodeQL, Semgrep)