rebuttal

name: rebuttal description: > Rebuttal pipeline for conference paper reviews. Parses reviewer feedback, classifies concerns by severity/type, builds a per-reviewer response strategy, and drafts a venue-compliant rebuttal with placeholders for pending experiments. Supports follow-up rounds. Use when user says "rebuttal", "reply to reviewers", "respond to reviews", "rebuttal draft", or wants to answer reviewer comments for a conference submission. user-invocable: true argument-hint: " [--venue NeurIPS|ICML|CVPR|ACL|AAAI|ICCV|ICLR] [--char-limit 5000] [--plan-only] [--followup]"

Overview

Prepare a grounded, venue-compliant rebuttal for conference paper reviewer feedback. The skill follows a 3-phase pipeline:

Review Analysis — Parse reviews, atomize concerns, classify severity/type, identify shared themes
Strategy Plan — Per-reviewer response strategy with attitude, angle, evidence mapping, and experiment gap report
Draft & Validate — Write rebuttal with placeholders for pending experiments, run lints and stress test, produce paste-ready output

The skill writes all rebuttal prose except experiment results. For issues requiring new experiments, it writes the surrounding context and inserts [INSERT: ...] placeholders where results should go.

Arguments

`<paper-path>` (required)

Path to the submitted paper. Can be:

A PDF file (absolute or relative path)
A LaTeX project directory (will look for main.tex or first .tex file)

`<reviews-path>` (required)

Path to a markdown file containing reviews copied from OpenReview/CMT/HotCRP. Reviewer IDs should be preserved (e.g., ## Reviewer 1, ## Reviewer #2, ## R3).

`--venue` (optional)

Target conference. Options: NeurIPS, ICML, CVPR, ACL, AAAI, ICCV, ICLR.

Default: auto-detect from review format and scoring scale
Fallback: generic format
Determines character limits, response structure, and review parsing heuristics

`--char-limit` (optional, default: venue-specific or 5000)

Character limit for the rebuttal. Overrides venue default. This is the total limit across all reviewer responses.

`--plan-only` (optional)

Stop after Phase 2. Outputs ISSUE_BOARD.md + STRATEGY_PLAN.md + EXPERIMENT_GAPS.md without drafting the rebuttal. Useful for reviewing the strategy before committing to a draft.

`--followup` (optional)

Follow-up round mode. Expects an existing output directory from a prior run. Parses new reviewer comments and generates delta replies only.

Setup

Install dependencies:

python -m pip install -r "BASE_DIR/scripts/requirements.txt"

Workflow

Phase 1: Review Analysis

Step 1.0: Detect Paper Source

If paper-path is a PDF: read directly with the Read tool (paginate for PDFs over 20 pages).

If paper-path is a directory:

Check if the directory contains a compiled PDF — if so, prefer the PDF
Otherwise, look for main.tex or the first .tex file found
Resolve \input{} and \include{} by recursively reading referenced files
Parse concatenated .tex content as the paper text (equations remain as LaTeX source)

Step 1.1: Read and Understand the Paper

Read the entire paper. While reading, compile notes on:

Title, authors, affiliation
Core claims and contributions (from abstract + introduction)
Methodology — key techniques, algorithms, theoretical results
Experimental setup — datasets, baselines, metrics, ablations
Results — key numbers, tables, figures
Limitations — author-acknowledged weaknesses
Specific sections/tables/equations — note identifiers (e.g., "Table 3", "Eq. 5", "Section 4.2") for citation in rebuttal

Step 1.2: Parse and Normalize Reviews

Read the reviews markdown file
Split by reviewer ID. Supported patterns: Reviewer 1, Reviewer #1, R1, Reviewer A, or markdown headings (## Reviewer 1). If the format is non-standard, ask the user to clarify reviewer boundaries.
For each reviewer, extract:
- Scores (if present): overall score, confidence, sub-scores
- Strengths section
- Weaknesses section
- Questions section
- Minor issues
Save verbatim copy to REVIEWS_RAW.md in the output directory

Step 1.3: Atomize and Classify Concerns

For each reviewer, break down their feedback into discrete atomic concerns. Each concern gets:

Field	Description
`issue_id`	Unique ID: `R{reviewer}-C{number}` (e.g., R1-C1, R2-C3)
`raw_quote`	Verbatim excerpt from the review
`issue_type`	One of: `novelty`, `empirical_support`, `baseline_comparison`, `theorem_rigor`, `assumptions`, `complexity`, `clarity`, `reproducibility`, `practical_significance`, `other`
`severity`	`critical` (blocks acceptance), `major` (significant concern), `minor` (nice-to-fix)
`reviewer_stance`	Inferred from scores + tone using the lookup table in `references/rebuttal_guidelines.md`
`needs_experiment`	`true` if addressing this concern requires new experimental results
`status`	Initially `open` for all concerns

Step 1.4: Identify Shared Themes

Scan across all reviewers for overlapping concerns:

Group issues by issue_type and semantic similarity
Flag themes raised by 2+ reviewers (these go in the global opener)
Note contradictions between reviewers (one praises what another criticizes)

Step 1.5: Situation Assessment

Compute and write a summary:

Scores per reviewer (raw and normalized stance)
Champions vs swing voters vs detractors
Shared themes with reviewer overlap
Path to acceptance: which reviewers to convert and what it takes

Output: `ISSUE_BOARD.md`

# Situation Assessment

- Scores: R1 (6/10, lean_accept), R2 (4/10, lean_reject), R3 (5/10, neutral)
- Champions: R1 | Swing voters: R3 | Detractors: R2
- Shared themes: [scalability concerns (R2, R3), missing ablation (R1, R2)]
- Path to acceptance: Convert R3 by addressing scalability + ablation

# Issue Board

| ID | Reviewer | Type | Severity | Quote | Needs Experiment | Status |
|----|----------|------|----------|-------|-----------------|--------|
| R1-C1 | R1 | clarity | minor | "Section 3.2 is hard to follow" | false | open |
| R1-C2 | R1 | empirical_support | major | "Ablation missing for component X" | true | open |
| R2-C1 | R2 | empirical_support | critical | "No comparison with Method Y" | true | open |
| R2-C2 | R2 | novelty | major | "Similar to Z (2024)" | false | open |
| R3-C1 | R3 | complexity | major | "Scalability not demonstrated" | true | open |

Phase 2: Strategy Plan

Step 2.1: Assign Response Modes

For each issue in ISSUE_BOARD.md, assign a response mode using this decision tree (in priority order — prefer the first applicable mode):

Does the reviewer factually misread the paper or miss existing content? → direct_clarification — Point to specific section/table/equation they missed
Is there existing evidence in the paper that answers the concern? → grounded_evidence — Cite specific numbers, theorems, or results already present
Is this a novelty dispute? → nearest_work_delta — Name the closest prior work + exact technical difference
Does the concern require new experimental results to address? → additional_experiment — Placeholder in draft, added to EXPERIMENT_GAPS.md
Is the reviewer correct about a limitation? → narrow_concession — Acknowledge honestly, then scope the impact narrowly
Is the concern valid but out of scope for this paper? → future_work — Commit to future investigation, explain current scope boundary

If multiple modes apply, prefer the one higher in the list (stronger evidence first).

Step 2.2: Define Response Angles

For each issue, write 1-2 sentences describing:

What to say — the core argument or evidence to present
Tone — e.g., "Politely clarify that Table 3 already shows this", "Acknowledge the gap and present the planned ablation"

Step 2.3: Build Experiment Gap Report

Create EXPERIMENT_GAPS.md listing all needs_experiment: true issues:

# Experiment Gaps

| ID | Issue | Experiment Needed | Metric | Satisfies | Priority |
|----|-------|-------------------|--------|-----------|----------|
| R2-C1 | No comparison with Method Y | Run Method Y on datasets A, B | Accuracy, FLOPs | R2-C1, R3-C1 | P0 (blocks acceptance) |
| R1-C2 | Missing ablation for X | Ablate component X | Accuracy delta | R1-C2 | P1 (strengthens case) |

Step 2.4: Build Character Budget

Calculate character allocation based on --char-limit:

10-15% — Global opener (thank reviewers + shared theme resolutions)
75-80% — Per-reviewer responses (proportional to issue count × severity weight: critical=3, major=2, minor=1)
5-10% — Closing (resolved summary + acceptance case)

Order reviewers by priority: detractors first (most to gain), then swing voters, then champions.

Step 2.5: Present Strategy for Confirmation

Write STRATEGY_PLAN.md:

# Response Strategy

## Global Themes (for opener)
1. Scalability: addressed by [approach]
2. Missing ablation: [approach]

## Per-Reviewer Strategy

### R2 (lean_reject → target: neutral+)
| ID | Mode | Angle | Priority |
|----|------|-------|----------|
| R2-C1 | additional_experiment | Run comparison with Y on benchmarks A, B; placeholder until results ready | P0 |
| R2-C2 | nearest_work_delta | Clarify 3 key differences from Z (2024): [diff1], [diff2], [diff3] | P1 |

### R3 (neutral → target: lean_accept)
| ID | Mode | Angle | Priority |
|----|------|-------|----------|
| R3-C1 | additional_experiment | Scale-up experiment on dataset C; shares evidence with R2-C1 | P0 |

### R1 (lean_accept → target: champion)
| ID | Mode | Angle | Priority |
|----|------|-------|----------|
| R1-C1 | direct_clarification | Rewrite Section 3.2 intro paragraph for clarity | P2 |
| R1-C2 | additional_experiment | Ablation study for component X | P1 |

## Character Budget
- Opener: ~600 / 5000 chars
- R2: ~1800 chars (2 issues, 1 critical + 1 major)
- R3: ~1200 chars (1 issue, 1 major)
- R1: ~900 chars (2 issues, 0 critical)
- Closing: ~500 chars
- Total: ~5000 / 5000 limit

--plan-only exit point: If set, present ISSUE_BOARD.md + STRATEGY_PLAN.md + EXPERIMENT_GAPS.md to the user and stop.

Otherwise: Present the strategy plan to the user. Ask: "Does this strategy look right? Adjust any response modes, angles, or priorities before I draft the rebuttal." Wait for confirmation before proceeding to Phase 3.

Phase 3: Draft, Validate & Finalize

Step 3a: Draft Rebuttal

Write REBUTTAL_DRAFT.md following the confirmed strategy plan.

Structure:

Global opener (10-15% of budget)
- Thank all reviewers for their thorough feedback
- Address 2-4 shared themes with concise resolutions
- Set the narrative: what the rebuttal will demonstrate
Per-reviewer responses (75-80% of budget, in priority order) For each issue, follow this pattern:
- Sentence 1: Direct answer to the concern
- Sentences 2-4: Grounded evidence (cite specific paper sections, tables, equations, or numbers)
- Last sentence: Implication — why this strengthens the paper or resolves the concern
- For additional_experiment issues: write full surrounding prose but replace results with [INSERT: description of what goes here, e.g., "accuracy comparison between our method and Method Y on datasets A, B (Table format: Method | Dataset A | Dataset B)"]
Closing (5-10% of budget)
- Summary of what is resolved
- Remaining items (with [INSERT: ...] marked)
- Case for acceptance directed at meta-reviewer

Drafting heuristics:

Evidence > assertion — always cite specific numbers, tables, sections
Global narrative before per-reviewer detail
Name closest prior work + exact delta for novelty disputes
Concede narrowly when reviewer is correct — honest narrow concession > broad denial
Answer champion reviewers too — reinforce their positive framing
Don't argue unwinnable points more than once
For theory: separate core contribution from technical assumptions
Concrete numbers for counter-intuitive claims

Hard rules:

NEVER invent experiments, numbers, derivations, or citations
NEVER promise experiments the user hasn't confirmed
Every claim must trace to: paper content, reviewer's own statement, or [INSERT: ...] placeholder
If no strong evidence exists for a point, say less not more

Step 3b: Automated Lints

Run 5 checks on the draft and write results to LINT_REPORT.md:

Coverage check — For every issue in ISSUE_BOARD.md, verify there is a corresponding response in the draft. Flag any missing issues.
Provenance check — For every factual claim in the draft:
- Claims citing paper sections/tables/equations: verify the referenced section/table exists in the paper
- Claims citing reviewer statements: verify the quote appears in REVIEWS_RAW.md
- [INSERT: ...] placeholders: verify correct formatting
- Other factual claims with no clear source: flag as "needs manual verification"
Tone check — Flag these problematic patterns:
- Aggressive: "the reviewer is wrong", "this is clearly stated", "obviously"
- Submissive: "we apologize", "we are sorry", excessive hedging
- Evasive: changing the subject, answering a different question
- Replace with neutral-professional alternatives
Consistency check — Verify no contradictions across reviewer replies (e.g., telling R1 "we do X" and R2 "we don't do X")
Character count check — Count exact characters in the draft. If over the limit, compress using this priority:
1. Identify and merge duplicate arguments across reviewer responses
2. Remove filler phrases and tighten wording throughout
3. Trim responses to minor issues from champion reviewers
4. Shorten closing section
5. Compress opener to bare essentials
6. NEVER drop responses to critical/major issues
7. If still over after all compression, flag to user: "Cannot fit within limit — need manual cuts"

Step 3c: Stress Test (Adversarial Self-Review)

Re-read the entire draft from the perspective of an adversarial meta-reviewer. Systematically check:

Unanswered concerns — Any issue from ISSUE_BOARD.md that the draft fails to address convincingly?
Unsupported claims — Any factual statement not traceable to paper, review, or [INSERT: ...]?
Risky promises — Any commitment to work the user hasn't confirmed?
Tone problems — Aggressive, defensive, evasive, or submissive passages?
Backfire risk — Which paragraph is most likely to annoy the meta-reviewer? Why?

Write findings to STRESS_TEST.md with a verdict: safe_to_submit | needs_revision.

If needs_revision: apply minimal grounded fixes (no invented evidence), re-run the lint checks, and produce the final version. Maximum 1 revision round. If still problematic after revision, flag remaining issues to user for manual intervention.

Step 3d: Finalize — Two Outputs

Produce two versions:

PASTE_READY.txt — Strict venue-compliant version
- Plain text only (no markdown formatting)
- Exact character count within venue limit
- Ready to paste directly into OpenReview/CMT/HotCRP
- [INSERT: ...] placeholders preserved for user to fill
REBUTTAL_DRAFT_rich.md — Extended version
- Same structure with more detail: fuller explanations, additional evidence, optional paragraphs
- Sections marked [OPTIONAL — cut if over limit] for easy trimming
- Pre-written material for potential follow-up rounds
- Authors read this version, then decide what to keep/cut/rewrite

Present to user with:

Character count of PASTE_READY.txt vs venue limit
Number and list of [INSERT: ...] placeholders that need filling
Any remaining risks from STRESS_TEST.md
Suggested next steps (fill placeholders, run experiments, review rich version)

Follow-up Rounds (`--followup`)

When re-invoked with --followup:

Load state: Read existing output directory (requires ISSUE_BOARD.md, STRATEGY_PLAN.md, REBUTTAL_DRAFT.md at minimum). If directory is missing or incomplete, ask user for the correct path.
Parse new comments: Read the updated reviews file. Identify new reviewer comments that weren't in the original REVIEWS_RAW.md.
Link or create issues: For each new comment:
- If it relates to an existing issue in ISSUE_BOARD.md, link it and update the issue status
- If it's a new concern, create a new issue entry
- If the new comment contradicts a prior rebuttal claim, flag as conflict and ask user for resolution
Draft delta reply: Write responses to new comments only — not a full rewrite. Reference prior rebuttal responses where relevant.
Validate: Re-run lint checks and stress test on the delta reply.
Save: Append to FOLLOWUP_LOG.md with round number and timestamp.

Follow-up rules:

Escalate technically, not rhetorically
Concede if reviewer is right and no new evidence exists
Stop arguing immovable points — answer once and move on
If same issue is re-raised, reference prior response and only add new content if prior response was insufficient

Output Directory

All outputs are saved to: ./output/rebuttal/YYYY-MM-DD-HHMMSS/

./output/rebuttal/YYYY-MM-DD-HHMMSS/
├── REVIEWS_RAW.md           # Verbatim copy of input reviews
├── ISSUE_BOARD.md           # Phase 1: classified concerns + situation assessment
├── STRATEGY_PLAN.md         # Phase 2: per-reviewer response strategy + character budget
├── EXPERIMENT_GAPS.md       # Phase 2: experiments needed with priorities
├── REBUTTAL_DRAFT.md        # Phase 3: working draft
├── REBUTTAL_DRAFT_rich.md   # Phase 3: extended version with optional sections
├── PASTE_READY.txt          # Phase 3: venue-compliant plain text for submission
├── LINT_REPORT.md           # Phase 3: automated lint check results
├── STRESS_TEST.md           # Phase 3: adversarial self-review findings
└── FOLLOWUP_LOG.md          # Follow-up round responses (if --followup)

Best Practices

Provide complete reviews: Copy the full review text from OpenReview including scores and confidence — more context leads to better analysis
Include reviewer IDs: Preserve reviewer numbering from the venue system
Specify the venue: While auto-detection works, explicit --venue ensures correct character limits and format
Use --plan-only first: Review the strategy before committing to a full draft, especially for contentious reviews
Fill placeholders promptly: After running experiments, replace [INSERT: ...] markers with actual results
Review the rich version: REBUTTAL_DRAFT_rich.md contains extra material useful for follow-up rounds
Check character count: Always verify PASTE_READY.txt fits within the venue limit before submitting
Don't over-argue: If the strategy marks something as narrow_concession, trust that framing — conceding gracefully is stronger than arguing weakly

Limitations

Does NOT run experiments — produces placeholders where experimental results are needed
Does NOT edit or upload revised PDFs
Does NOT submit to OpenReview/CMT/HotCRP
Cannot verify claims about unpublished or in-progress work
Novelty assessment for nearest_work_delta depends on knowledge of the field
Character count is approximate until PASTE_READY.txt is generated

Related Skills

paper-reviewing — Generate conference-style reviews (useful for self-review before submission)
paper-polishing — Get ICML meta-review style feedback on drafts
citation-assistant — Find and insert missing citations
literature-survey — Survey related work for novelty defense