name: scoring-calibration
description: >
  Skill for venue-calibrated scoring, score weighting formulas, decision rules, anti-bias mechanisms, and score interpretation across different venue tiers.
Scoring Calibration
Use this skill when computing review scores, applying decision rules, or calibrating review standards to a specific venue.
Score Dimensions
Every review scores these 6 dimensions plus confidence:
| Dimension | Range | Description |
|---|---|---|
| Overall | 1-10 | Holistic assessment |
| Soundness | 1-10 | Technical correctness |
| Novelty | 1-10 | Originality of contribution |
| Clarity | 1-10 | Writing and presentation quality |
| Significance | 1-10 | Impact and importance |
| Reproducibility | 1-10 | Can results be reproduced? |
| Confidence | 1-5 | Reviewer's self-assessed expertise |
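The ranges in this table can be enforced when a review is recorded. The following is a minimal sketch, assuming scores are integers; the `ReviewScores` class and its field names are hypothetical and only mirror the table rows.

```python
from dataclasses import dataclass, fields

# Allowed ranges, taken from the table above.
DIMENSION_RANGE = (1, 10)
CONFIDENCE_RANGE = (1, 5)


@dataclass(frozen=True)
class ReviewScores:
    """One reviewer's scores (hypothetical container, not a defined schema)."""
    overall: int
    soundness: int
    novelty: int
    clarity: int
    significance: int
    reproducibility: int
    confidence: int

    def __post_init__(self):
        # Confidence uses the 1-5 range; every other field uses 1-10.
        for f in fields(self):
            value = getattr(self, f.name)
            low, high = CONFIDENCE_RANGE if f.name == "confidence" else DIMENSION_RANGE
            if not low <= value <= high:
                raise ValueError(f"{f.name}={value} is outside {low}-{high}")


review = ReviewScores(overall=6, soundness=7, novelty=5, clarity=6,
                      significance=6, reproducibility=5, confidence=4)
```

Constructing a review with `confidence=6` raises `ValueError`, because confidence is capped at 5 while the other dimensions run to 10.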
Venue-Calibrated Interpretation
Top-Tier (NeurIPS, Nature, Science, ICML)
| Score | Meaning |
|---|---|
| 8-10 | Strong accept — top 10% of submissions |
| 6-7 | Weak accept — above threshold, some issues |
| 5 | Borderline — could go either way |
| 3-4 | Weak reject — below threshold, significant issues |
| 1-2 | Strong reject — fundamental flaws |
Acceptance threshold: Mean ≥ 7, no critical issues
Mid-Tier (AAAI, ECML, PLOS ONE)
| Score | Meaning |
|---|---|
| 7-10 | Strong accept |
| 5-6 | Accept with revisions |
| 4 | Borderline |
| 2-3 | Reject |
| 1 | Strong reject |
Acceptance threshold: Mean ≥ 6, critical issues addressed
Workshop / Preprint
| Score | Meaning |
|---|---|
| 6-10 | Accept |
| 4-5 | Accept with minor revisions |
| 3 | Borderline |
| 1-2 | Reject |
Acceptance threshold: Mean ≥ 5, no fatal flaws
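The three interpretation tables can be expressed as a single lookup, which makes it easy to see how the same score reads differently per tier. This is a sketch only; the tier keys (`"top"`, `"mid"`, `"workshop"`) and the `interpret` function are hypothetical names, and integer scores are assumed.

```python
# Score bands per venue tier, copied from the tables above.
# Each entry is (lowest score in band, label); bands are listed high to low.
BANDS = {
    "top": [(8, "Strong accept"), (6, "Weak accept"), (5, "Borderline"),
            (3, "Weak reject"), (1, "Strong reject")],
    "mid": [(7, "Strong accept"), (5, "Accept with revisions"),
            (4, "Borderline"), (2, "Reject"), (1, "Strong reject")],
    "workshop": [(6, "Accept"), (4, "Accept with minor revisions"),
                 (3, "Borderline"), (1, "Reject")],
}


def interpret(score: int, tier: str) -> str:
    """Return the label for an integer score (1-10) at the given tier."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    for floor, label in BANDS[tier]:
        if score >= floor:
            return label
    raise AssertionError("unreachable: every tier has a band starting at 1")


for tier in BANDS:
    print(tier, "->", interpret(5, tier))
```

A score of 5 is "Borderline" at a top-tier venue, "Accept with revisions" at a mid-tier venue, and "Accept with minor revisions" at a workshop.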
Score Weighting Formula
The weighted final score combines dimensions with fixed weights:
final_score = (
    0.30 × mean(soundness) +
    0.20 × mean(novelty) +
    0.20 × mean(significance) +
    0.15 × mean(clarity) +
    0.15 × mean(reproducibility)
)
These weights can be overridden in .review-config.yaml:
review:
  score_weights:
    soundness: 0.30
    novelty: 0.20
    significance: 0.20
    clarity: 0.15
    reproducibility: 0.15
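The formula and the override mechanism can be sketched together as below. This is an illustration under stated assumptions: `final_score` and `DEFAULT_WEIGHTS` are hypothetical names, overrides are passed as an already-parsed dict (reading `.review-config.yaml` is not shown), and the check that weights sum to 1.0 is an added safeguard rather than something the skill specifies.

```python
from statistics import mean

# Default weights from the formula above.
DEFAULT_WEIGHTS = {
    "soundness": 0.30,
    "novelty": 0.20,
    "significance": 0.20,
    "clarity": 0.15,
    "reproducibility": 0.15,
}


def final_score(reviews, overrides=None):
    """Weighted sum of per-dimension means across reviewers.

    reviews: list of dicts mapping dimension name -> score.
    overrides: optional dict of weights, e.g. the parsed score_weights mapping.
    """
    weights = {**DEFAULT_WEIGHTS, **(overrides or {})}
    total = sum(weights.values())
    # Assumption: weights should sum to 1 so the result stays on the 1-10 scale.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return sum(w * mean(r[dim] for r in reviews) for dim, w in weights.items())


reviews = [
    {"soundness": 7, "novelty": 5, "significance": 6, "clarity": 6, "reproducibility": 5},
    {"soundness": 8, "novelty": 6, "significance": 6, "clarity": 7, "reproducibility": 6},
]
print(round(final_score(reviews), 2))
```

For these two made-up reviews the per-dimension means are 7.5, 5.5, 6, 6.5, and 5.5, giving 2.25 + 1.10 + 1.20 + 0.975 + 0.825 = 6.35.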
Decision Rules
| Condition | Decision |
|---|---|
| All reviewers ≥ 7, no critical weaknesses | Accept |
| All reviewers ≥ 6, only minor weaknesses | Accept with Minor Revision |
| Mean ≥ 5, no more than 1 reviewer below 5 | Major Revision |
| Mean < 5 or 2+ reviewers below 4 | Reject |
| Strong disagreement (spread ≥ 4 points) | Discussion round before decision |
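One way to read the table as a procedure is sketched below, using the literal numbers from the table. Several points are assumptions: the `decide` function and its flags are hypothetical, "reviewers ≥ 7" is taken to mean each reviewer's overall score, "Mean" is taken as the mean of those overall scores, and the disagreement check runs first because the discussion round is described as happening before a decision.

```python
from statistics import mean


def decide(overall, critical=False, minor_only=True):
    """Apply the decision table to a list of reviewers' overall scores.

    critical: at least one reviewer raised a critical weakness.
    minor_only: every weakness raised is minor.
    """
    # Assumption: strong disagreement is resolved before any other rule.
    if max(overall) - min(overall) >= 4:
        return "Discussion round"
    if all(s >= 7 for s in overall) and not critical:
        return "Accept"
    if all(s >= 6 for s in overall) and minor_only and not critical:
        return "Accept with Minor Revision"
    avg = mean(overall)
    if avg >= 5 and sum(s < 5 for s in overall) <= 1:
        return "Major Revision"
    if avg < 5 or sum(s < 4 for s in overall) >= 2:
        return "Reject"
    # The table has no row for mean >= 5 with two or more scores below 5
    # (and fewer than two below 4); this sketch reports that explicitly.
    return "Unspecified"


print(decide([7, 8, 7]))                     # all >= 7, no critical weaknesses
print(decide([5, 7, 6], minor_only=False))   # mean 6, nobody below 5
print(decide([3, 8, 7]))                     # spread of 5 points
```

The three calls return "Accept", "Major Revision", and "Discussion round" respectively. A panel such as `[4, 4, 7]` falls through to "Unspecified", which may be worth routing to the editor rather than deciding automatically.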
Venue-Adjusted Thresholds
The decision rules above are written with top-tier thresholds. For other venue tiers, substitute the values from the matching column:
| Rule Parameter | Top-Tier | Mid-Tier | Workshop |
|---|---|---|---|
| Accept threshold | ≥ 7 | ≥ 6 | ≥ 5 |
| Accept-minor threshold | ≥ 6 | ≥ 5 | ≥ 4 |
| Major revision threshold | ≥ 5 | ≥ 4 | ≥ 3 |
| Reject threshold | < 5 | < 4 | < 3 |
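The threshold table can be stored as data so that the same comparison logic serves every tier. The sketch below is hypothetical (`Thresholds`, `VENUE_THRESHOLDS`, and `score_band` are illustrative names) and deliberately simplified: it only places a single score into a band and ignores the per-reviewer and weakness conditions of the full decision rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    accept: int          # "Accept threshold" row
    accept_minor: int    # "Accept-minor threshold" row
    major_revision: int  # "Major revision threshold" row; below it is Reject


# In every column the reject cutoff equals the major revision threshold,
# so it is stored once.
VENUE_THRESHOLDS = {
    "top": Thresholds(accept=7, accept_minor=6, major_revision=5),
    "mid": Thresholds(accept=6, accept_minor=5, major_revision=4),
    "workshop": Thresholds(accept=5, accept_minor=4, major_revision=3),
}


def score_band(score: float, tier: str) -> str:
    """Place a score into the band implied by the tier's thresholds."""
    t = VENUE_THRESHOLDS[tier]
    if score >= t.accept:
        return "accept"
    if score >= t.accept_minor:
        return "accept_minor"
    if score >= t.major_revision:
        return "major_revision"
    return "reject"


for tier in VENUE_THRESHOLDS:
    print(tier, "->", score_band(5.5, tier))
```

A score of 5.5 lands in "major_revision" for top-tier, "accept_minor" for mid-tier, and "accept" for a workshop.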
Anti-Bias Mechanisms
Anchoring Prevention
- Reviewers assign scores BEFORE writing detailed comments
- Score-first protocol prevents narrative from biasing quantitative assessment
Confirmation Bias Mitigation
- Reviewer γ (Generalist) brings no domain priors, so its assessment is independent of field-specific assumptions
- If every review is positive (all scores ≥ 7), flag the submission for a confirmation bias check
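The uniform-positivity check is simple enough to state directly. A minimal sketch, assuming the check applies to each reviewer's overall score; the function name is hypothetical.

```python
def needs_confirmation_bias_check(overall_scores, threshold=7):
    """True when there is at least one review and every score meets the threshold."""
    return len(overall_scores) > 0 and all(s >= threshold for s in overall_scores)


print(needs_confirmation_bias_check([7, 8, 9]))  # every score is >= 7
print(needs_confirmation_bias_check([6, 8, 9]))  # one score is below 7
```

The first call returns `True` and the second `False`. The flag does not change the decision; it only marks the submission for a second look.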
Authority Bias Prevention
- Author identity optionally stripped in double-blind mode
- Reviewer profiles focus on expertise, not prestige
Positivity Bias Prevention
- EIC prompt emphasizes that rejection is a valid and useful outcome
- Decision rules explicitly model rejection conditions
Novelty Bias Prevention
- Score weights rank soundness (0.30) above novelty (0.20)
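- With the other dimensions equal, a technically correct but incremental paper scores higher than an equally novel but unsound one, because each soundness point carries 1.5 times the weight of a novelty point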
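A small worked example makes the effect of the weights concrete. The two papers below are invented for illustration and differ only in soundness and novelty; `weighted` is a hypothetical helper that applies the default weights to a single set of scores.

```python
# Default weights from the Score Weighting Formula section.
WEIGHTS = {
    "soundness": 0.30,
    "novelty": 0.20,
    "significance": 0.20,
    "clarity": 0.15,
    "reproducibility": 0.15,
}


def weighted(scores):
    """Weighted sum of one set of dimension scores."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


# Hypothetical papers: identical except that soundness and novelty are swapped.
sound_incremental = {"soundness": 9, "novelty": 3, "significance": 6,
                     "clarity": 6, "reproducibility": 6}
novel_unsound = {"soundness": 3, "novelty": 9, "significance": 6,
                 "clarity": 6, "reproducibility": 6}

print(round(weighted(sound_incremental), 2))
print(round(weighted(novel_unsound), 2))
```

The sound but incremental paper scores 6.3 and the novel but unsound one 5.7. The 0.6 gap is the weight difference (0.30 − 0.20) times the 6-point swap.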
Score Trajectory Tracking
Track scores across revision rounds to detect convergence or stalling:
score_trajectory:
  round_1:
    alpha: 5
    beta: 7
    gamma: 6
    mean: 6.0
    weighted: 5.95
  round_2:
    alpha: 7
    beta: 8
    gamma: 7
    mean: 7.3
    weighted: 7.25
  delta: +1.3
  convergence_status: "improving"
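The `mean` and `delta` fields in this record can be derived from the per-reviewer scores. The sketch below assumes `delta` is the change in the unweighted mean between consecutive rounds (the weighted scores in the example happen to differ by the same amount); the function names are hypothetical.

```python
# Per-reviewer scores for each round, taken from the example above.
rounds = [
    {"alpha": 5, "beta": 7, "gamma": 6},
    {"alpha": 7, "beta": 8, "gamma": 7},
]


def round_means(rounds):
    """Unweighted mean score for each round."""
    return [sum(r.values()) / len(r) for r in rounds]


def deltas(means):
    """Change in mean from each round to the next."""
    return [later - earlier for earlier, later in zip(means, means[1:])]


means = round_means(rounds)
print([round(m, 1) for m in means])
print([round(d, 1) for d in deltas(means)])
```

This reproduces the means of 6.0 and 7.3 and the delta of 1.3 recorded in the example.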
Diminishing Returns
If delta ≤ 0.3 for 2 consecutive rounds:
→ Flag DIMINISHING_RETURNS
→ Consider declaring EXHAUSTED
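The rule above can be checked against a list of per-round deltas. This is a sketch under one assumption: only the most recent rounds are inspected. The function and parameter names are hypothetical.

```python
def diminishing_returns(round_deltas, threshold=0.3, window=2):
    """True when the last `window` deltas are all at or below `threshold`."""
    if len(round_deltas) < window:
        return False  # not enough rounds yet to judge
    return all(d <= threshold for d in round_deltas[-window:])


print(diminishing_returns([1.3, 0.2, 0.1]))  # last two deltas are small
print(diminishing_returns([1.3, 0.2]))       # only one small delta so far
```

The first call returns `True` and the second `False`. Note that a negative delta (scores getting worse) also satisfies `delta ≤ 0.3`, so a regressing revision would be flagged as well.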