name: extremizing-forecasts description: Adjust aggregated probability forecasts upward when diverse forecasters converge, compensating for crowd conservatism
Extremizing Forecasts
Overview
Extremizing is a statistical technique from Philip Tetlock's superforecasting research that compensates for the conservatism bias in aggregated probability forecasts. When multiple independent forecasters converge on similar probabilities, their collective wisdom is typically underconfident - if everyone says 66%, the real probability is likely higher. Extremizing pushes aggregate forecasts away from 50% toward the extremes (0% or 100%) based on forecaster diversity and agreement levels.
Developed through the Good Judgment Project (2011-2015), extremizing uses a log-odds algorithm to systematically adjust crowd predictions. The technique doesn't apply equally to all forecasts - it's most powerful when applied to diverse crowds with information asymmetry, and least necessary for expert "superforecaster" teams who already share common knowledge.
When to Use
- Aggregating mass forecasts: Combining predictions from large, diverse crowd (100+ forecasters)
- Tournament competitions: When you need maximum accuracy from pooled predictions
- Forecaster diversity exists: Groups with varied backgrounds, information access, methods
- Convergence observed: Independent forecasters arriving at similar probabilities
- Not for superforecasters: Skip extremizing when elite forecasters already collaborate
The Process
Step 1: Collect Independent Forecasts
Gather probability estimates from multiple forecasters on the same question. Ensure independence - forecasters shouldn't coordinate before submitting.
Example question: "Will Company X's stock price exceed $150 by Dec 31?"
Raw forecasts from 50 forecasters:
- 20 forecasters: 65%
- 15 forecasters: 70%
- 10 forecasters: 60%
- 5 forecasters: 75%
Average: 67%
Step 2: Calculate Simple Aggregate
Compute the baseline aggregate using mean, median, or weighted average (weight by past accuracy).
Weighted by Brier score performance:
- Top performers (5 forecasters at 75%): 2x weight
- Average performers: 1x weight
- Result: 68% weighted average
Step 3: Assess Forecaster Diversity
Evaluate information diversity using two signals:
- Spread of estimates: Wide spread (40%-90%) = high diversity, narrow spread (65%-70%) = low diversity
- Knowledge overlap: Do forecasters share common sources/methods?
Diversity score:
- High diversity (independent info sources) = extremize more aggressively
- Low diversity (everyone reads same news) = minimal extremizing
- Zero diversity (clones/teammates) = no extremizing
Step 4: Apply Extremizing Algorithm
Use log-odds transformation to push forecast away from 50%:
Log-odds formula:
- Convert probability to odds: p/(1-p)
- Take logarithm: log(odds)
- Multiply by extremizing factor (typically 1.2-1.5)
- Convert back to probability
Example with 68% aggregate, extremizing factor 1.3:
- Odds: 68/32 = 2.125
- Log-odds: log(2.125) = 0.754
- Extremized log-odds: 0.754 × 1.3 = 0.980
- Extremized odds: exp(0.980) = 2.664
- Extremized probability: 2.664/(1 + 2.664) = 73%
Result: Original 68% becomes 73% after extremizing.
Step 5: Calibrate Extremizing Strength
Adjust extremizing factor based on:
- Agreement level: High convergence (narrow range) = stronger extremizing
- Track record: If past extremizing improved accuracy, increase factor
- Question type: Binary outcomes vs. continuous values
- Time horizon: Near-term vs. long-term forecasts
Good Judgment Project findings:
- Regular teams: extremizing factor 1.2-1.5 optimal
- Superforecaster teams: extremizing factor ~1.0 (no adjustment needed)
- Mass crowds: extremizing can boost accuracy by 10-20%
Step 6: Validate Against Outcomes
Track extremized forecasts vs. raw aggregates using Brier scores (lower is better).
Brier score formula: Average of (forecast - outcome)²
Example results:
- Raw aggregate (68%): Brier = 0.18
- Extremized (73%): Brier = 0.14 (20% improvement)
Common Pitfalls
Over-extremizing superforecasters - Elite teams with shared knowledge don't benefit from extremizing. They're already at optimal confidence levels.
Extremizing small samples - Need 20+ forecasters for statistical validity. With 5 forecasters, extremizing adds noise.
Ignoring herding - If forecasters see each other's predictions, they're not independent. Extremizing amplifies groupthink.
Fixed extremizing factor - Optimal factor varies by question type, forecaster pool, time horizon. Test and calibrate.
Extremizing outliers - Remove statistical outliers (>3 standard deviations) before extremizing, or they'll distort the adjustment.
Real-World Applications
Good Judgment Project (2011-2015): Extremizing regular forecaster teams boosted them past some superforecaster teams in IARPA tournament accuracy rankings.
Prediction markets alternative: Tetlock's team showed extremized prediction polls outperformed prediction markets when using temporal decay, differential weighting, and recalibration.
Intelligence community: IARPA adopted extremizing for aggregating analyst forecasts on geopolitical events.
Financial markets: Hedge funds apply extremizing to analyst consensus estimates when dispersion is low but conviction is high.
Key Insights
Extremizing works because crowds are systematically underconfident. When diverse forecasters independently arrive at 70%, they're hedging uncertainty by staying closer to 50%. But their convergence is itself a signal - if people with different information reach similar conclusions, truth is likely more extreme.
The technique only applies when forecasters have information asymmetry. A team with zero diversity (clones who know everything each other knows) should never be extremized - they're already optimally calibrated. Superforecaster teams approach this ideal, which is why extremizing doesn't help them much.
When extremizing works best: Apply to mass forecasts from diverse crowds. Extremizing brings regular crowds almost to parity with superforecaster accuracy in many cases.
When to skip extremizing: Superforecaster teams, small samples, herding/coordination, purely random forecasts.