superforecaster

name: superforecaster
description: >
  Calibrated forecasting for real-world future events, plus prediction market checks.
  Use when the user wants the probability of a future outcome in politics, economics, technology, policy, or world events.
  Triggers: "what's the probability of X?", "will X happen?", "how likely is Y?", "what are the odds of X?", "check prediction markets", "what do Metaculus/Manifold/Polymarket say about", "give me calibrated odds".
  Best for binary or date-bounded questions about external events.
  Also searches Metaculus, Manifold, PredictIt, Polymarket, Kalshi, Betfair, and Smarkets for current market-implied probabilities.
  Do NOT use for sports betting odds, current asset prices, numeric time-series forecasting from CSV data (use forecast skill), or internal predictions like whether code, tests, or builds will pass.
allowed-tools: WebSearch, WebFetch, Bash(curl *), Task

Generate calibrated probabilistic forecasts on binary questions. Based on Halawi et al. (NeurIPS 2024) and the Metaculus forecasting-tools framework.

Quick Lookup

Need	Action
Full calibrated forecast on a binary question	Follow the Pipeline below
Just check current prediction market odds	See check-odds.md

Pipeline

For each question, execute these steps in order:

1. Frame the Question

Restate the question as a precise binary (Yes/No) with:

Resolution criteria: What exactly counts as Yes?
Resolution date: When will we know?
Base rate: How often do events like this happen? Search for historical precedent.

If the user's question is vague, clarify before forecasting.
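The framing checklist can be captured in a small record. The sketch below is illustrative only: `FramedQuestion` and its methods are hypothetical names, not part of this skill, and the two checks shown are assumptions about what "vague" might mean in practice.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class FramedQuestion:
    """Hypothetical container for a framed binary question."""
    question: str
    resolution_criteria: str  # what exactly counts as Yes
    resolution_date: date     # when we will know

    def days_remaining(self, today: date) -> int:
        """Days from `today` until resolution (negative if already past)."""
        return (self.resolution_date - today).days

    def problems(self, today: date) -> List[str]:
        """Reasons the question should be clarified before forecasting."""
        issues = []
        if not self.resolution_criteria.strip():
            issues.append("missing resolution criteria")
        if self.days_remaining(today) < 0:
            issues.append("resolution date is in the past")
        return issues
```

An empty list from `problems()` means the question passes these two checks; the base rate is left out of the record because it is gathered later, in step 2.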

2. Gather Evidence

Run in parallel:

a) Market prior — See check-odds.md to get prediction market consensus. This is your initial anchor. If no markets exist, note that explicitly.

b) Web research — Use WebSearch for 3-5 targeted queries:

The question itself
Key sub-questions that would shift the probability
Recent developments / status quo

For each result, assess relevance: skip paywalled pages, error pages, and stale content. Summarize each relevant source as bullet points, preserving dates, numbers, and quotes.

c) Base rate search — Search for the historical frequency of similar events. "How often has [category of event] happened in [relevant timeframe]?"
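Once the search yields counts, the base rate is a simple frequency. The sketch below adds Laplace (add-one) smoothing, which is an assumption on my part rather than something this skill prescribes; it keeps small samples from producing a base rate of exactly 0% or 100%. The function name `base_rate` is hypothetical.

```python
def base_rate(occurrences: int, opportunities: int) -> float:
    """Historical frequency with Laplace (add-one) smoothing.

    occurrences:   how many times the event happened in the reference class
    opportunities: how many times it could have happened
    """
    if opportunities < 0 or not 0 <= occurrences <= opportunities:
        raise ValueError("need 0 <= occurrences <= opportunities")
    # (k + 1) / (n + 2): with no data at all this returns 0.5
    return (occurrences + 1) / (opportunities + 2)


# Example: the event happened in 3 of 20 comparable cases.
rate = base_rate(3, 20)  # 4/22, roughly 18%
```

With large samples the smoothing barely matters; with a handful of precedents it pulls the estimate toward 50%, which may or may not suit a given reference class.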

3. Reason (Structured Scratchpad)

Write out your reasoning following this exact structure:

TIME REMAINING: [duration until resolution]

STATUS QUO: [what happens if nothing changes — this is the most likely outcome]

SCENARIO FOR NO:
- [concrete pathway to No]
- Strength: [weak/moderate/strong]

SCENARIO FOR YES:
- [concrete pathway to Yes]
- Strength: [weak/moderate/strong]

BASE RATE: [X% — historical frequency of similar events]

MARKET CONSENSUS: [X% from check-odds.md, or "no markets"]

KEY EVIDENCE:
- [evidence point 1 — for/against, strength]
- [evidence point 2 — for/against, strength]
- [evidence point 3 — for/against, strength]

INITIAL PROBABILITY: X%

Critical debiasing rules:

Status quo bias is correct. The world changes slowly. Put extra weight on the status quo outcome. LLMs over-predict dramatic change.
Clamp to [2%, 98%]. Never assign extreme certainty. You don't know what you don't know.
Anchor on base rates. Start from how often this type of event happens, then update based on specific evidence.
Don't snap to round numbers. If the evidence points to 73%, report 73% rather than rounding to 75%.
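The clamp rule above is mechanical enough to sketch in code. The function name `clamp_probability` is hypothetical; the bounds are the 2% and 98% stated in the rule.

```python
def clamp_probability(p: float, lo: float = 0.02, hi: float = 0.98) -> float:
    """Clamp a probability in [0, 1] to the band [lo, hi]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return min(max(p, lo), hi)
```

For example, `clamp_probability(0.995)` returns `0.98`, while a mid-range value such as `0.73` passes through unchanged.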

4. Critique and Adjust

After your initial probability, challenge it:

Is this excessively confident or not confident enough?
What am I missing? What would change my mind?
Am I anchoring too heavily on one piece of evidence?
Does my probability reflect the time remaining? (More time = more uncertainty = closer to base rate)
Am I being swayed by a dramatic narrative over boring base rates?

Adjust if warranted. State what changed and why.

5. Multi-Model Aggregation (Optional, for high-stakes questions)

For important forecasts, get independent predictions from other models using discussion-partners:

Frame the question with ALL context (the partner has zero context). Include:

The exact binary question and resolution criteria
Resolution date
All evidence gathered in step 2
The base rate

Ask: "You are a professional superforecaster interviewing for a job. Given this evidence, what is the probability of [question]? Walk through your reasoning, then output your final answer as 'Probability: ZZ%'."

Query 2-3 models. Aggregate via median (robust to outliers).

If your solo forecast and the multi-model median diverge by >15 percentage points, investigate why. The disagreement is informative — don't just average it away.
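The aggregation step can be sketched as follows, assuming the partner responses are already collected as plain strings. `parse_probability` and `aggregate` are hypothetical helper names; how responses are obtained from discussion-partners is not shown here.

```python
import re
from statistics import median
from typing import List, Optional, Tuple

# Matches the answer format requested in the prompt: "Probability: ZZ%"
_PROB_RE = re.compile(r"Probability:\s*(\d+(?:\.\d+)?)\s*%")


def parse_probability(response: str) -> Optional[float]:
    """Return the last 'Probability: ZZ%' value in percent, or None."""
    matches = _PROB_RE.findall(response)
    if not matches:
        return None
    value = float(matches[-1])  # last match is taken as the final answer
    return value if 0.0 <= value <= 100.0 else None


def aggregate(responses: List[str], solo_pct: float,
              threshold: float = 15.0) -> Tuple[float, bool]:
    """Median of parsed forecasts, plus a flag when it diverges from solo."""
    parsed = [p for p in map(parse_probability, responses) if p is not None]
    if not parsed:
        raise ValueError("no parsable 'Probability: ZZ%' lines")
    med = median(parsed)
    return med, abs(solo_pct - med) > threshold
```

Responses that omit the requested format are dropped rather than guessed at. A `True` flag is a prompt to investigate the disagreement, not to average it away.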

6. Output

## Forecast: [question]

**Probability: X%**

### Evidence Summary
- [key evidence, 3-5 bullets]

### Reasoning
[2-3 sentences on the key drivers]

### Confidence Notes
- Market consensus: [X% or "no markets found"]
- Base rate: [X%]
- Multi-model range: [X%-Y%] (if step 5 was used)
- Key uncertainty: [what could most change this forecast]

### Sources
- [source 1]
- [source 2]

Calibration Principles

These come from Tetlock's superforecasting research and Metaculus tournament winners:

Status quo is king. Most questions resolve to the boring default. Overweight it.
Update incrementally. Each piece of evidence should move your probability a little, not a lot. Bayesian updates, not vibes.
Distinguish confidence from extremity. You can be highly confident that an outcome is a coin flip, and the right forecast is still ~50%. Confidence means your uncertainty estimate is precise, not that the probability is extreme.
Time horizon matters. More time = more variance = probabilities closer to base rates. Less time = more certainty = probabilities can be more extreme.
Aggregate > individual. Multiple independent forecasts combined (by median, as in step 5) tend to beat a typical single forecast. Use markets and multi-model aggregation when available.
Beware narrative bias. Compelling stories feel more probable than they are. Boring statistical regularity beats vivid anecdote.
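The "update incrementally" principle can be illustrated with Bayes' rule in odds form. This is a minimal sketch: the likelihood ratios below are made-up illustrative values, and assigning a ratio to a piece of evidence is a judgment call that this skill does not formalize.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability after one piece of evidence.

    likelihood_ratio = P(evidence | Yes) / P(evidence | No)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Start from a 20% base rate, then apply two modest pieces of evidence.
p = 0.20
p = bayes_update(p, 1.5)  # mildly favors Yes: about 27%
p = bayes_update(p, 0.8)  # mildly favors No:  about 23%
```

Ratios near 1 move the probability a little, which is the behavior the principle asks for; a single ratio of 10 or more should be reserved for evidence that is close to decisive.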