numeric-forecast - SKILL.md Agent Skill

name: numeric-forecast
description: Submission format for numeric forecasting questions with critical LOG SCORING warnings about overconfidence. Use when submitting numeric/continuous question forecasts.

Numeric Forecast Submission

CRITICAL - LOG SCORING WARNING

Metaculus uses LOG SCORING which severely punishes overconfidence:

If you assign 1% probability to an outcome that then happens, your score is HEAVILY penalized
Better to be slightly too uncertain than too confident
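
To make the penalty concrete, here is a minimal sketch of a plain logarithmic score, ln(p), where p is the probability you assigned to what actually happened. This is the generic textbook form, not necessarily the exact formula Metaculus applies (any baseline, scaling, or density-based variant for continuous questions is not covered here), and `log_score` is a hypothetical helper name.

```python
import math

def log_score(p_assigned: float) -> float:
    """Generic log score: natural log of the probability given to the realized outcome.

    Assumption: this is the textbook form, not a platform-specific formula.
    """
    if not 0.0 < p_assigned <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log(p_assigned)

# Compare a hedged forecast with increasingly overconfident misses.
for p in (0.5, 0.1, 0.01):
    print(f"p={p:<5} score={log_score(p):.2f}")
```

Under this form the score falls from about -0.69 at 50% to about -4.61 at 1%, and it keeps falling without bound as the assigned probability approaches zero. That asymmetry is why slightly wide tails cost far less than tails that miss.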

Percentile Meanings

1st percentile: Value you're 99% confident the answer is ABOVE
50th percentile: Your central estimate (median)
99th percentile: Value you're 99% confident the answer is BELOW
10th-90th spread: Should contain ~80% of probability mass

Anti-Overconfidence Check

Before submitting, ask yourself: "What if my central estimate is completely wrong?"

DON'T cluster values in a tiny range
DO spread across realistic possibilities
Make your 10th-90th percentile range 20-40% of the question's total range
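
The 20-40% guideline above can be checked with a couple of lines of arithmetic. The sketch below is illustrative only: `spread_fraction` is a hypothetical helper, and the question bounds are assumed to be known and passed in by the caller.

```python
def spread_fraction(percentiles: dict, range_min: float, range_max: float) -> float:
    """Width of the 10th-90th interval as a fraction of the question's total range."""
    return (percentiles["90"] - percentiles["10"]) / (range_max - range_min)

# Hypothetical forecast on a question whose range is 0-100.
forecast = {"10": 35, "90": 65}
frac = spread_fraction(forecast, 0, 100)
print(f"10th-90th spread covers {frac:.0%} of the range")

# Flag forecasts that fall outside the 20-40% guideline.
if not 0.20 <= frac <= 0.40:
    print("Outside the 20-40% guideline; reconsider the spread")
```

Here the interval 35-65 covers 30% of the 0-100 range, which sits inside the guideline. Treat the check as a prompt to reconsider rather than a hard rule.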

Required Format

percentiles: Dict with STRING keys (all 15 required, STRICTLY INCREASING): "1", "5", "10", "20", "25", "30", "40", "50", "60", "70", "75", "80", "90", "95", "99"

CRITICAL VALIDATION RULES (Common errors that will cause REJECTION):

❌ NO DUPLICATE VALUES: Each percentile must have a DIFFERENT value
- Bad: {"1": 100, "5": 100, "10": 100} ← VALUES MUST DIFFER!
- Good: {"1": 100, "5": 102, "10": 105} ← Each strictly greater
❌ STRICTLY INCREASING: Each value must be > the previous value (not >=)
- Bad: {"40": 50, "50": 50, "60": 55} ← 50 is NOT > 50!
- Good: {"40": 48, "50": 50, "60": 52} ← Each value increases
✓ ALL 15 KEYS REQUIRED: Must include every single percentile
- Don't use "p1", "p5" - use "1", "5" as STRING keys
- Don't skip any keys, such as "20", "30", or "40"
✓ USE NUMERIC VALUES: Values should be numbers (floats/ints), not strings
- Bad: {"1": "100", "5": "105"} ← Don't quote values
- Good: {"1": 100, "5": 105} ← Plain numbers

Before submitting, verify:

Count: Do you have exactly 15 percentile keys?
Unique: Are all 15 values DIFFERENT from each other?
Increasing: Is each value strictly GREATER than the one before it?
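
The checklist above, together with the key and value-type rules, can be run mechanically before calling the submission tool. This is a minimal sketch under stated assumptions: `validate_percentiles` and `REQUIRED_KEYS` are hypothetical names, and the real tool's own validation may differ in detail.

```python
REQUIRED_KEYS = ["1", "5", "10", "20", "25", "30", "40", "50",
                 "60", "70", "75", "80", "90", "95", "99"]

def validate_percentiles(percentiles: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Count / keys: exactly the 15 required string keys, nothing else.
    missing = [k for k in REQUIRED_KEYS if k not in percentiles]
    extra = [k for k in percentiles if k not in REQUIRED_KEYS]
    if missing:
        problems.append(f"missing keys: {missing}")
    if extra:
        problems.append(f"unexpected keys: {extra}")
    if problems:
        return problems

    # Types: values must be plain numbers, not strings (bool is excluded too).
    values = [percentiles[k] for k in REQUIRED_KEYS]
    for key, value in zip(REQUIRED_KEYS, values):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"value for {key!r} is not a number: {value!r}")
    if problems:
        return problems

    # Unique + increasing: strict > between neighbours also rules out duplicates.
    for i in range(1, len(REQUIRED_KEYS)):
        if values[i] <= values[i - 1]:
            problems.append(
                f"{REQUIRED_KEYS[i]!r} ({values[i]}) is not greater than "
                f"{REQUIRED_KEYS[i - 1]!r} ({values[i - 1]})"
            )
    return problems
```

Because the values are checked in percentile order, a single strict comparison between neighbours covers both the "Unique" and the "Increasing" questions. An empty return value means the dict is safe to pass as `percentiles`.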

Example

# For a question with range 0-100
mcp__forecaster__submit_forecast_numeric(
    percentiles={
        "1": 20,   # 99% confident above this
        "5": 30,
        "10": 35,
        "20": 40,
        "25": 42,
        "30": 44,
        "40": 47,
        "50": 50,  # Median estimate
        "60": 53,
        "70": 56,
        "75": 58,
        "80": 60,
        "90": 65,
        "95": 70,
        "99": 80   # 99% confident below this
    },
    reasoning_summary="Base rate 50, historical volatility suggests 10th-90th range of 35-65"
)

Bad Example (too narrow, will be penalized):

percentiles={
    "1": 49, "5": 49.5, "10": 49.7, ..., "99": 50.3  # ❌ TOO TIGHT!
}