copula-modeling - SKILL.md Agent Skill

name: copula-modeling description: > Dependency modeling beyond correlation matrices. Gaussian, Student-t, Clayton, and Gumbel copulas for correlated prediction market outcomes. Tail dependence quantification and vine copulas for high-dimensional portfolios. The reason Gaussian copulas failed in 2008 — and what to use instead. license: MIT allowed-tools: Read Edit Grep Glob Bash metadata: author: lyzr version: "1.0.0" category: quantitative-simulation

Copula Modeling — What Correlation Matrices Can't Capture

When to Use

When modeling joint outcomes across multiple correlated prediction markets:

Swing state election portfolios (PA, MI, WI, GA, AZ)
Correlated policy markets (Fed rate + inflation + employment)
Any portfolio where extreme co-movement matters

The Problem: Tail Dependence

Gaussian copula: Tail dependence lambda_U = lambda_L = 0. Extreme co-movements are modeled as having zero probability. This is catastrophically wrong.

Student-t copula: With nu=4 and rho=0.6, tail dependence is ~0.18. An 18% probability of extreme co-movement. Gaussian says 0%.

Clayton copula: Lower tail dependence only (lambda_L = 2^{-1/theta}). When one market crashes, others follow.

Gumbel copula: Upper tail dependence only (lambda_U = 2 - 2^{1/theta}). Correlated positive resolutions.

Sklar's Theorem

F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d))

where C is the copula (pure dependency structure) and F_i are the marginal CDFs. Model each market's marginal behavior separately, then glue together with a copula.

Full Implementation

import numpy as np
from scipy.stats import norm, t as t_dist

def simulate_correlated_outcomes_gaussian(probs, corr_matrix, N=100_000):
    """Gaussian copula — no tail dependence."""
    d = len(probs)
    L = np.linalg.cholesky(corr_matrix)
    Z = np.random.standard_normal((N, d))
    X = Z @ L.T
    U = norm.cdf(X)
    outcomes = (U < np.array(probs)).astype(int)
    return outcomes

def simulate_correlated_outcomes_t(probs, corr_matrix, nu=4, N=100_000):
    """Student-t copula — symmetric tail dependence."""
    d = len(probs)
    L = np.linalg.cholesky(corr_matrix)
    Z = np.random.standard_normal((N, d))
    X = Z @ L.T

    S = np.random.chisquare(nu, N) / nu
    T = X / np.sqrt(S[:, None])
    U = t_dist.cdf(T, nu)
    outcomes = (U < np.array(probs)).astype(int)
    return outcomes

def simulate_correlated_outcomes_clayton(probs, theta=2.0, N=100_000):
    """Clayton copula — lower tail dependence (Marshall-Olkin algorithm)."""
    V = np.random.gamma(1 / theta, 1, N)
    E = np.random.exponential(1, (N, len(probs)))
    U = (1 + E / V[:, None])**(-1 / theta)
    outcomes = (U < np.array(probs)).astype(int)
    return outcomes


# --- Compare tail behavior across copulas ---
np.random.seed(42)

probs = [0.52, 0.53, 0.51, 0.48, 0.50]  # 5 swing state probabilities
state_names = ['PA', 'MI', 'WI', 'GA', 'AZ']

corr = np.array([
    [1.0, 0.7, 0.7, 0.4, 0.3],
    [0.7, 1.0, 0.8, 0.3, 0.3],
    [0.7, 0.8, 1.0, 0.3, 0.3],
    [0.4, 0.3, 0.3, 1.0, 0.5],
    [0.3, 0.3, 0.3, 0.5, 1.0],
])

N = 500_000

gauss_outcomes = simulate_correlated_outcomes_gaussian(probs, corr, N)
t_outcomes = simulate_correlated_outcomes_t(probs, corr, nu=4, N=N)

# P(sweep all 5 states)
p_sweep_gauss = gauss_outcomes.all(axis=1).mean()
p_sweep_t = t_outcomes.all(axis=1).mean()

# P(lose all 5 states)
p_lose_gauss = (1 - gauss_outcomes).all(axis=1).mean()
p_lose_t = (1 - t_outcomes).all(axis=1).mean()

# If independent
p_sweep_indep = np.prod(probs)
p_lose_indep = np.prod([1 - p for p in probs])

print("Joint Outcome Probabilities:")
print(f"{'':>25}  {'Independent':>12}  {'Gaussian':>12}  {'t-copula':>12}")
print(f"{'P(sweep all 5)':>25}  {p_sweep_indep:>12.4f}  {p_sweep_gauss:>12.4f}  {p_sweep_t:>12.4f}")
print(f"{'P(lose all 5)':>25}  {p_lose_indep:>12.4f}  {p_lose_gauss:>12.4f}  {p_lose_t:>12.4f}")
print(f"\nt-copula increases sweep probability by {p_sweep_t/p_sweep_gauss:.1f}x vs Gaussian")

The t-copula with nu=4 routinely shows 2–5x higher probability of extreme joint outcomes vs Gaussian. Trading correlated contracts without modeling tail dependence means your portfolio will blow up in exactly the scenarios that matter most.

Vine Copulas for d > 5

For high-dimensional portfolios, bivariate copulas are insufficient. Vine copulas decompose d-dimensional dependency into d(d-1)/2 bivariate conditional copulas in a tree structure:

Type	Structure	Use Case
C-vine (star)	One central event drives all	Presidential winner -> all policy markets
D-vine (path)	Sequential dependencies	Primary results -> general election
R-vine (general)	Maximum flexibility	Complex multi-market portfolios

Construction: Build maximum spanning trees ordered by |tau_Kendall|, select pair-copula families via AIC, estimate sequentially.

Libraries: pyvinecopulib (Python), VineCopula (R).

Copula Selection Guide

Scenario	Copula	Why
Symmetric moderate correlation	Gaussian	Simple, fast, no tail dependence
Symmetric with fat tails	Student-t (nu=3-6)	Captures joint extremes
Crash contagion	Clayton (theta=1-5)	Lower tail dependence only
Joint positive resolution	Gumbel (theta=1.5-4)	Upper tail dependence only
Complex asymmetric	Frank + rotation	Flexible, no tail dependence
d > 5 contracts	Vine copula	Pair-copula decomposition