name: WHL Update Report description: Procedures for updating and formatting FINAL_REPORT.md. Covers all Phase 1 competition deliverables and academic writing standards.
WHL Report Update Skill
This skill governs the update and formatting of FINAL_REPORT.md — the primary competition deliverable.
Report Structure (Required)
FINAL_REPORT.md
├── Header (Title, Date, Team)
├── Abstract (dataset size, tools, key outcomes)
├── Section 1: Introduction & Epistemological Framework
├── Section 2: Data Provenance & Exploratory Diagnostics
│ ├── 2.1 Environmental Calibration (home ice advantage)
│ └── 2.2 Data Quality & EDA Findings
├── Section 3: Mathematical Architectures
│ ├── 3.1 Baseline Models (Colley, BT, Elo, Pythagorean, etc.)
│ └── 3.2 CLAUDE CODE MODELS (new custom models — document with full math)
├── Section 4: Validation & Out-of-Sample Testing
│ ├── 4.1 Validation Metrics Table (all models)
│ └── 4.2 Win Probability Cross-Validation
├── Section 5: Model-Specific Results (all 10+ models)
├── Section 6: CLAUDE CODE MODEL Results (custom models)
├── Section 7: Final Power Rankings (consensus, 32 teams)
├── Section 8: Playoff Matchup Predictions (16 games)
├── Section 9: Line Disparity Analysis
│ ├── 9.1 Top-10 Teams by Disparity
│ └── 9.2 Disparity vs. Team Strength Relationship
└── Section 10: Phase 1d Methodology Summary (competition submission text)
├── 10.1 Process (~50 words)
├── 10.2 Tools and Techniques (~100 words)
├── 10.3 Predictions methodology (~50 words each 1a, 1b, 1c)
└── 10.4 Insights (model performance + AI usage)
Formatting Standards
Tables
Always use GitHub-flavored markdown tables with column alignment:
| Rank | Team | Score | Metric_1 | Metric_2 |
|-----:|:---------|-------:|---------:|---------:|
| 1 | Thailand | 0.6053 | 294.737 | 221.662 |
Math notation
Use LaTeX-style inline notation for equations:
$P(i > j) = p_i / (p_i + p_j)$$xGD/60 = (xGF - xGA) \times 3600 / TOI$
Section headers
Use ## for top-level sections, ### for subsections, #### for sub-subsections.
CLAUDE CODE MODEL documentation
Each custom model must be documented with:
- Full model name following naming convention
- Mathematical formulation
- Motivation (why this adds value beyond existing models)
- Results table
- Validation metrics (at least 7)
- Key findings / interpretation
Data Sources for Report Updates
Pull from these files to update report tables:
# Power rankings
consensus = pd.read_csv('outputs/rankings/consensus_rankings.csv')
# Columns: team, consensus_rank, mean_rank, rank_variance
# Win probabilities
win_probs = pd.read_csv('outputs/win_probabilities.csv')
# Columns: game, home_team, away_team, p_lr, p_elo, p_bt, p_log5, p_mc, p_rf, p_svm, p_ensemble
# Validation scores
validation = pd.read_csv('outputs/validation_scores.csv')
# Columns: model_name, kendall_tau, spearman_rho, top8_hit_rate, brier_score, log_loss, rank_inversion_rate, consensus_rho
# Line disparity
disparity = pd.read_csv('outputs/disparity/consensus_disparity.csv')
# Columns: team, mean_rank, consensus_rank
Phase 1d Methodology Text Template
Process (~50 words):
"Raw shift-level data (25,827 rows) was aggregated to game-level and team-level summaries. All counting metrics were normalized to per-60-minute rates to control for time-on-ice variation. Even-strength and special-teams situations were isolated. Additional derived variables included xG Differential/60, Pythagorean win percentage, game-by-game Elo trajectories, and line-level disparity ratios."
Tools (~50 words):
"Python (pandas, numpy, scipy, sklearn) was used for all analysis. Claude Code (Anthropic) served as the primary AI orchestration layer, spawning and coordinating subagents for parallel model development, validation, and report generation. All code is reproducible with fixed random seeds."
Statistical Methods (~100 words):
"We employed ten distinct statistical architectures: (1) raw points standings, (2) xG differential per 60, (3) Pythagorean expectation with optimal-k tuning, (4) chronological Elo ratings with K=20, (5) Colley Matrix schedule adjustment, (6) Bradley-Terry maximum likelihood, (7) composite weighted ensemble, (8) logistic regression, (9) Random Forest, and (10) Monte Carlo Poisson simulation. Additionally, three CLAUDE CODE MODELS extended this suite: a Strength-of-Schedule adjusted win percentage, a Dixon-Coles bivariate Poisson model, and a Bayesian rank aggregation. Validation used Kendall's τ, Spearman's ρ, Brier scores, log-loss, and 10-fold cross-validation."
Report Update Checklist
Before finalizing FINAL_REPORT.md:
- All 32 teams appear in power rankings
- All 16 Round 1 matchups have win probabilities
- At least 10 models documented with validation metrics
- CLAUDE CODE MODELS documented with full math and results
- Top-10 disparity teams listed correctly
- Phase 1d methodology text meets word count targets
- All tables properly formatted with alignment
- No raw Python output or error messages in report
- Abstract mentions data size (25,827 rows, 1,312 games, 32 teams)