healthcare-predictive-modeling - SKILL.md Agent Skill

name: healthcare-predictive-modeling
description: Build, tune, and validate supervised predictive models on healthcare data (risk of readmission, cost, disease onset, no-show, denial). Emphasizes ML fundamentals done right, including train/validation/test and cross-validation design, the bias-variance tradeoff, regularization (L1/L2/early stopping), loss-function choice, class imbalance, probability calibration, and leakage-safe, point-in-time feature engineering. Covers scikit-learn, XGBoost/LightGBM, and PyTorch/TensorFlow. Use to develop a robust, well-evaluated clinical or operational predictive model.
keywords:
  - predictive modeling
  - machine learning
  - bias-variance
  - regularization
  - cross-validation
  - calibration
  - xgboost
  - class imbalance
  - risk prediction
  - healthcare
license: MIT
metadata:
  author: MedClawMini
  version: "1.0.0"
compatibility:
  - OpenClaw
allowed-tools:
  - run_shell_command
  - web_fetch

Healthcare Predictive Modeling

Overview

This skill builds predictive models the right way: with the statistical-learning discipline that separates a model that demos well from one that holds up in production. The emphasis is on methodology (validation design, bias-variance management, regularization, calibration, leakage control) as much as on algorithms, because in healthcare a miscalibrated or leaky model is worse than none.

When to Use This Skill

Predicting readmission, total cost, disease onset/progression, appointment no-show, or claim denial.
Any supervised tabular ML problem where you need defensible evaluation.
Turning the feature tables from spark-healthcare-data-pipeline into a scored model.
A model that needs downstream explanations (explainable-ml-healthcare) and validation (ml-model-validation-regulatory).

Method (ML fundamentals, applied)

Frame & split: define the label and the prediction time; split by patient and by time (group + temporal split) so that no patient and no future information leaks across folds.
Leakage-safe features: use only information available at prediction time; fit all transforms inside the CV fold (pipelines, not pre-computed scalers).
Baseline → complex: start with regularized logistic regression, then gradient-boosted trees (XGBoost/LightGBM), then deep nets only if warranted.
Bias-variance: read train-vs-validation gaps and learning curves; high variance → more regularization/data; high bias → richer features/model.
Regularization: L1/L2, tree depth/min-child-weight, early stopping; tune with nested or grouped CV (never tune on test).
Imbalance & loss: pick the loss/metric for the cost structure (log-loss, focal, class weights, scale_pos_weight); resample only inside training folds, and recalibrate afterward, since resampling and reweighting shift the predicted probabilities.
Calibration: calibrate probabilities (Platt/isotonic) and check a reliability curve; clinical decisions use the probability, not just the rank.
Evaluate honestly: AUROC and AUPRC for rare events, calibration, and decision-curve/net-benefit analysis; report subgroup performance for fairness.
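
The group + temporal split from the first step can be sketched without any library. This is a minimal illustration under stated assumptions, not the skill's own implementation: the row layout, the field names `patient_id` and `index_date`, and the helper `group_temporal_split` are all hypothetical. Any patient with a row on or after the cutoff is assigned wholly to the test side.

```python
from datetime import date

def group_temporal_split(rows, cutoff):
    """Split rows by time, then keep every patient on exactly one side.

    rows: list of dicts with hypothetical keys "patient_id" and "index_date".
    cutoff: rows dated before it are training candidates; the rest are test.
    """
    test_patients = {r["patient_id"] for r in rows if r["index_date"] >= cutoff}
    # A patient seen on or after the cutoff belongs to the test side entirely,
    # so their earlier rows are dropped from training instead of leaking.
    train = [r for r in rows
             if r["index_date"] < cutoff and r["patient_id"] not in test_patients]
    test = [r for r in rows if r["index_date"] >= cutoff]
    return train, test

# Toy rows for illustration only.
rows = [
    {"patient_id": "A", "index_date": date(2022, 3, 1)},
    {"patient_id": "A", "index_date": date(2023, 2, 1)},
    {"patient_id": "B", "index_date": date(2022, 6, 1)},
    {"patient_id": "C", "index_date": date(2023, 5, 1)},
]
train, test = group_temporal_split(rows, cutoff=date(2023, 1, 1))
train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
```

Dropping a test patient's earlier rows costs some training data; it is one simple way to satisfy the patient constraint and the time constraint at once.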

Example

import numpy as np
import xgboost as xgb
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumes X (features), y (0/1 labels), and groups (member_id per row) are
# NumPy arrays drawn from the training portion only; the test set stays untouched.
cv = StratifiedGroupKFold(n_splits=5)                 # group=member_id prevents leakage
pipe = Pipeline([("scale", StandardScaler()),         # scaler is refit inside each fold
                 ("clf", LogisticRegression(penalty="l2", C=1.0, max_iter=1000))])

# Baseline: grouped cross-validation of regularized logistic regression.
aucs, aps = [], []
for tr, va in cv.split(X, y, groups):
    pipe.fit(X[tr], y[tr])
    p = pipe.predict_proba(X[va])[:, 1]
    aucs.append(roc_auc_score(y[va], p))
    aps.append(average_precision_score(y[va], p))
print(f"AUROC {np.mean(aucs):.3f}  AUPRC {np.mean(aps):.3f}")

# Gradient-boosted trees; scale_pos_weight = negatives / positives.
booster = xgb.XGBClassifier(max_depth=4, n_estimators=400, learning_rate=0.05,
            subsample=0.8, reg_lambda=1.0, scale_pos_weight=(y == 0).sum() / (y == 1).sum())
# Pass the grouped splits explicitly: cv=5 would use a plain stratified split
# and let the same member appear on both sides of a calibration fold.
cal = CalibratedClassifierCV(booster, method="isotonic",
                             cv=list(cv.split(X, y, groups))).fit(X, y)  # calibrated probs
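
Calibrated probabilities like those produced above still need the reliability check named in the Method section. The sketch below builds a reliability table in plain Python so the arithmetic is visible. The helper `reliability_table`, the equal-width binning, and the toy labels and scores are assumptions for illustration; scikit-learn's `calibration_curve` offers a comparable check.

```python
def reliability_table(y_true, p_pred, n_bins=5):
    """Return (mean predicted, observed event rate, n) for each non-empty bin.

    Uses equal-width bins over [0, 1]; a well-calibrated model has the first
    two values close to each other in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    table = []
    for members in bins:
        if not members:
            continue
        n = len(members)
        mean_pred = sum(p for _, p in members) / n
        obs_rate = sum(y for y, _ in members) / n
        table.append((mean_pred, obs_rate, n))
    return table

# Toy held-out labels and scores, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
p_pred = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
table = reliability_table(y_true, p_pred)
for mean_pred, obs_rate, n in table:
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={n}")
```

In this toy case the low bin predicts 0.10 but observes 0.25 and the high bin predicts 0.90 but observes 0.75, so the scores are overconfident at both ends. The table should be computed on held-out data, never on the rows used to fit the calibrator.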

Outputs

Serialized model + preprocessing pipeline (leakage-safe, reproducible).
model_card.md: data, intended use, metrics (AUROC/AUPRC/calibration), subgroup performance, limitations.
cv_results.csv: fold metrics, learning curves, hyperparameter search.
Calibrated probability scorer ready for explainable-ml-healthcare.
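
One possible way to produce the fold-metrics part of cv_results.csv is sketched below with the standard library. The column layout (`fold`, `auroc`, `auprc`), the helper `write_cv_results`, and the numbers are hypothetical; the skill does not prescribe a schema.

```python
import csv
import io
from statistics import mean

def write_cv_results(fold_metrics, fh):
    """Write one row per fold, then a final row of means, to an open text file."""
    fields = ["fold", "auroc", "auprc"]          # hypothetical column layout
    writer = csv.DictWriter(fh, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for i, m in enumerate(fold_metrics):
        writer.writerow({"fold": i,
                         "auroc": f"{m['auroc']:.4f}",
                         "auprc": f"{m['auprc']:.4f}"})
    writer.writerow({"fold": "mean",
                     "auroc": f"{mean(m['auroc'] for m in fold_metrics):.4f}",
                     "auprc": f"{mean(m['auprc'] for m in fold_metrics):.4f}"})

# Placeholder numbers for illustration only, not results from any real model.
fold_metrics = [{"auroc": 0.70, "auprc": 0.30}, {"auroc": 0.74, "auprc": 0.34}]
buf = io.StringIO()   # stands in for open("cv_results.csv", "w", newline="")
write_cv_results(fold_metrics, buf)
lines = buf.getvalue().splitlines()
```

Keeping per-fold rows alongside the mean makes fold-to-fold variance visible, which a single averaged number hides.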

Healthcare Context

Encodes the healthcare-specific failure modes: target leakage from post-outcome codes, patient-level (not row-level) splits, severe class imbalance, and the need for calibrated probabilities and subgroup fairness. Deep-learning frameworks (PyTorch/TensorFlow/Keras) slot in for text/image/sequence inputs; the validation discipline is identical.
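
Target leakage from post-outcome codes can be illustrated with a point-in-time filter. The following is a minimal sketch, assuming each event carries a recording date; the field names `code` and `recorded_on`, the helper `point_in_time_counts`, and the 365-day lookback are hypothetical choices, and the events are toy data.

```python
from datetime import date

def point_in_time_counts(events, prediction_date, lookback_days=365):
    """Count codes recorded strictly before prediction_date, within the lookback."""
    counts = {}
    for e in events:
        age_days = (prediction_date - e["recorded_on"]).days
        # age_days <= 0 means the code was recorded on or after prediction time,
        # so using it would leak information the model cannot have when scoring.
        if 0 < age_days <= lookback_days:
            counts[e["code"]] = counts.get(e["code"], 0) + 1
    return counts

# Toy events for one patient, for illustration only.
events = [
    {"code": "E11.9", "recorded_on": date(2023, 1, 10)},  # inside the window
    {"code": "I50.9", "recorded_on": date(2023, 3, 5)},   # after prediction time
    {"code": "E11.9", "recorded_on": date(2021, 1, 1)},   # older than the lookback
]
feats = point_in_time_counts(events, prediction_date=date(2023, 3, 1))
```

Only the first event survives the filter. A code recorded after the prediction date is often a consequence of the outcome itself, which is why it can make a leaky model look excellent in validation.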

References

Hastie, Tibshirani & Friedman, Elements of Statistical Learning.
scikit-learn, model selection and evaluation: https://scikit-learn.org/stable/model_selection.html
Van Calster et al. (2019), calibration of clinical prediction models; TRIPOD guidance.