clinical-assessment - SKILL.md Agent Skill

name: clinical-assessment description: > Use this Skill to score and interpret clinical scales: PHQ-9, GAD-7, PCL-5, reliable change index (RCI), norm comparison, and longitudinal change visualization. tags:

psychology
clinical-assessment
PHQ-9
PCL-5
reliable-change-index
questionnaire version: "1.0.0" authors:
name: awesome-rosetta-skills contributors github: "@xjtulyc" license: "MIT" platforms:
claude-code
codex
gemini-cli
cursor dependencies: python:
- pandas>=1.5
- numpy>=1.23
- matplotlib>=3.6
- scipy>=1.9 last_updated: "2026-03-18" status: stable

Clinical Assessment Scoring and Interpretation

TL;DR — Batch-score PHQ-9, GAD-7, PCL-5, BDI-II, and SWLS questionnaires. Compute the Reliable Change Index (RCI) for clinically significant change, handle missing items with prorated scoring, compare against published norms, and generate longitudinal trajectory visualizations.

When to Use

Use this Skill when you need to:

Score clinical questionnaires from raw item responses in a data frame
Apply standard severity cutoffs (minimal, mild, moderate, severe)
Determine whether pre-post change is statistically reliable (RCI)
Classify individuals as recovered, improved, unchanged, or deteriorated
Compute prorated scores when 1–2 items are missing
Generate per-participant longitudinal plots showing clinical trajectories
Compare scores to published normative samples (z-score lookup)

Background

Scoring Rules Summary

Scale	Items	Range	Cutoffs
PHQ-9	9 items, 0–3 each	0–27	0–4 minimal, 5–9 mild, 10–14 moderate, 15–27 severe
GAD-7	7 items, 0–3 each	0–21	0–4 minimal, 5–9 mild, 10–14 moderate, 15–21 severe
PCL-5	20 items, 0–4 each	0–80	≥ 33 provisional PTSD
BDI-II	21 items, 0–3 each	0–63	0–13 minimal, 14–19 mild, 20–28 moderate, 29–63 severe
SWLS	5 items, 1–7 each	5–35	5–9 extremely dissatisfied, ≥31 extremely satisfied

PCL-5 DSM-5 Cluster Subscores

Cluster	Items (1-indexed)	Symptom Group
B (Intrusion)	1–5	Re-experiencing
C (Avoidance)	6–7	Avoidance
D (Neg cognition)	8–14	Negative alterations in cognition/mood
E (Hyperarousal)	15–20	Alterations in arousal/reactivity

Reliable Change Index (RCI)

Jacobson & Truax (1991):

SE_diff = SD_pre × √(2) × √(1 − r_tt)
RCI = (post_score − pre_score) / SE_diff

Where r_tt is the test-retest reliability of the scale. Reliable change: |RCI| ≥ 1.96 (two-tailed, α = .05).

Clinical significance requires BOTH reliable change AND movement from the dysfunctional distribution (above cutoff) to the functional distribution (below cutoff).

Environment Setup

conda create -n clinical_env python=3.11 -y
conda activate clinical_env

pip install pandas>=1.5 numpy>=1.23 matplotlib>=3.6 scipy>=1.9

python -c "import pandas, numpy, matplotlib, scipy; print('All OK')"

Core Workflow

Step 1 — Questionnaire Scoring

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from scipy import stats
from typing import Optional, Dict, List, Tuple, Union

# ── Published reliability coefficients (test-retest r_tt) ──────────────────
SCALE_RELIABILITY = {
    "PHQ-9":  0.84,  # Kroenke et al., 2001
    "GAD-7":  0.83,  # Spitzer et al., 2006
    "PCL-5":  0.82,  # Blevins et al., 2015
    "BDI-II": 0.93,  # Beck et al., 1996
    "SWLS":   0.82,  # Diener et al., 1985
}

# Published SD for RCI computation (normative or clinical samples)
SCALE_NORM_SD = {
    "PHQ-9":  7.1,
    "GAD-7":  5.6,
    "PCL-5":  22.0,
    "BDI-II": 12.7,
    "SWLS":   6.4,
}

# Severity cutoffs: list of (max_score_inclusive, label) tuples
SEVERITY_CUTOFFS = {
    "PHQ-9":  [(4, "minimal"), (9, "mild"), (14, "moderate"), (27, "severe")],
    "GAD-7":  [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")],
    "PCL-5":  [(32, "below_threshold"), (80, "provisional_PTSD")],
    "BDI-II": [(13, "minimal"), (19, "mild"), (28, "moderate"), (63, "severe")],
    "SWLS":   [(9, "extremely_dissatisfied"), (14, "dissatisfied"),
               (19, "slightly_below_avg"), (24, "average"),
               (29, "high"), (34, "very_high"), (35, "extremely_satisfied")],
}

# Clinical cutoffs for RCI clinical significance classification
CLINICAL_CUTOFF = {
    "PHQ-9":  10,   # moderate threshold
    "GAD-7":  10,
    "PCL-5":  33,
    "BDI-II": 20,
    "SWLS":   20,   # below average
}


def apply_severity_cutoff(score: float, scale: str) -> str:
    """Apply severity classification based on published cutoffs."""
    cutoffs = SEVERITY_CUTOFFS.get(scale, [])
    for max_val, label in cutoffs:
        if score <= max_val:
            return label
    return "unknown"


def score_phq9(
    df: pd.DataFrame,
    item_cols: Optional[List[str]] = None,
    id_col: str = "participant_id",
    allow_missing: int = 1,
) -> pd.DataFrame:
    """
    Score PHQ-9 depression questionnaire.

    Items are scored 0 (not at all) to 3 (nearly every day).
    Prorated scoring: if ≤ allow_missing items are missing, impute with
    the mean of valid items × 9.

    Args:
        df:           DataFrame with one row per participant.
        item_cols:    List of 9 PHQ-9 item column names (in order).
                      Defaults to ['phq1',...,'phq9'].
        id_col:       Participant ID column.
        allow_missing: Max missing items allowed for prorated scoring.

    Returns:
        DataFrame with PHQ-9 total, severity, and item-level flags.
    """
    if item_cols is None:
        item_cols = [f"phq{i}" for i in range(1, 10)]

    assert len(item_cols) == 9, "PHQ-9 requires exactly 9 item columns."

    result = df[[id_col]].copy() if id_col in df.columns else df.iloc[:, :1].copy()
    result.columns = [id_col]

    scores = []
    severities = []
    n_missing_list = []

    for _, row in df.iterrows():
        item_vals = row[item_cols]
        n_valid = item_vals.notna().sum()
        n_missing = 9 - n_valid

        if n_missing > allow_missing:
            total = np.nan
            severity = "missing"
        elif n_missing == 0:
            total = item_vals.sum()
            severity = apply_severity_cutoff(total, "PHQ-9")
        else:
            # Prorated: (sum of valid items / n_valid) × 9
            total = round((item_vals.sum() / n_valid) * 9)
            severity = apply_severity_cutoff(total, "PHQ-9")

        scores.append(total)
        severities.append(severity)
        n_missing_list.append(n_missing)

    result["PHQ9_total"] = scores
    result["PHQ9_severity"] = severities
    result["PHQ9_n_missing"] = n_missing_list
    result["PHQ9_item9_suicidal"] = df[item_cols[8]].values  # item 9 = suicidality

    return result


def score_gad7(
    df: pd.DataFrame,
    item_cols: Optional[List[str]] = None,
    id_col: str = "participant_id",
    allow_missing: int = 1,
) -> pd.DataFrame:
    """
    Score GAD-7 anxiety questionnaire (7 items, 0–3 each, range 0–21).

    Args:
        df:           DataFrame with one row per participant.
        item_cols:    List of 7 GAD-7 item column names.
        id_col:       Participant ID column.
        allow_missing: Max missing items for prorated scoring.

    Returns:
        DataFrame with GAD-7 total and severity classification.
    """
    if item_cols is None:
        item_cols = [f"gad{i}" for i in range(1, 8)]

    assert len(item_cols) == 7

    result = df[[id_col]].copy() if id_col in df.columns else df.iloc[:, :1].copy()
    result.columns = [id_col]

    scores, severities, n_missing_list = [], [], []

    for _, row in df.iterrows():
        vals = row[item_cols]
        n_valid = vals.notna().sum()
        n_missing = 7 - n_valid

        if n_missing > allow_missing:
            total, severity = np.nan, "missing"
        elif n_missing == 0:
            total = vals.sum()
            severity = apply_severity_cutoff(total, "GAD-7")
        else:
            total = round((vals.sum() / n_valid) * 7)
            severity = apply_severity_cutoff(total, "GAD-7")

        scores.append(total)
        severities.append(severity)
        n_missing_list.append(n_missing)

    result["GAD7_total"] = scores
    result["GAD7_severity"] = severities
    result["GAD7_n_missing"] = n_missing_list
    return result


def score_pcl5(
    df: pd.DataFrame,
    item_cols: Optional[List[str]] = None,
    id_col: str = "participant_id",
) -> pd.DataFrame:
    """
    Score PCL-5 PTSD checklist (20 items, 0–4 each, range 0–80).
    Computes total score and DSM-5 cluster subscores (B/C/D/E).

    Args:
        df:        DataFrame with one row per participant.
        item_cols: List of 20 PCL-5 item column names.
        id_col:    Participant ID column.

    Returns:
        DataFrame with total, cluster subscores, and provisional PTSD flag.
    """
    if item_cols is None:
        item_cols = [f"pcl{i}" for i in range(1, 21)]

    assert len(item_cols) == 20

    result = df[[id_col]].copy() if id_col in df.columns else df.iloc[:, :1].copy()
    result.columns = [id_col]

    clusters = {
        "B_intrusion":       item_cols[0:5],
        "C_avoidance":       item_cols[5:7],
        "D_neg_cognition":   item_cols[7:14],
        "E_hyperarousal":    item_cols[14:20],
    }

    result["PCL5_total"] = df[item_cols].sum(axis=1)
    for cluster_name, cols in clusters.items():
        result[f"PCL5_{cluster_name}"] = df[cols].sum(axis=1)

    result["PCL5_provisional_PTSD"] = result["PCL5_total"] >= 33
    result["PCL5_severity"] = result["PCL5_total"].apply(
        lambda x: apply_severity_cutoff(x, "PCL-5")
    )
    return result

Step 2 — Reliable Change Index

def compute_rci(
    pre_scores: np.ndarray,
    post_scores: np.ndarray,
    scale: str,
    sd_pre: Optional[float] = None,
    r_tt: Optional[float] = None,
    clinical_cutoff: Optional[float] = None,
    dysfunctional_above_cutoff: bool = True,
) -> pd.DataFrame:
    """
    Compute Reliable Change Index (RCI) for pre-post score pairs.

    Classification (Jacobson & Truax, 1991):
        Recovered:    Reliable improvement AND crossed clinical cutoff
        Improved:     Reliable improvement only
        Unchanged:    No reliable change (|RCI| < 1.96)
        Deteriorated: Reliable worsening (RCI <= -1.96)

    Args:
        pre_scores:   Array of pre-treatment scores.
        post_scores:  Array of post-treatment scores.
        scale:        Scale name (e.g., 'PHQ-9') for default SD and r_tt lookup.
        sd_pre:       SD of pre-treatment scores (overrides norm SD if provided).
        r_tt:         Test-retest reliability (overrides default if provided).
        clinical_cutoff: Score threshold for functional vs dysfunctional range.
        dysfunctional_above_cutoff: True if high scores = clinical (PHQ-9, GAD-7);
                                    False if low scores = clinical (SWLS).

    Returns:
        DataFrame with RCI, classification, and pre/post severity.
    """
    sd = sd_pre if sd_pre is not None else SCALE_NORM_SD.get(scale, 10.0)
    rtt = r_tt if r_tt is not None else SCALE_RELIABILITY.get(scale, 0.85)
    cutoff = clinical_cutoff if clinical_cutoff is not None else CLINICAL_CUTOFF.get(scale, None)

    se_diff = sd * np.sqrt(2) * np.sqrt(1 - rtt)

    rci_values = (post_scores - pre_scores) / se_diff

    classifications = []
    for rci_val, pre, post in zip(rci_values, pre_scores, post_scores):
        if rci_val <= -1.96:  # reliable improvement (lower score = better)
            if cutoff is not None:
                if dysfunctional_above_cutoff:
                    crossed = pre >= cutoff and post < cutoff
                else:
                    crossed = pre < cutoff and post >= cutoff
                cls = "Recovered" if crossed else "Improved"
            else:
                cls = "Improved"
        elif rci_val >= 1.96:  # reliable worsening
            cls = "Deteriorated"
        else:
            cls = "Unchanged"
        classifications.append(cls)

    result_df = pd.DataFrame({
        "pre_score": pre_scores,
        "post_score": post_scores,
        "change": post_scores - pre_scores,
        "RCI": np.round(rci_values, 3),
        "classification": classifications,
        "pre_severity": [apply_severity_cutoff(s, scale) for s in pre_scores],
        "post_severity": [apply_severity_cutoff(s, scale) for s in post_scores],
    })

    # Summary statistics
    class_counts = result_df["classification"].value_counts()
    n = len(result_df)
    print(f"\nRCI Summary for {scale} (SE_diff = {se_diff:.2f}, r_tt = {rtt}):")
    for cls in ["Recovered", "Improved", "Unchanged", "Deteriorated"]:
        count = class_counts.get(cls, 0)
        print(f"  {cls:15s}: n = {count:3d} ({count/n:.1%})")

    return result_df

Step 3 — Longitudinal Visualization

def plot_longitudinal_trajectories(
    df_long: pd.DataFrame,
    outcome_col: str,
    time_col: str,
    id_col: str,
    scale_name: str = "",
    highlight_clinical_cutoff: Optional[float] = None,
    output_path: Optional[str] = None,
    n_highlight: int = 5,
) -> plt.Figure:
    """
    Plot individual longitudinal trajectories with group mean overlay.

    Args:
        df_long:                  Long-format DataFrame (one row per person × time).
        outcome_col:              Score column.
        time_col:                 Time point column (numeric or ordered categorical).
        id_col:                   Participant ID column.
        scale_name:               Scale label for plot title and y-axis.
        highlight_clinical_cutoff: Draw a horizontal reference line at this score.
        output_path:              Optional path to save figure.
        n_highlight:              Number of individual trajectories to highlight.

    Returns:
        Matplotlib Figure.
    """
    time_points = sorted(df_long[time_col].unique())
    persons = df_long[id_col].unique()
    rng = np.random.default_rng(42)

    fig, ax = plt.subplots(figsize=(10, 6))

    # All individual trajectories (light gray)
    for person in persons:
        pdata = df_long[df_long[id_col] == person].sort_values(time_col)
        ax.plot(pdata[time_col], pdata[outcome_col],
                color="lightgray", linewidth=0.8, alpha=0.5, zorder=1)

    # Highlighted individuals
    highlighted = rng.choice(persons, min(n_highlight, len(persons)), replace=False)
    colors = plt.cm.tab10(np.linspace(0, 0.8, len(highlighted)))
    for person, color in zip(highlighted, colors):
        pdata = df_long[df_long[id_col] == person].sort_values(time_col)
        ax.plot(pdata[time_col], pdata[outcome_col],
                color=color, linewidth=2.0, alpha=0.9, zorder=3,
                label=f"{id_col}={person}")

    # Group mean trajectory
    group_mean = df_long.groupby(time_col)[outcome_col].agg(["mean", "sem"]).reset_index()
    ax.plot(group_mean[time_col], group_mean["mean"],
            color="black", linewidth=3, zorder=4, label="Group mean")
    ax.fill_between(
        group_mean[time_col],
        group_mean["mean"] - 1.96 * group_mean["sem"],
        group_mean["mean"] + 1.96 * group_mean["sem"],
        alpha=0.15, color="black", zorder=2,
    )

    # Clinical cutoff line
    if highlight_clinical_cutoff is not None:
        ax.axhline(highlight_clinical_cutoff, color="crimson", linestyle="--",
                   linewidth=1.5, label=f"Clinical cutoff ({highlight_clinical_cutoff})")

    ax.set_xlabel("Time Point")
    ax.set_ylabel(scale_name or outcome_col)
    ax.set_title(f"Longitudinal Trajectories — {scale_name}")
    ax.legend(fontsize=8, loc="upper right")
    ax.grid(alpha=0.3)
    fig.tight_layout()

    if output_path:
        fig.savefig(output_path, dpi=150)
    plt.show()
    return fig

Advanced Usage

Batch Scoring with PHQ-9 + GAD-7 Combined

def batch_score_all_scales(
    df: pd.DataFrame,
    id_col: str = "participant_id",
    phq9_items: Optional[List[str]] = None,
    gad7_items: Optional[List[str]] = None,
    pcl5_items: Optional[List[str]] = None,
) -> pd.DataFrame:
    """
    Score PHQ-9, GAD-7, and PCL-5 in one call and merge results.

    Args:
        df:           Wide-format DataFrame.
        id_col:       Participant ID column.
        phq9_items:   PHQ-9 item columns (defaults to phq1–phq9).
        gad7_items:   GAD-7 item columns (defaults to gad1–gad7).
        pcl5_items:   PCL-5 item columns (defaults to pcl1–pcl20).

    Returns:
        DataFrame with all scale scores merged on id_col.
    """
    phq9_df = score_phq9(df, item_cols=phq9_items, id_col=id_col)
    gad7_df = score_gad7(df, item_cols=gad7_items, id_col=id_col)
    pcl5_df = score_pcl5(df, item_cols=pcl5_items, id_col=id_col)

    merged = phq9_df.merge(gad7_df, on=id_col).merge(pcl5_df, on=id_col)

    # Comorbidity flag
    merged["comorbid_dep_anx"] = (
        (merged["PHQ9_total"] >= 10) & (merged["GAD7_total"] >= 10)
    )

    print(f"\nBatch scoring complete: {len(merged)} participants")
    print(f"PHQ-9 ≥ 10 (moderate+): {(merged['PHQ9_total'] >= 10).sum()}")
    print(f"GAD-7 ≥ 10 (moderate+): {(merged['GAD7_total'] >= 10).sum()}")
    print(f"PCL-5 ≥ 33 (provisional PTSD): {merged['PCL5_provisional_PTSD'].sum()}")
    print(f"Comorbid depression+anxiety: {merged['comorbid_dep_anx'].sum()}")
    return merged

Troubleshooting

Problem	Likely Cause	Solution
Negative RCI for "worsening" with PHQ-9	Convention: lower = better	Check `rci_val >= 1.96` means deterioration
All classified as "Unchanged"	SD too large or wrong scale	Use clinical sample SD, not general population
Prorated score unexpectedly high	All non-missing items are high	Expected behavior; flag if > 2 items missing
PCL-5 cluster subscores don't sum to total	Rounding or missing items	Ensure no NaN; use `sum(axis=1, min_count=20)`
Longitudinal plot illegible	Too many participants	Reduce `n_highlight` or use mean + CI only
NaN in severity column	Score is NaN (too many missing)	Apply `dropna()` before severity lookup

External Resources

Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9. Journal of General Internal Medicine, 16(9), 606–613.
Spitzer, R. L., et al. (2006). A brief measure for assessing GAD. JAMA Internal Medicine.
Blevins, C. A., et al. (2015). PCL-5: Initial psychometric assessment. Assessment, 22(5), 477–482.
Jacobson, N. S., & Truax, P. (1991). Clinical significance. Journal of Consulting and Clinical Psychology, 59(1), 12–19.
Diener, E., et al. (1985). The Satisfaction with Life Scale. Journal of Personality Assessment, 49(1), 71–75.

Examples

Example 1 — Batch PHQ-9 + GAD-7 Scoring with Cutoff Classification

import pandas as pd
import numpy as np

# Simulate participant data
rng = np.random.default_rng(0)
n = 80
data = {
    "participant_id": [f"P{i:03d}" for i in range(1, n + 1)],
}
# PHQ-9 items (0–3)
for i in range(1, 10):
    data[f"phq{i}"] = rng.integers(0, 4, n)
# GAD-7 items (0–3)
for i in range(1, 8):
    data[f"gad{i}"] = rng.integers(0, 4, n)
# PCL-5 items (0–4)
for i in range(1, 21):
    data[f"pcl{i}"] = rng.integers(0, 5, n)

# Introduce some missing values
data["phq3"][5] = np.nan
data["gad2"][12] = np.nan

df_raw = pd.DataFrame(data)

# Score all scales
df_scored = batch_score_all_scales(
    df_raw,
    id_col="participant_id",
    phq9_items=[f"phq{i}" for i in range(1, 10)],
    gad7_items=[f"gad{i}" for i in range(1, 8)],
    pcl5_items=[f"pcl{i}" for i in range(1, 21)],
)

print("\nSample of scored data:")
print(df_scored[["participant_id", "PHQ9_total", "PHQ9_severity",
                  "GAD7_total", "GAD7_severity", "PCL5_total"]].head(10))

# Severity distribution
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
for ax, (col, title) in zip(axes, [
    ("PHQ9_severity", "PHQ-9 Severity"),
    ("GAD7_severity", "GAD-7 Severity"),
    ("PCL5_severity", "PCL-5 Severity"),
]):
    counts = df_scored[col].value_counts()
    ax.bar(counts.index, counts.values, color="steelblue", edgecolor="white")
    ax.set_title(title)
    ax.set_ylabel("Frequency")
    ax.tick_params(axis="x", rotation=30)
fig.tight_layout()
plt.savefig("severity_distributions.png", dpi=150)
plt.show()

Example 2 — RCI and Clinical Significance Classification

# Simulate pre-post treatment data
rng = np.random.default_rng(1)
n_pts = 60
pre = rng.normal(16, 6, n_pts).clip(0, 27).round()  # moderate depression
post = (pre - rng.normal(5, 4, n_pts)).clip(0, 27).round()  # treatment effect

rci_df = compute_rci(
    pre_scores=pre,
    post_scores=post,
    scale="PHQ-9",
    clinical_cutoff=10,
    dysfunctional_above_cutoff=True,
)

print("\nRCI results sample:")
print(rci_df.head(10).to_string())

# Visualize classification
class_counts = rci_df["classification"].value_counts()
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

colors_map = {"Recovered": "#2ecc71", "Improved": "#3498db",
              "Unchanged": "#f39c12", "Deteriorated": "#e74c3c"}
bars = axes[0].bar(class_counts.index,
                   class_counts.values,
                   color=[colors_map.get(c, "gray") for c in class_counts.index])
axes[0].set_title("Treatment Response Classification")
axes[0].set_ylabel("Number of Participants")
for bar, count in zip(bars, class_counts.values):
    axes[0].text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.5,
                 str(count), ha="center", va="bottom", fontsize=10)

# Scatter pre vs post with RCI classification
scatter_colors = [colors_map.get(c, "gray") for c in rci_df["classification"]]
axes[1].scatter(rci_df["pre_score"], rci_df["post_score"],
                c=scatter_colors, alpha=0.7, s=40)
axes[1].plot([0, 27], [0, 27], "k--", linewidth=1, label="No change")
axes[1].axhline(10, color="crimson", linestyle=":", linewidth=1, label="Cutoff post")
axes[1].axvline(10, color="crimson", linestyle=":", linewidth=1, label="Cutoff pre")
axes[1].set_xlabel("Pre-treatment PHQ-9")
axes[1].set_ylabel("Post-treatment PHQ-9")
axes[1].set_title("RCI Scatter Plot")
legend_patches = [mpatches.Patch(color=c, label=l) for l, c in colors_map.items()]
axes[1].legend(handles=legend_patches, fontsize=8)

import matplotlib.patches as mpatches
fig.tight_layout()
plt.savefig("rci_analysis.png", dpi=150)
plt.show()
print("Clinical assessment analysis complete.")

Changelog

Version	Date	Change
1.0.0	2026-03-18	Initial release — PHQ-9/GAD-7/PCL-5 scoring, RCI, clinical significance, longitudinal plots