run-tripod-ai-checklist - SKILL.md Agent Skill

name: run-tripod-ai-checklist
description: Apply the TRIPOD+AI checklist to evaluate a published health AI study
category: ai-evaluation
raigh_tier: tier-1
difficulty: intermediate
estimated_time: "3 hours"
prerequisites: [fhir-resource-basics]
tags: [tripod, reporting-standards, critical-appraisal, evidence, methodology]
evidence_basis: "https://doi.org/10.1136/bmj-2023-078378"
version: "1.0"

Run TRIPOD+AI Checklist

Purpose

TRIPOD+AI is the current reporting guideline for studies that develop or evaluate clinical prediction models, whether they use regression or machine learning methods. Many published health AI studies omit or only partly report key methodological details. This skill teaches you to systematically assess any health AI study against the full TRIPOD+AI checklist, the reporting standard that journals increasingly ask authors and peer reviewers to apply.

Learning Objectives

Explain why reporting standards matter for health AI evidence
Navigate the TRIPOD+AI checklist (27 items across 7 sections)
Apply the checklist to a published health AI study
Identify critical gaps that undermine a study's validity
Write a structured evidence assessment report

Context

When someone publishes a paper claiming "our AI achieves 95% accuracy for detecting disease X," you need to know: What accuracy metric? On what population? With what validation method? Was it compared to clinician performance? TRIPOD+AI gives you the systematic framework to evaluate these claims.

Steps

Step 1: Obtain the Checklist

Download the TRIPOD+AI checklist from: https://www.tripod-statement.org/

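The checklist has 27 numbered items, many of them split into lettered sub-items, across these sections:

Title and Abstract
Introduction
Methods (including Data, Participants, Outcome, Predictors, Sample Size, Missing Data, Analytical Methods, Class Imbalance, Fairness)
Open Science (Funding, Conflicts of Interest, Protocol, Registration, Data Sharing, Code Sharing)
Patient and Public Involvement
Results
Discussion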

Step 2: Select a Study to Evaluate

Choose a published health AI study. Good candidates:

Any study from the awesome-health-ai-evaluation model registry
A recent paper from your clinical domain
The pivotal trial behind IDx-DR's 2018 FDA De Novo authorization (a good example of strong reporting)

Step 3: Systematic Evaluation

For each of the 27 TRIPOD+AI items, score the study:

Score	Meaning
Complete	Item fully reported with sufficient detail
Partial	Item mentioned but lacks detail
Not reported	Item not addressed
N/A	Item not applicable to this study type
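A minimal Python sketch for recording scores, assuming you key each score by an item label of your choice. The labels "1" to "27" below are hypothetical placeholders for the labels on the official checklist. Restricting scores to the four values in the table and listing unscored items helps ensure no item is skipped.

```python
from enum import Enum


class Score(Enum):
    """The four scores defined in the scoring table."""
    COMPLETE = "Complete"
    PARTIAL = "Partial"
    NOT_REPORTED = "Not reported"
    NA = "N/A"


def unscored_items(scores: dict[str, Score], expected_items: list[str]) -> list[str]:
    """Return the expected checklist items that have no score yet."""
    return [item for item in expected_items if item not in scores]


# Hypothetical item labels; replace with the official checklist labels,
# including any lettered sub-items you choose to score separately.
ITEMS = [str(n) for n in range(1, 28)]

scores = {"1": Score.COMPLETE, "2": Score.PARTIAL, "10": Score.NOT_REPORTED}
missing = unscored_items(scores, ITEMS)
print(f"{len(missing)} items still to score")  # prints: 24 items still to score
```

Because `Score("Partial")` looks up a member by its value, labels typed into a spreadsheet can be converted back into scores, and any typo raises a `ValueError` instead of silently passing.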

Work through each section methodically. Items that are frequently reported incompletely include:

Item 5a: Data sources (where did the development and evaluation data come from?)
Item 12c: Model type, model-building steps, and hyperparameter tuning (reproducibility)
Item 10: Sample size justification (was the sample large enough to develop or evaluate the model reliably?)
Item 11: Missing data handling (critical for real-world deployment)
Items 12e and 23a: Calibration assessment and results (not just discrimination)
Items 18e and 18f: Data and code sharing (can others validate?)

Step 4: Calculate Reporting Completeness

Count:

Total applicable items
Complete items
Partial items
Not reported items

Reporting Completeness = (Complete + 0.5 × Partial) / Total Applicable × 100%

Systematic reviews of reporting quality have found health AI studies to be frequently incomplete, with median completeness around 50%. As a rule of thumb for this skill, interpret studies scoring below 60% with significant caution.
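The Step 4 formula can be sketched in Python as below. This is a minimal illustration, assuming one score label per checklist item; the 60% cut-off is the rule of thumb stated above, and the example scores are invented.

```python
def reporting_completeness(scores: list[str]) -> float:
    """Return reporting completeness as a percentage.

    Completeness = (Complete + 0.5 * Partial) / Total Applicable * 100,
    where "N/A" items are excluded from Total Applicable.
    """
    applicable = [s for s in scores if s != "N/A"]
    if not applicable:
        raise ValueError("no applicable items were scored")
    credit = applicable.count("Complete") + 0.5 * applicable.count("Partial")
    return 100 * credit / len(applicable)


# Rule-of-thumb cut-off used in this skill; not a validated threshold.
CAUTION_THRESHOLD = 60.0

# Invented example: 27 items in total, 2 of them not applicable.
example = ["Complete"] * 10 + ["Partial"] * 6 + ["Not reported"] * 9 + ["N/A"] * 2
pct = reporting_completeness(example)
print(f"{pct:.1f}%, interpret with caution: {pct < CAUTION_THRESHOLD}")
# prints: 52.0%, interpret with caution: True
```

Here 25 items are applicable and the study earns 10 + 0.5 × 6 = 13 points, so completeness is 13 / 25 = 52%.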

Step 5: Write Assessment Report

Structure your report as:

Study summary (1 paragraph: what was studied, main finding, study design)
TRIPOD+AI completeness (overall score + breakdown by section)
Critical gaps (which missing items most undermine confidence?)
Strengths (what did the study do well?)
Recommendation (is this evidence strong enough to inform clinical practice?)
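For the breakdown by section, a hypothetical helper can apply the same completeness formula within each section. This sketch assumes you tag each scored item with its checklist section yourself; the section names and scores in the example are made up.

```python
from collections import defaultdict


def completeness_by_section(records: list[tuple[str, str]]) -> dict[str, float]:
    """Per-section completeness (%) using the Step 4 formula.

    Each record is (section, score), where score is "Complete", "Partial",
    "Not reported", or "N/A". Sections whose items are all N/A are omitted.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for section, score in records:
        if score != "N/A":
            grouped[section].append(score)
    result = {}
    for section, scores in grouped.items():
        credit = scores.count("Complete") + 0.5 * scores.count("Partial")
        result[section] = 100 * credit / len(scores)
    return result


records = [
    ("Methods", "Complete"), ("Methods", "Partial"), ("Methods", "Not reported"),
    ("Results", "Complete"), ("Results", "Complete"),
    ("Open Science", "Not reported"), ("Open Science", "N/A"),
]
for section, pct in completeness_by_section(records).items():
    print(f"{section}: {pct:.0f}%")
# prints: Methods: 50%, then Results: 100%, then Open Science: 0%
```

Sections are reported in the order they first appear, so listing items in checklist order keeps the breakdown in checklist order too.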

Artifacts

Completed TRIPOD+AI Checklist — All 27 items scored for one published study
Evidence Assessment Report — 2-3 page structured report following the template above
Gap Summary Table — Table of all "Not reported" and "Partial" items with impact assessment
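The Gap Summary Table can be assembled from the same scores. In this minimal sketch (a hypothetical helper, not part of TRIPOD+AI), each item carries your own impact note, and the output uses tab-separated columns like the tables in this skill; the example rows are illustrative only.

```python
def gap_summary(items: list[dict[str, str]]) -> str:
    """Build a tab-separated Gap Summary Table from scored items.

    Each item is a dict with keys "item", "score", and "impact" (the
    reviewer's own note on why the gap matters). Only "Not reported" and
    "Partial" items are kept, with "Not reported" rows listed first.
    """
    order = {"Not reported": 0, "Partial": 1}
    gaps = [row for row in items if row["score"] in order]
    gaps.sort(key=lambda row: order[row["score"]])  # stable: keeps input order within each group
    lines = ["Item\tScore\tImpact"]
    lines += [f'{row["item"]}\t{row["score"]}\t{row["impact"]}' for row in gaps]
    return "\n".join(lines)


items = [
    {"item": "Sample size", "score": "Partial",
     "impact": "No justification; estimates may be unstable"},
    {"item": "Missing data", "score": "Not reported",
     "impact": "Unclear how the model handles incomplete records"},
    {"item": "Title", "score": "Complete", "impact": ""},
]
print(gap_summary(items))
```

Listing "Not reported" rows first is a design choice that puts the largest gaps at the top; reorder by your own impact judgement if some partial items matter more for the study's context.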

Assessment Criteria

Criterion	Meets Standard	Below Standard
Checklist completeness	All 27 items evaluated	Items skipped or grouped
Scoring accuracy	Scores match actual paper content	Scores don't reflect paper
Critical analysis	Identifies which gaps matter most and why	Lists gaps without impact analysis
Report quality	Clear, structured, actionable conclusion	Vague or unfocused

Common Mistakes

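Confusing TRIPOD+AI with TRIPOD 2015: TRIPOD+AI supersedes the 2015 checklist, adds items relevant to AI/ML (for example on fairness, class imbalance, and open science), and renumbers the items, so item numbers from the older checklist do not carry over
Scoring "Complete" when the study mentions something only briefly (that's "Partial")
Focusing only on performance metrics and ignoring methods items such as sample size, missing data handling, and calibration assessment
Treating the checklist as a scorecard rather than a guide: some items matter more than others depending on context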

Related Skills

evaluate-model-calibration: Deep dive into calibration, a performance measure TRIPOD+AI expects studies to report
decision-curve-analysis: Clinical utility (net benefit), another TRIPOD+AI performance measure
evidence-chain-assessment: Place the study in the broader evidence chain
model-card-generator: Create structured model documentation from a TRIPOD+AI assessment

References

TRIPOD+AI Statement: https://doi.org/10.1136/bmj-2023-078378
TRIPOD+AI Explanation & Elaboration: https://doi.org/10.1136/bmj-2023-078259
Collins GS et al. Protocol for TRIPOD+AI: https://doi.org/10.1136/bmjopen-2020-038916