name: evidence-chain-assessment description: Map a health AI tool through the 5-phase evidence chain from lab to deployment category: clinical-ai raigh_tier: tier-1 difficulty: intermediate estimated_time: "4 hours" prerequisites: [run-tripod-ai-checklist] tags: [evidence, validation, deployment, clinical-ai, lifecycle] evidence_basis: "https://doi.org/10.1038/s41591-020-01069-z" version: "1.0"
Evidence Chain Assessment
Purpose
Getting an AI model from "works in the lab" to "safe for patients" requires a chain of evidence across 5 distinct phases. Most health AI tools stall at Phase 1 (technical accuracy) and never complete Phases 3-5 (clinical validation, deployment, monitoring). This skill teaches you to map any health AI tool's evidence maturity and identify what's missing.
Learning Objectives
- Define the 5 phases of health AI evidence maturity
- Assess where a specific AI tool sits in the evidence chain
- Identify critical evidence gaps that block deployment
- Design the study needed to fill the most critical gap
- Communicate evidence maturity to non-technical stakeholders
Context
When a vendor says "our AI is 95% accurate," you need to ask: accurate in whose hands, on what population, compared to what, and does accuracy translate to better patient outcomes? The evidence chain framework gives you the vocabulary and structure to ask the right questions.
Steps
Step 1: Learn the 5 Phases
| Phase | Name | Key Question | Gold Standard |
|---|---|---|---|
| 1 | Technical Validation | Does the model perform well on held-out data? | AUROC, AUPRC on external test set |
| 2 | Clinical Validation | Does performance hold in clinical conditions? | Prospective study in clinical setting |
| 3 | Clinical Utility | Does using the model improve clinical decisions? | Randomized controlled trial or DCA |
| 4 | Implementation | Can the model be deployed safely in real workflows? | Deployment study with workflow integration |
| 5 | Monitoring | Does performance hold over time in production? | Continuous monitoring with drift detection |
Step 2: Select an AI Tool to Assess
Choose a health AI tool from:
- The
awesome-health-ai-evaluationmodel registry - An AI tool used or proposed at your institution
- A commercially available FDA-cleared AI device
Step 3: Evidence Mapping
For each phase, search for and document the evidence:
Phase 1: Technical Validation
├── Published papers: [list with DOIs]
├── Test set description: [internal? external? multi-site?]
├── Key metrics: [AUROC, sensitivity, specificity, calibration]
├── Comparison to baseline: [what was the comparator?]
└── Assessment: [Strong / Adequate / Weak / Missing]
Phase 2: Clinical Validation
├── Prospective studies: [list with DOIs]
├── Population: [who was studied? representative?]
├── Setting: [academic? community? LMIC?]
├── Sample size: [adequate for the prevalence?]
└── Assessment: [Strong / Adequate / Weak / Missing]
Phase 3: Clinical Utility
├── RCTs or comparative studies: [list with DOIs]
├── Outcome measures: [patient outcomes? process measures?]
├── Decision Curve Analysis: [done? results?]
├── Override/adoption rates: [do clinicians actually use it?]
└── Assessment: [Strong / Adequate / Weak / Missing]
Phase 4: Implementation
├── Deployment studies: [list with DOIs]
├── Workflow integration: [how was it embedded?]
├── User experience: [clinician feedback?]
├── Failure modes: [what went wrong?]
└── Assessment: [Strong / Adequate / Weak / Missing]
Phase 5: Monitoring
├── Post-deployment monitoring: [is it being tracked?]
├── Performance drift: [has performance changed over time?]
├── Demographic fairness: [different performance by subgroup?]
├── Feedback loop: [are corrections fed back to model?]
└── Assessment: [Strong / Adequate / Weak / Missing]
Step 4: Evidence Maturity Score
| Score | Meaning |
|---|---|
| Phase 1 only | Lab-ready. Not clinically validated. |
| Phase 1-2 | Clinically validated. Not proven useful. |
| Phase 1-3 | Clinical utility demonstrated. Ready for deployment planning. |
| Phase 1-4 | Deployed. Needs monitoring plan. |
| Phase 1-5 | Full evidence chain. Gold standard. |
Most health AI tools score Phase 1 only. The IDx-DR (diabetic retinopathy screening) is one of the few Phase 1-5 examples.
Step 5: Gap Analysis and Next Study Design
For the most critical missing phase, design the study that would fill the gap:
- Study type (RCT, prospective cohort, implementation study)
- Population and setting
- Primary outcome
- Sample size estimate
- Timeline and budget estimate
Artifacts
- Evidence Chain Map — Complete 5-phase assessment for one health AI tool (using the template above)
- Evidence Maturity Score — Phase score with justification
- Gap Analysis — Which phase is the weakest link and why
- Next Study Proposal — 1-page study design to fill the most critical gap
Assessment Criteria
| Criterion | Meets Standard | Below Standard |
|---|---|---|
| All 5 phases assessed | Every phase documented with evidence or "Missing" | Phases skipped |
| Evidence accurately represented | Cited papers match claims | Misrepresentation of study findings |
| Gap identification | Critical gap identified with clinical reasoning | Superficial gap analysis |
| Study proposal | Feasible study design addressing the right gap | Unrealistic or misdirected proposal |
Common Mistakes
- Treating Phase 1 (technical) evidence as sufficient for deployment
- Ignoring Phase 5 (monitoring) — performance degrades over time, especially in new populations
- Conflating association studies with clinical utility (showing a biomarker correlates with outcome ≠ showing the AI improves decisions)
- Assuming evidence from one population transfers to another (a model validated in US academic centers may fail in African community hospitals)
Related Skills
run-tripod-ai-checklist— Phase 1-2 reporting quality assessmentdecision-curve-analysis— Phase 3 clinical utility quantificationmodel-card-generator— Document the evidence chain in a model cardbridge-tbi-protocol— Example of Phase 1-3 evidence chain in TBI
References
- Park & Han (2018). Methodological Guide for Health AI Validation: https://doi.org/10.3348/kjr.2018.19.3.552
- Kelly et al. (2019). Key challenges for delivering clinical impact with AI: https://doi.org/10.1186/s12916-019-1426-2
- WHO Guidance on AI for Health (2021): https://www.who.int/publications/i/item/9789240029200