---
name: cihr-protocol-audit
description: Audit CIHR Project Grant clinical trial (RCT) protocols for objective-endpoint-analysis traceability, cross-reference integrity, CIHR section compliance, budget-protocol alignment, and terminology consistency. Use when auditing, reviewing, or QCing a CIHR clinical trial protocol.
---
CIHR Protocol Audit Skill
Overview
Systematic audit of a CIHR Project Grant clinical trial protocol. Produces a structured checklist (markdown), applies tracked changes for fixable issues, and adds comments for items requiring investigator judgment.
The audit has five parts (A-E) plus a terminology consistency pass (Part F). Each part can surface issues that require either tracked changes (fixable problems) or comments (judgment calls).
Prerequisites
Before starting the audit:
- Unpack the .docx to access `word/document.xml` and related files
- Extract plain text for analysis (use `python-docx` or direct XML parsing)
- If a budget file is provided, extract its text as well
- Read the `references/cihr-rct-headings.md` file for the mandatory CIHR RCT heading structure
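The following is a minimal sketch of the "direct XML parsing" option. It uses only the standard library and assumes the usual `word/document.xml` layout. It returns one string per `<w:p>` and skips tracked deletions, because Word stores deleted text in `<w:delText>` rather than `<w:t>`. The `build_minimal_docx` helper is hypothetical and exists only for testing.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace URI.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def extract_paragraphs(docx_source):
    """Return the visible text of each <w:p> (path or file-like object)."""
    with zipfile.ZipFile(docx_source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        parts = []
        # Only look inside runs: <w:tab> inside <w:pPr><w:tabs> is a tab stop, not text.
        for run in p.iter(W + "r"):
            for child in run:
                if child.tag == W + "t":
                    parts.append(child.text or "")
                elif child.tag == W + "tab":
                    parts.append("\t")
        paragraphs.append("".join(parts))
    return paragraphs


def build_minimal_docx(document_xml):
    """Hypothetical test helper: wrap a document.xml string in an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
    buf.seek(0)
    return buf
```

The output includes the text of inserted runs (`<w:ins>`), so it reflects the document as if all pending changes were accepted. That is usually the right view for the audit.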
Part A: Objective-Endpoint-Analysis Traceability Matrix
Every objective in the protocol must have three components:
- A clearly stated objective (usually in Section 1.2)
- A defined endpoint or measurement (usually in Section 2.12)
- A specified analysis plan (usually in Section 2.16)
Procedure
Extract all objectives from Section 1.2 (primary, secondary, exploratory)
For each objective:
- Find the corresponding endpoint definition in 2.12 (2.12.1 for primary, 2.12.2 for secondary/exploratory)
- Find the corresponding analysis plan in 2.16 (2.16.1 for primary, 2.16.2 for secondary, 2.16.3 for sensitivity)
- Verify the endpoint measurement method is specified (instrument name, timing, data source)
- Verify the analysis plan specifies: statistical test, model structure, covariates/adjustments
Flag gaps:
- GAP: No analysis plan for this objective
- PARTIAL: Analysis mentioned but lacks specifics (e.g., "descriptive statistics" without specifying which tests, stratification, or comparison methods)
- OK: All three components present and specific
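The GAP/PARTIAL/OK rule can be approximated with a keyword heuristic before a human reads each row. In this sketch, the `SPECIFIC_TERMS` list and the `Objective` record are hypothetical. It also treats a missing endpoint as a GAP, which is an assumption beyond the rule stated above. A PARTIAL result means "read this closely", not a final verdict.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical markers of a specific analysis plan; tune them per protocol.
SPECIFIC_TERMS = ("regression", "mann-whitney", "wilcoxon", "chi-square",
                  "t-test", "odds ratio", "hazard", "adjusted for")


@dataclass
class Objective:
    label: str                       # e.g. "Secondary 2"
    statement: str
    endpoint_section: Optional[str]  # e.g. "2.12.2"; None if not found
    analysis_section: Optional[str]  # e.g. "2.16.2"; None if not found
    analysis_text: str = ""


def trace_status(obj):
    # Assumption: a missing endpoint is flagged as GAP, like a missing analysis plan.
    if obj.endpoint_section is None or obj.analysis_section is None:
        return "GAP"
    text = obj.analysis_text.lower()
    if not any(term in text for term in SPECIFIC_TERMS):
        return "PARTIAL"
    return "OK"


def matrix_row(obj, notes=""):
    """Render one row of the Part A output table."""
    cells = [obj.label, obj.statement, obj.endpoint_section or "none",
             obj.analysis_section or "none", trace_status(obj), notes]
    return "| " + " | ".join(cells) + " |"
```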
Output Format
| Objective | Statement | Endpoint (Section) | Analysis Plan (Section) | Status | Notes |
|---|---|---|---|---|---|
Common Issues
- Exploratory objectives often have endpoints defined but analysis plans that say only "descriptive statistics" without specifying comparison methods
- Provider/patient experience objectives frequently lack between-arm comparison methods for ordinal outcomes (e.g., the Mann-Whitney U test, also known as the Wilcoxon rank-sum test)
- Economic/cost-effectiveness objectives may reference a framework section (e.g., 2.15) without a dedicated analysis subsection in 2.16
- Fairness/equity analyses may list metrics (equalized odds, Brier score) without specifying the statistical framework for testing differences
Part B: Cross-Reference Verification
Every time the document says "Section X.Y" or "see Section X", verify the referenced section contains the promised information.
Procedure
Use regex to find all cross-references, with patterns like `Section \d+\.\d+`, `Sec \d+\.\d+`, `(Section \d+)`, and `described in \d+\.\d+`
For each reference:
- Note the source location (which section makes the reference)
- Note the target section
- Read the target section content
- Verify the specific information claimed to be there actually exists
Flag issues:
- OK: Referenced information is present in the target section
- PARTIAL: Target section exists but the specific referenced information is missing or different from what's claimed
- MISSING: Target section does not exist
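A regex pass can find references and flag MISSING targets automatically. Deciding between OK and PARTIAL still requires reading the target section. This sketch merges the patterns above into one case-insensitive expression that also captures three-level numbers such as 2.16.3. That generalisation is an assumption. The `(section, text)` input format and the interim `CHECK` status are hypothetical.

```python
import re

# Matches "Section 2.12", "section 2.12.1", "Sec 2.7", "Sec. 2.7", "Section 3",
# and "described in 2.16"; the captured number keeps all sub-levels.
XREF = re.compile(r"\b(?:Section|Sec\.?|described in)\s+(\d+(?:\.\d+)*)",
                  re.IGNORECASE)


def find_cross_references(paragraphs):
    """paragraphs: iterable of (source_section, text). Yields (source, target)."""
    for source, text in paragraphs:
        for m in XREF.finditer(text):
            yield source, m.group(1)


def triage(refs, headings):
    """MISSING if the target heading does not exist; CHECK means 'read it'."""
    return [(source, target, "CHECK" if target in headings else "MISSING")
            for source, target in refs]
```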
Common Issues
- Interim analysis sections (e.g., 2.17) often describe blinded pooled analyses but get referenced for unblinded arm-specific monitoring (e.g., concordance thresholds). These are conceptually different operations that may need separate procedures.
- Sensitivity analysis sections may be referenced before they are defined
- Budget-related cross-references to protocol sections may not align with actual site counts or resource allocations
Output Format
| Reference | Source Location | Target Section | Content Present? | Notes |
|---|---|---|---|---|
Part C: CIHR RCT Section Compliance
Compare protocol headings against CIHR mandatory RCT section structure. Read references/cihr-rct-headings.md for the required headings.
Procedure
- Extract all section headings from the document (with numbering)
- Map each heading to the corresponding CIHR required heading
- Flag missing required headings
- Flag extra headings that don't map (not an error, just note them)
- Verify subsection numbering is consistent (1.1, 1.2, etc.)
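A sketch of the heading comparison, under these assumptions:

- The input lines are heading-styled paragraphs only. Body text that starts with a number would also match.
- The required headings are a hypothetical `{number: title}` dict loaded from `references/cihr-rct-headings.md`.

Matching is on normalised titles rather than numbers, so a heading numbered "1." instead of "1.1" is reported as misnumbered instead of missing.

```python
import re

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def parse_headings(lines):
    """Map section number -> title for lines that look like numbered headings."""
    found = {}
    for line in lines:
        m = HEADING.match(line.strip())
        if m:
            found[m.group(1)] = m.group(2).strip()
    return found


def norm(title):
    return " ".join(title.lower().split())


def check_compliance(found, required):
    """Return (report rows for required headings, extra found section numbers)."""
    required_titles = {norm(t) for t in required.values()}
    found_by_title = {norm(t): n for n, t in found.items()}
    report = []
    for num, title in required.items():
        got = found_by_title.get(norm(title))
        if got is None:
            report.append((num, title, "MISSING"))
        elif got != num:
            report.append((num, title, f"NUMBERED {got}"))
    extra = [n for n, t in found.items() if norm(t) not in required_titles]
    return report, extra


def section_key(num):
    return tuple(int(x) for x in num.split("."))


def numbering_gaps(found):
    """Flag skipped sibling numbers, e.g. 2.2 followed directly by 2.4."""
    siblings = {}
    for num in found:
        parts = section_key(num)
        siblings.setdefault(parts[:-1], set()).add(parts[-1])
    gaps = []
    for parent, kids in siblings.items():
        for k in range(1, max(kids) + 1):
            if k not in kids:
                gaps.append(".".join(str(x) for x in parent + (k,)))
    return sorted(gaps, key=section_key)
```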
Common Issues
- Section numbering in the document may not match CIHR numbering (e.g., document starts with "1. Condition and Burden" instead of "1.1")
- DSMB section (3.3) may exist but offer only minimal justification for not having a DSMB
- "Duration of treatment period" (2.8) may say "not applicable" for AI/decision-support trials -- this is acceptable but should be explicitly stated
Part D: Budget-Protocol Alignment
If a budget file is provided, check that every budgeted item aligns with the protocol and every protocol commitment has budget support.
Procedure
- Extract budget line items with amounts and justifications
- For each budget item, verify it maps to a protocol commitment:
- Personnel: role described in protocol sections 3.1-3.2
- Equipment/consumables: needed for procedures described in protocol
- Data linkages: required for endpoints described in protocol
- Verification studies: sample sizes match protocol
- For each protocol commitment, verify budget support:
- Number of sites: do personnel/equipment budgets cover all sites?
- Sample sizes: does the verification substudy budget match the protocol target?
- Data sources: are all required data linkage fees budgeted?
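The count comparisons above (sites, sample sizes, data sources) reduce to one rule. Budgeting below the protocol figure is a problem. Budgeting slightly above it may be an acceptable buffer. In this sketch, the 10% `buffer_pct` threshold and the `Commitment` record are hypothetical and should be set per review.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    item: str      # e.g. "Site coordinators" or "Verification substudy (n)"
    protocol: int  # what the protocol commits to (sites, participants, sources)
    budgeted: int  # what the budget actually funds


def check_alignment(c, buffer_pct=0.10):
    """UNDER always needs a comment; a small excess is treated as a buffer."""
    if c.budgeted < c.protocol:
        return "UNDER"
    if c.budgeted == c.protocol:
        return "OK"
    if c.budgeted <= c.protocol * (1 + buffer_pct):
        return "OK (buffer)"
    return "OVER"
```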
Common Issues
- Coordinator coverage at N sites but protocol specifies M sites (M > N) -- needs justification for which sites have existing support
- Equipment/modules budgeted for fewer sites than the protocol describes
- Verification substudy budgets slightly over the protocol target (acceptable buffer) or under (problem)
- Training stipend amounts marked up without a shown calculation (e.g., a $75K stipend budgeted as $90K for "benefits and adjustments")
Output Format
| Category | Amount | Protocol Alignment | Issues |
|---|---|---|---|
Part E: Content Issues
Scan the full document text for garbled text, missing spaces, duplicate fragments, and formatting errors.
Procedure
Garbled text detection: Search for patterns indicating splice errors from tracked-change acceptance:
- Period immediately followed by a lowercase letter with no space: `\.[a-z]` (this already skips decimals; exclude abbreviations such as "e.g." and file names such as "document.xml")
- Orphan fragments: short phrases that don't connect grammatically to surrounding text
- Possessive markers without an antecedent: `'s advisory` with no noun before the apostrophe
Missing spaces: Search for:
- A word joined directly to a following capitalized word: `[a-z][A-Z]` patterns (e.g., "ORsA" should be "ORs. A"); expect false positives from camel-case names such as "EchoNext"
- "of" followed directly by a proper noun: `of[A-Z]`
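The splice and missing-space patterns can run as a single pass. Matches that start inside an allow-listed token are suppressed. In this sketch, the allow-list (`e.g.`, `i.e.`, `EchoNext`, file names) and the snippet width are hypothetical. The orphan-possessive pattern also accepts a curly apostrophe. Every hit still needs a human look before it becomes a tracked change.

```python
import re

CHECKS = {
    "period_lowercase": re.compile(r"\.[a-z]"),
    "orphan_possessive": re.compile(r"(?:^|\s)['\u2019]s\b"),
    "joined_words": re.compile(r"[a-z][A-Z]"),
    "of_joined": re.compile(r"\bof[A-Z]"),
}
# Hypothetical allow-list of known false positives.
ALLOW = re.compile(r"e\.g\.|i\.e\.|EchoNext|\w+\.(?:docx|xml|md)\b")


def scan(text, context=15):
    """Return (check_name, snippet) pairs for suspicious spots in text."""
    allowed = [m.span() for m in ALLOW.finditer(text)]
    hits = []
    for label, pattern in CHECKS.items():
        for m in pattern.finditer(text):
            if any(a <= m.start() < b for a, b in allowed):
                continue  # match begins inside an allow-listed token
            snippet = text[max(0, m.start() - context): m.end() + context]
            hits.append((label, snippet))
    return hits
```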
Duplicate fragments: Search for:
- Near-identical phrases within 100 characters of each other
- Sentences that end with a period and are immediately followed by a rephrased version of the same content
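Both duplicate checks can be sketched with the standard library:

- `repeated_phrases` finds exact repeats of a word n-gram whose occurrences start within 100 characters of each other.
- `near_duplicates` uses `difflib.SequenceMatcher` to flag an adjacent sentence pair that is a near-rephrasing.

The n-gram size of 4 and the 0.8 similarity threshold are hypothetical defaults. The regex sentence splitter is also naive and will split after abbreviations.

```python
import re
from difflib import SequenceMatcher

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def repeated_phrases(text, n=4, max_gap=100):
    """Word n-grams that recur within max_gap characters of their last occurrence."""
    words = [(m.group().lower(), m.start()) for m in re.finditer(r"\w+", text)]
    last_seen = {}
    hits = []
    for i in range(len(words) - n + 1):
        gram = tuple(w for w, _ in words[i:i + n])
        pos = words[i][1]
        if gram in last_seen and pos - last_seen[gram] <= max_gap:
            hits.append(" ".join(gram))
        last_seen[gram] = pos
    return hits


def near_duplicates(text, threshold=0.8):
    """Adjacent sentence pairs whose similarity ratio meets the threshold."""
    sentences = [s for s in SENTENCE_END.split(text) if s]
    flagged = []
    for a, b in zip(sentences, sentences[1:]):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, ratio))
    return flagged
```

A single duplicated fragment usually produces several overlapping n-gram hits, so group hits by position before writing one tracked deletion.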
Section reference format: Verify consistent formatting of section references (e.g., "Section 2.7" vs "Sec 2.7" vs "section 2.7")
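A quick way to see which reference styles occur is to tally them. More than one key in the result means the formatting is inconsistent. The set of variants matched here is an assumption and is deliberately case-sensitive.

```python
import re
from collections import Counter

# "Sec." is listed before "Sec" so the longer form wins when both could match.
REF_FORM = re.compile(r"\b(Section|section|Sec\.|Sec|sec\.|sec)\s+\d")


def reference_styles(text):
    return Counter(m.group(1) for m in REF_FORM.finditer(text))
```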
Part F: Terminology and Naming Consistency
Check that key terms, group names, instrument names, and abbreviations are used consistently throughout.
Procedure
Arm/group names: Extract all references to study arms. Flag inconsistencies:
- "intervention arm" vs "AI arm" vs "EchoNext arm" vs "experimental arm"
- "control arm" vs "usual care" vs "standard care" vs "comparator"
- Pick the canonical name used in Section 2.1 (study design) and flag deviations
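Here is a sketch of the arm-name check. It counts whole-phrase, case-insensitive occurrences of each variant and reports everything except the canonical name taken from Section 2.1. The `ARM_VARIANTS` groups reuse the examples above as a hypothetical starting list.

```python
import re

# Hypothetical variant groups; extend with any names found in the protocol.
ARM_VARIANTS = {
    "intervention": ["intervention arm", "AI arm", "EchoNext arm", "experimental arm"],
    "control": ["control arm", "usual care", "standard care", "comparator"],
}


def arm_name_report(text, canonical):
    """canonical: {group: name used in Section 2.1}. Returns non-canonical counts."""
    lowered = text.lower()
    report = {}
    for group, variants in ARM_VARIANTS.items():
        deviations = {}
        for v in variants:
            n = len(re.findall(r"\b" + re.escape(v.lower()) + r"\b", lowered))
            if n and v != canonical[group]:
                deviations[v] = n
        report[group] = deviations
    return report
```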
Survey/instrument names: Extract all named instruments and check consistency:
- System Usability Scale (SUS) -- always abbreviated the same way after first use?
- CAHPS or CAHPS-adapted -- used consistently?
- Trust-in-automation scale -- same name each time?
- Likert scale references -- consistent anchoring descriptions?
Abbreviation discipline:
- Every abbreviation should be defined at first use
- After definition, use the abbreviation consistently (don't alternate between spelled-out and abbreviated)
- Common abbreviations to check: TTE, SHD, ECG, MACE, ICC, OR, CI, FTE, DSMB, SAP, CRF
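A rough check for the first two rules looks at each abbreviation's first use and tests whether it appears as "(ABBR)". This assumes the "Long Form (ABBR)" definition convention. Plurals such as "ORs" are not counted as uses. The status labels are hypothetical.

```python
import re


def check_abbreviations(text, abbrevs):
    """Return {abbr: status} based on how each abbreviation first appears."""
    report = {}
    for abbr in abbrevs:
        pat = re.compile(r"\b" + re.escape(abbr) + r"\b")
        uses = [m.start() for m in pat.finditer(text)]
        if not uses:
            report[abbr] = "UNUSED"
            continue
        first, end = uses[0], uses[0] + len(abbr)
        # Assumed convention: a definition looks like "Long Form (ABBR)".
        defined_at_first = text[first - 1:first] == "(" and text[end:end + 1] == ")"
        if not defined_at_first:
            report[abbr] = "NOT DEFINED AT FIRST USE"
        elif text.count(f"({abbr})") > 1:
            report[abbr] = "DEFINED MORE THAN ONCE"
        else:
            report[abbr] = "OK"
    return report
```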
Endpoint naming: Verify that endpoint descriptions in 2.12 match the analysis plan descriptions in 2.16 (same names, same definitions)
Site/center naming: If sites are named, verify consistent naming throughout (e.g., "Montreal Heart Institute" vs "MHI" vs "Institut de cardiologie de Montreal")
Statistical test naming: Verify consistent naming of statistical methods across sections (e.g., "mixed-effect logistic regression" vs "mixed-effects logistic regression" vs "multilevel logistic model")
Output Format
| Term Category | Variants Found | Canonical Form | Locations |
|---|---|---|---|
Applying Fixes
After completing the audit:
Tracked Changes (for fixable issues)
Use the body-swap serialization approach for Word XML manipulation:
- Parse document.xml with lxml
- Modify the `<w:body>` element (add `<w:del>` and `<w:ins>` elements)
- Serialize only the body: `etree.tostring(body, encoding='unicode')`
- Replace the `<w:body>...</w:body>` region in the original XML string
- This preserves namespace declarations that lxml would otherwise mangle
For text replacements that span multiple <w:r> elements:
- Collect all non-deleted runs and concatenate their text
- Find the match position in the concatenated string
- Map the match back to the affected runs
- Remove the affected runs and insert: before-text run + `<w:del>` + `<w:ins>` + after-text run (inside `<w:del>`, the removed text goes in `<w:delText>`, not `<w:t>`)
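The "map back to affected runs" step is the error-prone part. Here it is as plain Python over a list of run texts, with XML construction left out. `locate` is a hypothetical helper name. It returns the first and last affected run indices, plus the leftover text of those runs that must be re-emitted as ordinary runs around the `<w:del>`/`<w:ins>` pair.

```python
def locate(run_texts, target):
    """Return (first_run, last_run, before_text, after_text), or None if absent."""
    if not target:
        return None
    full = "".join(run_texts)
    pos = full.find(target)
    if pos == -1:
        return None
    end = pos + len(target)
    offset = 0
    first = None
    for i, text in enumerate(run_texts):
        run_start, run_end = offset, offset + len(text)
        if first is None and run_start <= pos < run_end:
            first, before = i, text[:pos - run_start]
        if first is not None and run_start < end <= run_end:
            return first, i, before, text[end - run_start:]
        offset = run_end
    return None
```

When building the replacement runs, copying the first affected run's `<w:rPr>` keeps the formatting of the surrounding text.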
Use author "Claude (Audit)" and a fixed date for all changes.
Comments (for judgment calls)
For issues requiring investigator review:
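- Add `<w:commentRangeStart>` before the target paragraph's first run
- Add `<w:commentRangeEnd>` and `<w:commentReference>` after the last run (the `<w:commentReference>` sits inside its own `<w:r>`)
- Add the comment text to `word/comments.xml`; if that part does not exist yet, it must also be registered in `word/_rels/document.xml.rels` and `[Content_Types].xml`
- Add the author to `word/people.xml` (check the namespace prefix: it may be `w15:`, not `w:`)
- Escape any `<`, `>`, or `&` characters in comment text (lxml does this automatically; hand-built XML strings do not)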
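A sketch of the paragraph and `comments.xml` sides of this procedure with lxml follows. It only wraps direct-child runs of the paragraph, so runs nested in `<w:hyperlink>` or `<w:ins>` are not handled. The function names are hypothetical. The relationship, content-type, and `people.xml` updates are left out.

```python
from lxml import etree

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
NSMAP = {"w": W_NS}  # declare the w prefix so new elements serialise as w:..., not ns0:...


def anchor_comment(paragraph, comment_id):
    """Wrap the paragraph's direct-child runs in a comment range plus reference run."""
    cid = str(comment_id)
    runs = paragraph.findall(W + "r")
    if not runs:
        raise ValueError("paragraph has no runs to anchor the comment on")
    runs[0].addprevious(etree.Element(W + "commentRangeStart", {W + "id": cid}, nsmap=NSMAP))
    end = etree.Element(W + "commentRangeEnd", {W + "id": cid}, nsmap=NSMAP)
    runs[-1].addnext(end)
    ref_run = etree.Element(W + "r", nsmap=NSMAP)
    etree.SubElement(ref_run, W + "commentReference", {W + "id": cid})
    end.addnext(ref_run)


def comment_element(comment_id, author, date, text):
    """Build a <w:comment> to append to the <w:comments> root of comments.xml."""
    c = etree.Element(W + "comment", {W + "id": str(comment_id), W + "author": author,
                                      W + "date": date}, nsmap=NSMAP)
    run = etree.SubElement(etree.SubElement(c, W + "p"), W + "r")
    etree.SubElement(run, W + "t").text = text  # lxml escapes <, > and & on output
    return c
```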
Output Checklist
Save the audit results as a markdown checklist file alongside the document. Structure:
- Part A table
- Part B table
- Part C table
- Part D table (if budget provided)
- Part E issue list with fix status
- Part F consistency table
Quick Reference: What Gets a Tracked Change vs a Comment
| Issue Type | Action |
|---|---|
| Garbled text / splice error | Tracked change |
| Missing space | Tracked change |
| Duplicate fragment | Tracked change (delete duplicate) |
| Missing analysis detail | Tracked change (add specifics) |
| Cross-reference discrepancy | Comment |
| Budget-protocol mismatch | Comment |
| Missing DSMB justification | Comment |
| Ambiguous methodology | Comment |
| Terminology inconsistency (minor) | Comment noting canonical form |
| Terminology inconsistency (in endpoint/analysis names) | Tracked change to standardize |