name: run-ai-checks
description: "Use when the user asks to run or refresh true AI-reviewed checks on live Directory data. This skill performs a strongest-model review on a fresh Directory snapshot, keeps regex/heuristic checks out of ai-check-cache/, validates cache reuse against a pristine pre-plugin checksum baseline, refreshes changed entities, and requires clean end-to-end validation before commit."
Run AI Checks
Use this skill when the user asks to run full AI checks, refresh `ai-check-cache/`, validate stale AI-cache warnings, or prepare AI-cache updates for review.
Required workflow
- Start from a fresh live Directory snapshot:
  - use `Directory(schema=..., purgeCaches=['directory'])` for the AI-review run
  - keep withdrawn scope explicit; active-only is the default unless the user explicitly asks otherwise
  - reminder: use the strongest available model in the current session when possible
- Review current deterministic coverage first:
  - inspect `checks/`, especially `text_consistency.py` and the existing deterministic plugins
  - review the full existing check surface before proposing anything new, so existing deterministic and AI-backed checks are not forgotten
  - document any newly introduced AI-only check type in this skill so future runs know that the category exists and how it should be reviewed
  - do not put regex-like findings into `ai-check-cache/`
- Compute checksums for all reviewed entities before trusting the existing AI cache:
  - compute canonical checksums on the live entities first, before deciding what can be reused
  - capture checksums from a pristine pre-plugin Directory snapshot, not after QC plugins have mutated in-memory entities during a run
  - base checksums only on relevant content fields
  - exclude volatile/runtime-only fields such as timestamps and `mg_*` fields
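The checksum rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic in `ai_cache.py`: the helper names, the `id` key and the `VOLATILE_FIELDS` set are hypothetical, and treating every list as unordered is a choice made here for multi-value fields such as `materials`.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Mapping

# Hypothetical runtime-only field names; the real set lives in the project code.
VOLATILE_FIELDS = {"timestamp", "last_modified"}


def _normalize(value: Any) -> Any:
    """Recursively drop mg_* keys and sort lists so equal content serializes identically."""
    if isinstance(value, dict):
        return {k: _normalize(v) for k, v in sorted(value.items()) if not str(k).startswith("mg_")}
    if isinstance(value, (list, tuple)):
        items = [_normalize(v) for v in value]
        # Assumption: list order carries no meaning (e.g. materials, data_categories).
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True, default=str))
    return value


def canonical_checksum(entity: Mapping[str, Any], relevant_fields: Iterable[str]) -> str:
    """SHA-256 over the relevant, non-volatile fields of one entity."""
    payload = {
        name: _normalize(entity.get(name))
        for name in sorted(set(relevant_fields))
        if not name.startswith("mg_") and name not in VOLATILE_FIELDS
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def checksum_map(entities: List[Mapping[str, Any]], relevant_fields: Iterable[str]) -> Dict[str, str]:
    """Compute once on the pristine snapshot; digests are plain strings, so later plugin edits cannot change them."""
    fields = list(relevant_fields)
    return {entity["id"]: canonical_checksum(entity, fields) for entity in entities}
```

Because the map stores digest strings rather than references to entity objects, building it before any QC plugin runs fixes the baseline even if plugins later modify the entity dicts in place.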
- Compare the live checksum map with the existing AI cache:
  - if an entity is already in the AI cache and the checksum is unchanged, keep its existing AI findings intact
  - if an entity is already in the AI cache and the checksum changed, rerun all AI-only checks for that entity and replace its old findings
  - if an entity is not yet in the AI cache, run all AI-only checks for that entity
  - if new AI rules are introduced, run only those additional rules on unchanged entities
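The comparison step can be sketched as below. The sketch assumes the cache can be reduced to an entity-ID-to-checksum map plus the set of rule names it already covers; both shapes and the names `ReviewPlan` and `plan_review` are hypothetical. The `gone` list is an addition made in this sketch: it reports cached entities that are missing from the live snapshot, so their findings can be reviewed for removal.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ReviewPlan:
    reuse: List[str] = field(default_factory=list)       # unchanged checksum: keep cached findings
    rerun_all: List[str] = field(default_factory=list)   # changed checksum: rerun every AI-only rule
    review_new: List[str] = field(default_factory=list)  # not cached yet: run every AI-only rule
    new_rules_only: Dict[str, List[str]] = field(default_factory=dict)  # unchanged entity -> added rules
    gone: List[str] = field(default_factory=list)        # cached but absent from the live snapshot


def plan_review(live: Dict[str, str], cached: Dict[str, str],
                cached_rules: Iterable[str], current_rules: Iterable[str]) -> ReviewPlan:
    """Classify live entities by comparing live checksums with cached checksums."""
    plan = ReviewPlan()
    added_rules = sorted(set(current_rules) - set(cached_rules))
    for entity_id, checksum in sorted(live.items()):
        if entity_id not in cached:
            plan.review_new.append(entity_id)
        elif cached[entity_id] != checksum:
            plan.rerun_all.append(entity_id)
        else:
            plan.reuse.append(entity_id)
            if added_rules:
                # Unchanged entities only need the rules they have never been checked against.
                plan.new_rules_only[entity_id] = list(added_rules)
    plan.gone = sorted(set(cached) - set(live))
    return plan
```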
- Preserve explicit “no finding” coverage:
  - keep checksum metadata for all reviewed entities, including entities that currently produce no AI findings
  - unchanged entities with no findings can be skipped on later runs
  - changed entities with no findings must be fully re-reviewed, then kept in the checked-entity list even if they still produce no findings
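To preserve “no finding” coverage, record each reviewed entity's checksum whether or not it produced findings, and replace its old findings instead of appending to them. This sketch uses a hypothetical entry layout in which `checked_entities` maps entity IDs to checksums and each finding has an `entity_id` key. Adapt it to the real `ai-check-cache/` schema.

```python
from typing import Any, Dict, List


def record_review(cache_entry: Dict[str, Any], entity_id: str, checksum: str,
                  findings: List[Dict[str, Any]]) -> None:
    """Store one entity's fresh review result, keeping it listed even when it has no findings."""
    checked = cache_entry.setdefault("checked_entities", {})
    checked[entity_id] = checksum  # recorded for every reviewed entity, findings or not
    # Replace, never append: drop this entity's previous findings before adding the new ones.
    kept = [f for f in cache_entry.get("findings", []) if f.get("entity_id") != entity_id]
    cache_entry["findings"] = kept + list(findings)
```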
- Run the full AI review on the live data in Codex:
  - identify only findings that cannot be expressed robustly as deterministic checks
  - check current `ai-check-cache/` entries for overlap before adding anything new
  - review consistency between unstructured descriptions and structured metadata for both biobanks and collections
  - explicitly cover at least these topic groups:
    - phenotypic and clinical profile of donors/research participants:
      - age groups and age-range implications
      - clinically actionable biological sex
      - diagnosis and clinically relevant disease framing
      - stated inclusion criteria and special population statements
    - collected data:
      - whether narrative claims about clinical, imaging, omics, registry, follow-up, questionnaire, or similar data are reflected in structured metadata
    - collected biological material:
      - whether narrative claims about FFPE, tissue, blood derivatives, swabs, DNA/RNA, or similar materials are reflected in structured metadata
    - access conditions:
      - whether narrative access/governance/restriction statements are reflected in structured access metadata
      - consider whether the evidence would support proposing richer structured access profiling, including DUC/CCE-style profiles, when that would materially improve reuse
  - prefer findings that affect meaningful advertisement, discoverability, or reuse of collections/biobanks
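One way to keep the topic groups explicit during review is to build a per-entity sheet that places the narrative beside the structured values it must agree with. Only `materials` and `data_categories` are named in this skill. Every other field name below, including `description`, is a hypothetical placeholder. The sheet only organizes the evidence; the judgement still belongs to the AI reviewer.

```python
from typing import Any, Dict, List, Mapping

# Topic groups from this skill mapped to the structured fields a narrative claim is compared with.
# Field names other than `materials` and `data_categories` are hypothetical placeholders.
TOPIC_FIELDS: Dict[str, List[str]] = {
    "participant_profile": ["sex", "age_low", "age_high", "age_unit", "diagnosis_available"],
    "collected_data": ["data_categories"],
    "biological_material": ["materials"],
    "access_conditions": ["access_description"],
}


def review_sheet(entity: Mapping[str, Any],
                 topics: Mapping[str, List[str]] = TOPIC_FIELDS) -> Dict[str, Any]:
    """Pair an entity's narrative with the structured values for each topic group."""
    return {
        "id": entity.get("id"),
        "narrative": entity.get("description", ""),  # hypothetical narrative field name
        "structured": {
            topic: {name: entity.get(name) for name in fields}
            for topic, fields in topics.items()
        },
    }
```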
- Update `ai-check-cache/` entries only for the residual AI-only findings:
  - keep JSON stable and commit-friendly
  - include the current `entity_checksum` and `source_checksum`
  - ensure `checked_fields` still match the actual live Directory field names used by the rule; if the field model changed, rebaseline `checked_entities` for the full reviewed scope
  - keep `checked_entities` complete enough to represent both positive findings and reviewed-no-finding entities
  - include enough message/action detail that non-experts can understand the issue
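The sketch below illustrates two of these requirements under an assumed entry layout. The first function finds `checked_fields` entries that no longer exist in the live field model, which is the trigger for rebaselining `checked_entities`. The second writes the JSON deterministically, so an unchanged entry produces no diff. Sorting keys is a choice made in this sketch; if the existing files use a hand-curated key order, keep that order instead.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def stale_checked_fields(entry: Dict[str, Any], live_field_names: Iterable[str]) -> List[str]:
    """Return checked_fields that are missing from the live Directory field model."""
    live = set(live_field_names)
    return [name for name in entry.get("checked_fields", []) if name not in live]


def write_cache_entry(path: Path, entry: Dict[str, Any]) -> None:
    """Serialize deterministically so rewriting identical content yields a byte-identical file."""
    text = json.dumps(entry, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```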
- Validate the refreshed AI cache on a clean `Directory(...)` load first:
  - confirm that `load_ai_findings_for_directory(...)` reports reusable findings and no stale-cache issues for the fresh live snapshot
  - if the clean load shows stale-cache issues, refresh or fix the cache before doing anything else
- Re-run the normal QC path after updating the cache:
  - run `python3 data-check.py -N -r`; do not purge the Directory cache for this validation run
  - use this run to confirm there are no stale AI-cache warnings in the full QC path
  - inspect the actual `AI:Curated` warnings separately, either by running `python3 data-check.py -r | rg 'AI:Curated'` without `-N` or by calling `AIFindings().check(...)` directly in a short Python snippet
  - if this full QC run emits stale AI-cache warnings, treat that as a checksum/runtime bug and fix it before committing
Current AI-only categories
Keep this list current when adding a new AI-only rule:
- `NarrativeAccessMetadataGap`: concrete access/governance conditions are present in the narrative but missing from structured access metadata
- `NarrativeParticipantClinicalProfileGap`: the narrative describes a participant phenotype or clinical profile that is materially relevant for discovery/reuse, but the structured sex/age/diagnosis profile does not reflect it
- `NarrativeDataCategoryGap`: the narrative advertises clinically relevant data modalities or follow-up/registry/questionnaire content that is not reflected in `data_categories`
- `NarrativeMaterialMetadataGap`: the narrative advertises concrete stored biomaterials that are not reflected in `materials`
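A quick consistency check can catch rule names that appear in `ai-check-cache/` but are missing from the list above. It assumes each cache file is a JSON object with a `rule` key. That layout is hypothetical, and the real files may be structured differently.

```python
import json
from pathlib import Path
from typing import Set

# AI-only categories currently documented in this skill.
DOCUMENTED_CATEGORIES = {
    "NarrativeAccessMetadataGap",
    "NarrativeParticipantClinicalProfileGap",
    "NarrativeDataCategoryGap",
    "NarrativeMaterialMetadataGap",
}


def undocumented_categories(cache_dir: Path, documented: Set[str] = DOCUMENTED_CATEGORIES) -> Set[str]:
    """Return rule names found in cache files that this skill does not document yet."""
    used: Set[str] = set()
    for path in sorted(cache_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Hypothetical layout: one JSON object per file with a "rule" key.
        if isinstance(data, dict) and data.get("rule"):
            used.add(data["rule"])
    return used - set(documented)
```

After adding a rule, `undocumented_categories(Path("ai-check-cache"))` should return an empty set once this list has been updated.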
Required checks
- Do not trust stale AI-cache findings. If `AIFindings` reports changed entity IDs, refresh the live AI review before using or editing those entries.
- Keep the AI cache focused on genuinely AI-only findings. If a pattern is deterministic, move it into a regular plugin instead.
- Before keeping any AI-reviewed finding, check for redundancy against all existing programmatic checks in `checks/`; prefer the non-AI check whenever the rule can be expressed and tested deterministically.
- Do not commit private runtime caches or ad-hoc local outputs outside `ai-check-cache/`.
- Keep ad-hoc review dumps such as `ai-checks-results-current.txt` out of commits; they are local review artifacts, not framework inputs.
- Treat clean-load cache validation as the source of truth for reuse, but require the full QC path to stay clean too; the runtime must preserve the pristine checksum baseline.
Validation expectations
After updating the AI cache, validate at least:
- `python3 -m py_compile ai_cache.py checks/AIFindings.py <changed deterministic helpers/plugins/tests>`
- `pytest -q tests/test_ai_cache.py tests/test_ai_findings_check.py <changed deterministic test subsets>`
- a clean-snapshot cache-load check using `Directory(..., purgeCaches=['directory'])` plus `load_ai_findings_for_directory(...)`
- `python3 data-check.py -N -r` without purging the Directory cache
- `python3 ../BBMRI-ERIC-Directory-Data-Manager-Manual/scripts/generate_checks_docs.py` if documentation metadata changed
The update is not ready to commit unless both validation paths are clean:
- clean live cache load: reusable findings, zero stale-cache issues
- full `data-check.py -N -r` path: no stale AI-cache warnings
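The validation gate can be scripted roughly as follows. The command list mirrors the commands above, but it omits the `<changed ...>` placeholders and runs `pytest` through `python -m pytest`. The sketch assumes you read the stale-issue and reusable-finding counts from the clean-load check and the QC output yourself, or obtain them through project code that is not shown here. The function names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Mirrors the validation commands listed in this skill; extend with changed helpers/tests.
VALIDATION_COMMANDS = [
    [sys.executable, "-m", "py_compile", "ai_cache.py", "checks/AIFindings.py"],
    [sys.executable, "-m", "pytest", "-q", "tests/test_ai_cache.py", "tests/test_ai_findings_check.py"],
    [sys.executable, "data-check.py", "-N", "-r"],  # no Directory cache purge for this run
]


def failed_commands(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run each command in order and return the ones that exit non-zero."""
    failures = []
    for cmd in commands:
        if subprocess.run(list(cmd)).returncode != 0:
            failures.append(" ".join(cmd))
    return failures


def ready_to_commit(stale_cache_issues: int, reusable_findings: int,
                    stale_qc_warnings: int, failures: List[str]) -> bool:
    """Both validation paths must be clean and every validation command must pass."""
    clean_load = stale_cache_issues == 0 and reusable_findings > 0
    clean_qc = stale_qc_warnings == 0
    return clean_load and clean_qc and not failures
```

A typical call would be `ready_to_commit(0, n_reusable, 0, failed_commands(VALIDATION_COMMANDS))`, run from the repository root.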
Output format
Report in this order:
- Live-data scope reviewed
- AI-only findings added/updated/removed
- Deterministic findings that should become or remain regular plugin checks instead
- Cache reuse summary:
  - unchanged entities reused
  - changed entities re-reviewed
  - new entities reviewed
  - reviewed entities with no findings retained in checksum metadata
- Validation status
- Whether the cache is ready to commit