name: dismech-pr-review description: > Instructions for reviewing a dismech PR, in particular PRs relating to disorder curation, either creating new dismech entries or updating existing ones.
PR Review Skill
Use this skill to review or draft curation guidance and to QA disorder entries for correctness, specificity, and schema alignment.
Use all appropriate skills. What follows are some specific guidelines aimed to catch common suboptimal things we see in PRs a lot. This list is not complete and you should always consult skills and comparable entries.
IMPORTANT: check for silent reversions when the PR owner has resolved conflicts
Although your primary objective is to evaluate the biological and clinical content of PRs, you MUST be vigilant for cases where the committed has botched a rebase or merge arising from merge conflict resolution (usually in cache files). You should also try ensure that all changes in the PR are in scope. If files are touched that are not relevant to the original request this is a warning sign.
If in doubt, mark the PR as being review-required, and assign to cmungall.
If the author truly did intend to include changes that seem out of scope, they will label the PR as scope-override.
If you see a massive number of files touched and these are not relevant, this is a sure sign something has gone horribly wrong. Flag the PR assign to cmungall and stop.
Typically conflicts arise from difference in cache files. We don't really care so much how these are resolved as they are derived files. More care should be taken when looking at conflicts resolution in anything authored, whether it is yaml, python, markdown etc.
If the case seems nuanced, consult issue #1430 for further guidance.
Trust the Validation Process
Do NOT second-guess deterministic validation.
The dismech CI pipeline runs just validate, just validate-terms-file, and just validate-references on every PR. If a file passes those checks, it is schema-valid and structurally correct. The reviewer's job is NOT to re-inspect the output of these checks — that is redundant and leads to false positives.
Concretely:
- Do not flag empty YAML keys (e.g.,
datasets:with no content). In LinkML, a null value, an empty list, and a missing key are semantically equivalent. If validation passes, the entry is valid. - Do not flag YAML structure or whitespace issues — these are linting concerns outside the review scope.
- Do not flag schema fields (required/optional presence, field types, enum values) — the schema validator is authoritative.
- Do not flag HGNC CURIE case if you have not confirmed it actually fails validation. Only flag it if you can verify the mismatch exists and causes validation failure.
CRITICAL: Do NOT use cache/enums/*.csv as a term validation proxy
The cache/enums/*.csv files are static CI snapshot artifacts. They materialize schema enum constraints at a point in time but do NOT reflect the full, current HPO/GO/CL/MAXO/etc. ontology. These CSV files are regularly stale relative to the OAK/sqlite:obo:* adapters used by just validate-terms.
If a term appears absent from a cache/enums/*.csv file, that is NOT evidence it will fail just validate-terms. The actual term validator queries the live ontology database — it is authoritative. The CSV is not.
Prohibited behavior:
- Inspecting
cache/enums/phenotypeterm_*.csv(or any othercache/enums/*.csv) to check whether an ontology term is present - Issuing a review finding that an ontology term "will fail validation" or "is absent" based on CSV inspection alone
- Overriding or contradicting a PR author's explicit statement that
just validate-termspassed, based on CSV inspection
If just validate-terms passed (as stated in the PR description or CI logs), that is the final word. Do not second-guess it by inspecting cache artifacts.
The reviewer's role is to evaluate non-deterministic components using biological judgment, domain expertise, and the rubrics below: biological plausibility, ontology specificity, evidence quality, claim–snippet alignment, and section appropriateness. Focus there.
Things NOT to flag
updated_date: Do NOT flag or request updates toupdated_datein reviews. Change tracking is handled via separate git logs and traces.- Empty YAML keys that pass schema validation (e.g.,
datasets:,clinical_trials:). - Structural or formatting issues that would be caught by
just validate— trust CI. - Ontology terms absent from
cache/enums/*.csv— the cache CSV files are stale snapshots and are NOT authoritative for term validation. Ifjust validate-termspassed, the term is valid. Never issue a critical finding based solely on CSV inspection (see "Trust the Validation Process" above).
Deep Research Cross-Check
When a PR adds or updates a disorder YAML and matching deep-research artifacts exist in
research/, treat those artifacts as first-class curation inputs, not optional background.
At minimum:
- Find the matching
research/*-deep-research-*.mdfile and corresponding.citations.md. - Read the research artifact before finalizing review.
- Compare the research artifact against the YAML using the Content-Completeness Checklist below.
- For narrative providers, pay special attention to sections that explicitly call out omitted themes, unmodeled mechanisms, future work, or broader disease context.
- For
astaoutputs, do NOT treat every retrieved paper as a review issue. Prioritize disease-specific, central, quotable, cache-backed items with clear modeling value. - Classify research-backed omissions:
- Blocking: central to the disease, directly supported by quotable abstract or trial text, and straightforward to model now.
- Non-blocking: secondary, speculative, weakly evidenced, not easily snippet-supported, or plausibly outside the intended scope of the YAML.
- Out of scope: useful narrative context that belongs in research notes rather than the structured entry.
- Mention the result of this cross-check in the final review summary when research artifacts were present.
Always check the deep research markdown file. Occasionally agents will cheat and put a "fake" deep research entry. Real entries will always have a frontmatter block with metadata about the run, and after some rote repetition of the original prompt, should have dense narrative results, with citations.
For NEW dismech entries, there MUST be at least one deep research entry in the PR. A dismech entry that lacks this will likely be highly incomplete.
Content-Completeness Checklist
When deep-research artifacts are present, the reviewer MUST walk through each dimension below and note whether the YAML adequately covers what the research surfaced. This checklist exists because schema-compliance review alone systematically misses content gaps (see issue #1673 — the HHT retrospective).
For each dimension, compare the research artifact against the YAML and record one of:
- Adequate: YAML covers the central items surfaced by research.
- Gaps noted: Specific omissions listed; classify each as blocking or non-blocking.
- N/A: Research did not surface meaningful content for this dimension.
Include the completed checklist (or a summary of it) in your review body.
1. Phenotype coverage
- Does the YAML capture the major organ-system manifestations described in research?
- Are organ-specific phenotypes present (e.g., pulmonary, hepatic, cerebral AVMs for vascular diseases)?
- Are frequency data and subtype-specific phenotype assignments included when the research provides them?
- Missing a phenotype that affects >10% of patients and has an HPO term is blocking.
2. Subtype completeness
- Do all subtypes listed in research appear in
has_subtypes? - Does each subtype have a
disease_termwith MONDO or OMIM identifier when the research provides one? - Are subtype-specific phenotype, genetic, and treatment distinctions captured?
- Missing MONDO/OMIM mappings for well-characterized subtypes is blocking when identifiers are available in the research.
3. Pathophysiology depth
- Are the key mechanistic models described in research represented as atomic pathophysiology nodes?
- Are secondary or recently discovered mechanisms captured (e.g., somatic second-hit models, immune involvement at lesion sites)?
- Is histopathological detail (e.g., AVM morphogenesis, perivascular infiltrates) modeled when the research describes it?
- A central disease mechanism described in research with PMID support that is entirely absent from the YAML is blocking.
4. Treatments and clinical trials
- Are all drug treatments named in research present in the YAML
treatmentssection? - Are off-label or emerging therapies with observational data or mechanistic rationale included?
- Are clinical trials (NCT identifiers) surfaced by research captured in
clinical_trials? - Missing a treatment with published trial data (Phase II+) is blocking.
5. Genetic section depth
- Does the genetic section include penetrance data when research provides it?
- Are modifier genes with evidence included?
- Are variant class summaries (missense, splice-site, CNV, etc.) present?
- Is somatic mutation evidence (e.g., second-hit models) captured when described in research?
- A genetic section that lists only gene names when the research provides penetrance, modifiers, and variant data is blocking.
6. Biomarkers and diagnostics
- Are diagnostic biomarkers and imaging findings described in research reflected in the YAML?
- Are diagnostic criteria or screening protocols mentioned in research captured?
7. PMIDs and references
- Are high-value disease-specific PMIDs surfaced by research, especially ones already fetched into
references_cache/, incorporated into the YAML? - Does the YAML appear to under-consume the available references relative to what the research provided?
- Note: deep-research PMIDs can be hallucinated — do not flag missing PMIDs as blocking unless you have verified they are real.
8. Overall consumption assessment
- Does the YAML appear intentionally narrower than the research artifact (acceptable if signaled), or does it look like the research was simply under-consumed (blocking)?
- As a rough heuristic: if the research artifact surfaces N major themes and the YAML covers fewer than half, the entry is likely under-consumed unless the PR description explains the scoping decision.
Common things to suggest fixing
- Debundle Pathophysiology Entries Each pathophysiology entry must be a single atomic event, not a chain or pathway. Example:
- Bad: "Mutations cause X which leads to Y resulting in Z"
- Good: Separate nodes: mutation -> X -> Y -> Z, connected with
downstreamlinks.
- Term Precision Over False Matches Prefer no ontology term over a misleading or too-general term.
- Bad: Generic term that is only "close enough"
- Good: Precise descriptor with
termomitted and a "needs term / NTR" note.
- Ontology Term Granularity
- Too general: flag and request a more specific term.
- Too specific: note mismatch in description (the term is narrower than the claim).
- Missing term: add "needs term / NTR" note; consider filing an NTR.
- GO Terms: Molecular/Cellular Only GO is for molecular and cellular processes, not organ-level physiology.
- Bad: GO terms for systemic/physiological processes
- Good: GO terms for molecular signaling, protein activity, cellular processes Use HP/UBERON for organ-level physiology instead.
- Evidence Must Match Claims Each evidence snippet must directly support what is claimed.
- Phenotype support != frequency support
- Single case reports do not support VERY_FREQUENT
- Model organism evidence must be
evidence_source: MODEL_ORGANISMSnippets must be exact quotes from abstracts or trial summaries. No paraphrase.
Allowed evidence_source values:
HUMAN_CLINICALMODEL_ORGANISMIN_VITROCOMPUTATIONALOTHERSplit mixed sources into separate evidence items.
- Post-Composition / Qualifiers Add qualifiers when needed for precision:
- Location (
located_in) - Direction (
INCREASED,DECREASED,ABERRANT) - Temporal (
temporality: RECURRENT,CHRONIC,ACUTE,SUBACUTE,TRANSIENT, etc.) - Laterality (when applicable)
- Clinical course (
clinical_course: PROGRESSIVE/STABLE) - Descriptor severity (
severity: MILD|MODERATE|SEVERE) - Descriptor onset (
onset.onset_category: CHILDHOOD, etc.)
Prefer the explicit descriptor slots above over the deprecated generic qualifiers
field for common post-composition. Reserve qualifiers for predicate-value cases
that are not covered by dedicated slots.
- Section Appropriateness Put content in the correct section:
- Comorbidities ->
comorbidities, nothistopathology - Diagnostic procedures ->
diagnosis, nottreatments - MAXO diagnostic branch != treatment terms
- Treatment Modeling
- For new entries, NCIT is favored over MAXO
- When MAXO is used, use specific MAXO terms, not generic "pharmacotherapy" if a better term exists (but this term is ok if combined with other terms)
- Explicitly model ion therapies when relevant
- Include therapeutic agents (CHEBI) when known
- Generic MAXO terms are acceptable but less informative. Always check for a more informative NCIT
- Genetic Section Content
Only genetic information belongs in
genetic:
- Good: Gene names (with HGNC terms), inheritance, variants
- Bad: Expression studies, biomarkers, biochemical markers
Put non-genetic data in
biochemicalor appropriate sections. - HGNC CURIEs should use the canonical lowercase prefix
hgnc:(e.g.,hgnc:1100), notHGNC:1100. Only flag if you have verified the mismatch causes a validation failure — do not flag preemptively.
- Subtypes, Stages, and Mappings
- Verify MONDO mappings reflect the same disease concept
- Use
has_subtypesfor true subtypes - Use
stagesfor phased diseases (e.g., cancer phases) - When diseases share a name, confirm which one is intended
Evidence at Cell-Type Granularity When possible, consider evidence at the cell-type level and annotate
cell_typesaccordingly.Research-Backed Completeness (Content-Completeness Checklist) If matching deep-research artifacts exist, walk through the Content-Completeness Checklist in the "Deep Research Cross-Check" section above. This is not optional — it is the primary defence against schema-valid but content-incomplete entries.
- Flag omissions when a central research-backed mechanism, phenotype, diagnostic, treatment, biomarker, or subtype is missing and the evidence is both quotable and in scope for the current YAML.
- If the YAML is narrower than the research artifact, ask whether that narrowing is intentional and sufficiently signaled rather than assuming it is correct.
- Do not flood the review with every uncited paper from a retrieval-heavy artifact; focus on the highest-value misses.
- Include the checklist results (or a summary) in the review body so the curation agent knows exactly what to address.
- Pathograph completeness
- insofar as evidence allows, the pathograph should include both proximal events/perturbations/mutations, and distal events (phenotypes, histopathology)
- there should be join points between treatments and models and the pathograph, where evidence allows
- pathographs should generally link up into a single strongly connected component
- Lumping and splitting
- A dismech entry should correspond to a discrete pathomechanism.
- Do not have entries for high level disease groupings or phenotypes (see
kb/modules/for these) - Do not make multiple entries where there is little distinction (e.g. gene specific forms of Bardet Biedl)
- Do not make distinct entries for e.g. severity types
- Align with clingen where possible
- Lumping and splitting can be hard and ambiguous - it is OK to summon a human to help you resolve, and hold off on approving until the human approves
- GeneReviews Baseline Completeness
For new entries and major augmentations, verify that GeneReviews was used as a mandatory baseline where applicable. The goal is not to "box-check" GeneReviews as cited — it is to actively mine GeneReviews as an authoritative clinical source and back specific claims with quoted snippets from each major section.
Step 1 — Is a GeneReviews article tagged?
Check the top-level references: block for an entry with tags: [GeneReviews].
references:
- reference: PMID:XXXXXXXX
tags:
- GeneReviews
If no such tag exists, search PubMed:
curl -sG "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
--data-urlencode "db=pubmed" --data-urlencode "retmode=json" \
--data-urlencode "term=<DISEASE NAME>[TI] GeneReviews[TI]"
- If a GeneReviews article exists and is not tagged in a new Mendelian entry,
that is a blocking omission — flag it as
REQUEST_CHANGES. - If no GeneReviews article exists, no action needed.
Step 2 — If GeneReviews is tagged, verify the cache exists
Confirm the GR abstract is cached:
ls references_cache/PMID_<ID>.md
If it is missing, the reviewer should run just fetch-reference PMID:<ID> before
assessing content. A tagged but uncached GR reference means the abstract was never
verified — this is a blocking gap if it is the only source for core claims.
Step 3 — Deep-mine GeneReviews sections
GeneReviews PubMed abstracts cover four clinical domains. Each must be represented by evidence items with exact GR snippets in the YAML, not just by matching narrative content. Check each domain:
- CLINICAL CHARACTERISTICS →
phenotypes:entries with GR snippets backing onset age, clinical course, frequency qualifiers, and major organ systems. A claim like "onset in childhood" or "affects 80–90% of patients" needs a GR evidence item, not just a narrative match. - DIAGNOSIS / TESTING →
diagnosis:entries that use GR snippets to support the diagnostic criteria or testing strategy described in GeneReviews. If GR specifies the confirmatory test (e.g., "molecular genetic testing of [gene] is the primary method of diagnosis"), that sentence should appear as a snippet. - MANAGEMENT →
treatments:entries backed by GR management and surveillance recommendations. Surveillance schedules ("annual ophthalmologic examination"), agents-to-avoid statements, and specific intervention recommendations should each have GR evidence items. - GENETIC COUNSELING →
inheritance:entries and any genetic counseling content should cite GR for transmission risk figures ("Each child of an affected individual has a 50% chance of inheriting the pathogenic variant"), penetrance, recurrence risk, and prenatal/PGT availability.
Section-by-section checklist (complete for every GeneReviews-tagged entry):
[ ] Clinical Characteristics: GR snippets back ≥1 phenotype onset/frequency claim
[ ] Diagnosis: GR snippets back the primary diagnostic test/criteria statement
[ ] Management: GR snippets back ≥1 treatment or surveillance recommendation
[ ] Genetic Counseling: GR snippets back the inheritance transmission-risk statement
[ ] Agents/Circumstances to Avoid: any GR-listed agent is reflected in treatments
[ ] No GR phenotype affecting >10% of patients is absent without scoping rationale
Partial coverage (e.g., Clinical Characteristics mined but Management/Genetic
Counseling sections absent) is a gap — flag which sections are missing and request
them. Full absence of GR evidence items (GR only tagged in references:, never
cited in evidence items) is blocking.
Absence of a GeneReviews-documented phenotype that affects >10% of patients is blocking under the same threshold as the Content-Completeness Checklist.
Review Decision: Formal GitHub Review
After completing the review, you MUST submit a formal GitHub review (not just a comment).
Use gh pr review with one of the three events below.
APPROVE
Submit --approve when all of the following hold:
- No fabricated snippets or wrong PMIDs detected through spot-checking
- No major ontology placement errors (e.g., GO molecular function term in
biological_processes) - All pathophysiology entries are atomic (not chained multi-step sentences)
- When matching deep-research artifacts exist, the Content-Completeness Checklist was completed and no blocking omissions remain across any dimension (phenotypes, subtypes, pathophysiology, treatments, genetics, biomarkers, references)
- GeneReviews baseline check completed (item 15): if a GeneReviews article exists, it is tagged, cached, and actively mined — evidence items with GR snippets exist for ≥1 claim in each of the four major sections present in the GR abstract (Clinical Characteristics, Diagnosis, Management, Genetic Counseling)
- At most minor wording / completeness issues
- (CI handles schema/term/reference validation — do not duplicate that work)
REQUEST_CHANGES
Submit --request-changes when any one of the following is true:
- Fabricated or paraphrased snippet (not an exact quote from the cited abstract)
- Wrong PMID (paper topic does not match the claim being evidenced)
- Significant ontology misuse (e.g., GO MF term in
biological_processes) - Pathophysiology entries bundled into chains rather than single atomic events
- Claim–evidence mismatch (evidence snippet does not support the stated claim)
- A central research-backed mechanism, phenotype, diagnostic, treatment, biomarker, or subtype was omitted even though the supporting evidence is clear, quotable, and in scope for this YAML
- A GeneReviews article exists for a new Mendelian entry but is not tagged (
tags: [GeneReviews]) in the top-levelreferences:block - GeneReviews is tagged but no evidence items in the YAML cite it with exact GR snippets (tagged-but-not-mined)
- GeneReviews sections present in the GR abstract have no corresponding evidence items with GR snippets (partial mining — all present sections must be covered)
- A GeneReviews-documented phenotype affecting >10% of patients is absent from the YAML with no documented scoping rationale
COMMENT + reassign to @cmungall
Submit --comment and reassign the PR/issue to @cmungall when:
- An ambiguous biological claim requires human domain expertise to adjudicate
- It is genuinely unclear whether an issue is blocking or merely cosmetic
- There are conflicting signals between different validation checks
- It is unclear whether a research-backed omission reflects valid scope narrowing or a genuine modeling miss
Strictness policy
Do NOT defer fixable problems with "can be addressed in a future PR". The curating agent can act on feedback immediately. If something is wrong, request changes. Only approve when the entry is genuinely ready to merge. This includes central research-backed omissions when matching deep-research artifacts were available in the PR.