name: mic-nutrient-creation
description: >
  Skill for creating new nutrient YAML files from MIC website content. Use this when extracting a nutrient from lpi.oregonstate.edu/mic. Also useful for enhancing existing nutrient entries.
MIC Nutrient Extraction Skill
Overview
Guide the creation of new nutrient YAML files in the MIC knowledge base. The MIC website (lpi.oregonstate.edu/mic) serves as the authoritative research source. This skill emphasizes evidence-based extraction with proper ontology grounding.
When to Use
- User asks to create a new nutrient entry
- User asks to curate a vitamin, mineral, or dietary factor
- User names a nutrient that doesn't exist in `kb/nutrients/`
This skill can also be consulted for ongoing curation of existing nutrients.
Workflow
Step 1: Identify Nutrient and Category
Determine the nutrient and its category:
- `vitamins/` - Water-soluble and fat-soluble vitamins
- `minerals/` - Essential mineral elements
- `dietary-factors/` - Other dietary factors (fiber, flavonoids, etc.)
- `food-beverages/` - Specific foods or beverages
Check if it already exists:
ls kb/nutrients/**/*.yaml
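Listing every file works, but a lookup by file name answers the question directly. A minimal sketch, assuming the `kb/nutrients/{category}/{nutrient}.yaml` layout described in this guide; `find_nutrient_file` is a hypothetical helper, not part of the project tooling.

```python
from pathlib import Path
from typing import Optional


def find_nutrient_file(slug: str, root: Path = Path("kb/nutrients")) -> Optional[Path]:
    """Return the first {slug}.yaml found in any category directory, or None.

    Assumes one directory level per category (vitamins/, minerals/, ...).
    """
    matches = sorted(root.glob(f"*/{slug}.yaml"))
    return matches[0] if matches else None
```

A `None` result means the nutrient has no entry yet and the creation workflow applies.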
Step 2: Mark as In Progress
Update CURATION-PROGRESS.md to mark the nutrient as in progress:
- Find the nutrient's line in the file
- Change `[ ]` to `[~]` to indicate work has started
- Optionally add a note about what's being worked on
Example: Change this line:
- [ ] Folate (`kb/nutrients/vitamins/folate.yaml`)
To:
- [~] Folate (`kb/nutrients/vitamins/folate.yaml`) - in progress
This helps track curation status and prevents duplicate work.
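The marker change can also be scripted. The following is a rough sketch, assuming each line in CURATION-PROGRESS.md has the checkbox form shown above with the YAML path in backticks; `set_status` and `MARKERS` are hypothetical names, not existing project code.

```python
import re

# Checkbox characters used in CURATION-PROGRESS.md, per this guide.
MARKERS = {"todo": " ", "in_progress": "~", "done": "x"}


def set_status(progress_text: str, yaml_path: str, status: str) -> str:
    """Rewrite the checkbox on the line that mentions yaml_path."""
    marker = MARKERS[status]
    out = []
    for line in progress_text.splitlines():
        if f"`{yaml_path}`" in line:
            # Swap the checkbox, then drop any trailing "in progress" note.
            line = re.sub(r"^(\s*- )\[[ ~x]\]", rf"\g<1>[{marker}]", line)
            line = re.sub(r"\s+-\s+in progress\s*$", "", line)
            if status == "in_progress":
                line += " - in progress"
        out.append(line)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(out)
```

The same helper covers the final `[x]` transition by passing `"done"`, which also removes the note.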
Step 3: Fetch MIC Page Content
Download the MIC page using the just fetch-mic-page command:
# Format: just fetch-mic-page {category}/{nutrient}
just fetch-mic-page vitamins/biotin
just fetch-mic-page minerals/calcium
just fetch-mic-page dietary-factors/lipoic-acid
This downloads the HTML to cache/mic-pages/{nutrient}.html.
Verify the download:
ls -la cache/mic-pages/biotin.html
Step 4: Extract Reference Mapping
IMPORTANT: Before adding evidence, extract the MIC reference number → PMID mapping:
# Get TSV mapping of ref# to PMID
just extract-refs cache/mic-pages/biotin.html
This outputs a TSV with columns: source, reference_number, pubmed_id, citation
Example output:
source reference_number pubmed_id citation
biotin.html 1 Zempleni J, Wijeratne SSK...
biotin.html 2 PMID:10357733 Mock DM. Biotin...
biotin.html 3 PMID:15992684 Zempleni J, Hassan YI...
Key points:
- Empty `pubmed_id` = book chapter or non-PMID source (use `mic_references` only)
- Has `pubmed_id` = can fetch abstract and add PMID evidence
- The `reference_number` corresponds to `mic_references` values in the YAML
Save the mapping for reference during curation:
just extract-refs cache/mic-pages/biotin.html > cache/refs/biotin-refs.tsv
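Once saved, the mapping is easy to load for lookups during curation. A minimal sketch, assuming the file is tab-separated with the header row shown above; `load_ref_map` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Optional


def load_ref_map(tsv_text: str) -> Dict[int, Optional[str]]:
    """Map MIC reference number -> PMID CURIE, or None when pubmed_id is empty."""
    # QUOTE_NONE keeps quotation marks inside citations from confusing the parser.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    ref_map: Dict[int, Optional[str]] = {}
    for row in reader:
        pmid = (row["pubmed_id"] or "").strip()
        ref_map[int(row["reference_number"])] = pmid or None
    return ref_map
```

A value of `None` marks a reference that should stay in `mic_references` only, with no PMID evidence item.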
Step 5: Create Initial YAML File
Create kb/nutrients/{category}/{nutrient}.yaml with the basic structure:
name: Biotin
nutrient_term:
  preferred_term: biotin
  term:
    id: CHEBI:15956
    label: biotin
category: vitamin
source_url: https://lpi.oregonstate.edu/mic/vitamins/biotin
alternate_names:
  - Vitamin B7
  - Vitamin H
description: |
  Biotin is a water-soluble B vitamin...
functions: []
deficiency: null
toxicity: null
food_sources: []
drug_interactions: []
nutrient_interactions: []
disease_associations: []
recommendations: null
references: []
Validate the structure:
just validate kb/nutrients/vitamins/biotin.yaml
Step 6: Extract Section by Section
For each MIC page section, read the content and extract structured data:
Functions Section
Extract biological roles, enzymes, and processes:
functions:
  - name: Cofactor for Carboxylases
    description: |
      Biotin serves as a covalently bound cofactor for five mammalian
      carboxylases that catalyze carbon dioxide transfer reactions.
    biological_processes:
      - preferred_term: fatty acid biosynthesis
        term:
          id: GO:0006633
          label: fatty acid biosynthetic process
    genes:
      - preferred_term: PC
        description: Pyruvate carboxylase
        term:
          id: HGNC:8636
          label: PC
    evidence:
      - reference: PMID:10357733
        supports: SUPPORT
        snippet: "Biotin serves as a covalently bound coenzyme for five mammalian carboxylases"
        explanation: Directly supports biotin's role as carboxylase cofactor
Deficiency Section
Extract symptoms, at-risk groups, and causes:
deficiency:
  name: Biotin Deficiency
  description: |
    Biotin deficiency is rare in healthy individuals...
  phenotypes:
    - name: Dermatitis
      phenotype_term:
        preferred_term: dermatitis
        term:
          id: HP:0000964
          label: Eczema
      frequency: FREQUENT
      evidence:
        - reference: PMID:10357733
          supports: SUPPORT
          snippet: "dermatitis, conjunctivitis, and alopecia"
  at_risk_groups:
    - name: Pregnant women
      description: Marginal biotin deficiency common during pregnancy
Disease Prevention Section
Extract disease associations:
disease_associations:
  - name: Biotin and Neural Tube Defects
    disease_term:
      preferred_term: neural tube defect
      term:
        id: MONDO:0005343
        label: neural tube defect
    relationship_type: RISK_FACTOR
    direction: DECREASED
    population_context: pregnant women with marginal biotin status
    evidence:
      - reference: PMID:16549401
        supports: SUPPORT
        snippet: "low biotin status during early pregnancy..."
Food Sources Section
Extract dietary sources:
food_sources:
  - name: Egg yolk
    food_term:
      preferred_term: egg yolk
      term:
        id: FOODON:00002669
        label: egg yolk
    amount: "10 mcg"
    serving_size: "1 large egg"
Drug Interactions Section
Extract interactions with medications:
drug_interactions:
  - name: Anticonvulsants and Biotin
    drug_term:
      preferred_term: anticonvulsant
    interaction_type: REDUCES_ABSORPTION
    clinical_significance: MODERATE
    evidence:
      - reference: PMID:8157857
        supports: SUPPORT
        snippet: "long-term anticonvulsant therapy..."
Step 7: Ground Entities to Ontologies
Use OAK to find correct ontology terms:
CHEBI (nutrients)
uv run runoak -i sqlite:obo:chebi info "biotin"
uv run runoak -i sqlite:obo:chebi info "l~ascorbic acid"
HP (phenotypes)
uv run runoak -i sqlite:obo:hp info "l~dermatitis"
uv run runoak -i sqlite:obo:hp info HP:0000964
GO (biological processes)
uv run runoak -i sqlite:obo:go info "l~fatty acid biosynthesis"
MONDO (diseases)
uv run runoak -i sqlite:obo:mondo info "l~neural tube defect"
FOODON (foods)
uv run runoak -i sqlite:obo:foodon info "l~egg yolk"
HGNC (genes)
uv run runoak -i sqlite:obo:hgnc info "l~pyruvate carboxylase"
Step 8: Add Evidence
Use the reference mapping from Step 4 to add evidence. For each mic_references number:
- Check if PMID exists in your ref mapping TSV
- If PMID exists: Fetch abstract and add evidence with exact quote
- If no PMID (book/database): Keep `mic_references` only, no PMID evidence
evidence:
  - reference: PMID:12345678
    supports: SUPPORT
    snippet: "Exact quote from abstract"
    explanation: "Why this supports the claim"
Fetch and verify abstracts:
# Look up PMID for MIC reference number (from Step 4 mapping)
# e.g., ref 55 → PMID:15585762
just fetch-reference PMID:15585762
cat cache/references/pmid_15585762.md
Evidence Rules
- NEVER fabricate PMIDs - Only use PMIDs from the MIC page references (Step 4 mapping)
- NEVER paraphrase snippets - Use exact quotes from abstracts
- Always verify - Fetch and check abstracts before using
- Book references are OK - If MIC ref has no PMID, use `mic_references` without evidence block
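The exact-quote rule can be checked by hand before running the validators. This is only a sketch of the idea: the project's `just validate-references` may match differently, and `snippet_in_abstract` is a hypothetical helper. It treats runs of whitespace (such as line wraps in a cached abstract) as equivalent, which is an assumption.

```python
def snippet_in_abstract(snippet: str, abstract: str) -> bool:
    """True only if the snippet appears verbatim in the abstract.

    Whitespace runs are collapsed first so line wraps do not cause false misses.
    """
    def squash(text: str) -> str:
        return " ".join(text.split())

    needle = squash(snippet)
    # An empty snippet is never valid evidence.
    return bool(needle) and needle in squash(abstract)
```

A paraphrase fails this check even when its meaning matches the abstract, which is the intended behaviour.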
Step 9: Validate
Run all validation checks:
# Schema validation
just validate kb/nutrients/vitamins/biotin.yaml
# Term validation (ontology IDs and labels)
just validate-terms-file kb/nutrients/vitamins/biotin.yaml
# Reference validation (snippets match abstracts)
just validate-references kb/nutrients/vitamins/biotin.yaml
# Full QC
just qc
Step 10: Compliance Check and Mark Complete
Check completeness:
just compliance kb/nutrients/vitamins/biotin.yaml
Address high-priority missing fields first:
- `nutrient_term.term` (weight 5.0)
- `disease_associations[].disease_term.term` (weight 3.0)
- `deficiency.phenotypes[].phenotype_term.term` (weight 3.0)
- Evidence items (weight 2.0)
When all validation passes, update CURATION-PROGRESS.md to mark the nutrient as completed:
- Find the nutrient's line in the file
- Change `[~]` to `[x]` to indicate completion
- Remove any "in progress" note
Example: Change this line:
- [~] Folate (`kb/nutrients/vitamins/folate.yaml`) - in progress
To:
- [x] Folate (`kb/nutrients/vitamins/folate.yaml`)
File Naming Convention
Use lowercase with hyphens:
- "Vitamin B12" -> `vitamin-b12.yaml`
- "Alpha-Lipoic Acid" -> `alpha-lipoic-acid.yaml`
- "Coenzyme Q10" -> `coenzyme-q10.yaml`
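The convention above can be expressed as a small function. A minimal sketch, assuming ASCII nutrient names (characters outside `a-z` and `0-9` are collapsed into hyphens, so a name such as one starting with a Greek letter would need manual handling); `nutrient_filename` is a hypothetical helper.

```python
import re


def nutrient_filename(name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{slug}.yaml"
```

For example, `nutrient_filename("Alpha-Lipoic Acid")` gives `alpha-lipoic-acid.yaml`.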
Minimum Required Fields
A new nutrient file MUST include at minimum:
| Field | Source | Notes |
|---|---|---|
| `name` | MIC page | Human-readable nutrient name |
| `nutrient_term` | OAK lookup | CHEBI term binding |
| `category` | MIC category | vitamin, mineral, etc. |
| `source_url` | MIC URL | Full URL to MIC page |
| `functions` (1+) | MIC content | At least one function |
| `evidence` (1+) | MIC references | At least one PMID reference |
Common Validation Errors
"Term not found in ontology"
Re-run OAK lookup with fuzzy search:
uv run runoak -i sqlite:obo:chebi info "l~<term>"
"Snippet not found in reference"
The quoted text must be from the PMID's abstract. Fetch and verify:
just fetch-reference PMID:12345678
"Required field missing"
Check the schema for required fields. nutrient_term is always required.
Integration with Other Skills
- Use mic-terms for detailed ontology term lookups
- Use mic-references to validate/repair evidence items
- Use mic-compliance to check completeness
Anti-Hallucination Checklist
Before finalizing a nutrient file, verify:
- All PMIDs are from the MIC page references
- All snippets are exact quotes from PubMed abstracts
- CHEBI term exists and label matches exactly
- HP terms exist and labels match exactly
- GO terms exist and labels match exactly
- MONDO terms exist and labels match exactly
- FOODON terms exist and labels match exactly
- `just validate` passes
- `just validate-terms-file` passes
- `just validate-references` passes
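The first checklist item lends itself to a quick mechanical check. A rough sketch, assuming PMIDs appear as `PMID:<digits>` in both the nutrient YAML and the saved reference TSV; `unlisted_pmids` is a hypothetical helper that scans plain text and does not parse YAML.

```python
import re
from typing import Set

PMID_RE = re.compile(r"PMID:\d+")


def unlisted_pmids(yaml_text: str, refs_tsv_text: str) -> Set[str]:
    """Return PMIDs used in the YAML that are absent from the MIC reference mapping."""
    return set(PMID_RE.findall(yaml_text)) - set(PMID_RE.findall(refs_tsv_text))
```

An empty result is necessary but not sufficient: it shows every PMID came from the MIC page, not that each snippet matches its abstract.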