name: mic-terms
description: >
  Skill for adding and validating ontology term annotations in the MIC knowledge base. Covers CHEBI, FOODON, HP, GO, MONDO, UBERON, HGNC lookups. Use when adding term bindings to nutrient YAML files.
MIC Ontology Terms Skill
Overview
Add and validate ontology term references in the MIC nutrient knowledge base. This ensures nutrients, phenotypes, biological processes, diseases, foods, and genes are properly linked to authoritative ontology terms with correct IDs and labels.
When to Use
- Adding `nutrient_term` to nutrient entries (uses CHEBI)
- Adding `phenotype_term` to phenotype entries (uses HP)
- Adding `term` to `biological_processes` entries (uses GO)
- Adding `disease_term` to disease associations (uses MONDO)
- Adding `food_term` to food sources (uses FOODON)
- Adding `term` to `genes` entries (uses HGNC)
- Validating existing ontology term references
- Fixing label mismatches
Ontology Mappings
| Entity Type | Ontology | OAK Adapter | Prefix |
|---|---|---|---|
| Nutrients | CHEBI | sqlite:obo:chebi | CHEBI: |
| Foods | FOODON | sqlite:obo:foodon | FOODON: |
| Phenotypes | HP | sqlite:obo:hp | HP: |
| Biological Processes | GO | sqlite:obo:go | GO: |
| Cellular Components | GO | sqlite:obo:go | GO: |
| Diseases | MONDO | sqlite:obo:mondo | MONDO: |
| Anatomy | UBERON | sqlite:obo:uberon | UBERON: |
| Genes | HGNC | sqlite:obo:hgnc | HGNC: |
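The table above can be carried into a script as a small lookup, so the adapter is chosen from the CURIE prefix rather than typed by hand. This is a minimal sketch: the `ADAPTERS` dictionary and the `adapter_for` helper are hypothetical names, and only the selectors and prefixes come from the table.

```python
# Adapter selectors copied from the Ontology Mappings table.
# ADAPTERS and adapter_for are illustrative names, not part of the MIC tooling.
ADAPTERS = {
    "CHEBI": "sqlite:obo:chebi",
    "FOODON": "sqlite:obo:foodon",
    "HP": "sqlite:obo:hp",
    "GO": "sqlite:obo:go",
    "MONDO": "sqlite:obo:mondo",
    "UBERON": "sqlite:obo:uberon",
    "HGNC": "sqlite:obo:hgnc",
}


def adapter_for(curie: str) -> str:
    """Return the OAK adapter selector for a CURIE such as 'GO:0006633'."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in ADAPTERS:
        raise ValueError(f"no adapter configured for prefix {prefix!r}")
    return ADAPTERS[prefix]
```

For example, `adapter_for("HP:0000964")` yields the selector to pass to `runoak -i`.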
Term Object Structure
All term references follow this YAML structure:
# For nutrients (CHEBI):
nutrient_term:
  preferred_term: biotin
  term:
    id: CHEBI:15956
    label: biotin
# For phenotypes (HP):
phenotype_term:
  preferred_term: Dermatitis
  term:
    id: HP:0000964
    label: Eczema
# For biological processes (GO):
biological_processes:
  - preferred_term: fatty acid biosynthesis
    term:
      id: GO:0006633
      label: fatty acid biosynthetic process
# For diseases (MONDO):
disease_term:
  preferred_term: neural tube defect
  term:
    id: MONDO:0005343
    label: neural tube defect
# For foods (FOODON):
food_term:
  preferred_term: egg yolk
  term:
    id: FOODON:00002669
    label: egg yolk
# For genes (HGNC):
genes:
  - preferred_term: PC
    description: Pyruvate carboxylase
    term:
      id: HGNC:8636
      label: PC
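A quick structural check can catch incomplete bindings before running the project validators. The sketch below assumes that `preferred_term`, `term.id`, and `term.label` are all required, as the examples above suggest. The `term_problems` helper is a hypothetical name and does not replace `just validate`.

```python
def term_problems(binding: dict, expected_prefix: str) -> list[str]:
    """List structural problems in one term binding (empty list means it looks fine).

    Assumes every binding needs preferred_term, term.id, and term.label.
    """
    problems = []
    if not binding.get("preferred_term"):
        problems.append("missing preferred_term")
    term = binding.get("term") or {}
    term_id = term.get("id") or ""
    if not term_id:
        problems.append("missing term.id")
    elif not term_id.startswith(expected_prefix + ":"):
        problems.append(f"term.id {term_id} does not use prefix {expected_prefix}:")
    if not term.get("label"):
        problems.append("missing term.label")
    return problems
```

Calling it with the biotin binding and `"CHEBI"` returns an empty list, while a binding that has no `term` block reports both the missing ID and the missing label.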
OAK Lookup Commands
Exact Match
uv run runoak -i sqlite:obo:chebi info "biotin"
# Returns: CHEBI:15956 ! biotin
Fuzzy/Label Search
uv run runoak -i sqlite:obo:hp info "l~cognitive impairment"
# Returns multiple matches - select the most appropriate
Starts-With Search
uv run runoak -i sqlite:obo:hp info "l^skin"
# Returns terms starting with "skin"
Get Full Term Details
uv run runoak -i sqlite:obo:chebi info CHEBI:15956 -O obo
# Returns complete term information including definition
Search by ID
uv run runoak -i sqlite:obo:go info GO:0006633
# Returns: GO:0006633 ! fatty acid biosynthetic process
Hierarchical Relationships
uv run runoak -i sqlite:obo:go relationships --direction both GO:0006633
# Shows parent and child terms
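When these lookups are scripted, it helps to build the command as an argument list and to split each result line into an ID and a label. The sketch below assumes the default `info` output has the `ID ! label` shape shown in the comments above. Both helper names are hypothetical.

```python
def info_command(adapter: str, query: str) -> list[str]:
    """Build the argv for `uv run runoak -i <adapter> info <query>`."""
    return ["uv", "run", "runoak", "-i", adapter, "info", query]


def parse_info_line(line: str) -> tuple[str, str]:
    """Split a line such as 'CHEBI:15956 ! biotin' into (id, label).

    Assumes the 'ID ! label' format shown in the examples above.
    """
    term_id, sep, label = line.strip().partition(" ! ")
    if not sep:
        raise ValueError(f"unexpected runoak output: {line!r}")
    return term_id, label
```

The list returned by `info_command` can be passed to `subprocess.run(..., capture_output=True, text=True)`, and each line of the captured standard output can then go through `parse_info_line`.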
Common Nutrient Terms (CHEBI)
| Nutrient | CHEBI ID | Label |
|---|---|---|
| Biotin | CHEBI:15956 | biotin |
| Folate | CHEBI:27470 | folate |
| Vitamin C | CHEBI:29073 | ascorbic acid |
| Vitamin D | CHEBI:27300 | vitamin D |
| Vitamin E | CHEBI:18145 | vitamin E |
| Vitamin K | CHEBI:28384 | vitamin K |
| Thiamin | CHEBI:18385 | thiamine |
| Riboflavin | CHEBI:17015 | riboflavin |
| Niacin | CHEBI:15940 | nicotinic acid |
| Calcium | CHEBI:22984 | calcium(2+) |
| Iron | CHEBI:24875 | iron(2+) |
| Zinc | CHEBI:27363 | zinc(2+) |
| Magnesium | CHEBI:18420 | magnesium(2+) |
| Selenium | CHEBI:27568 | selenium atom |
Specificity Guidelines
Critical: Always use the most specific term that accurately describes the entity.
| Incorrect (too general) | Correct (specific) |
|---|---|
| HP:0000001 All | HP:0000964 Eczema |
| GO:0008150 biological_process | GO:0006633 fatty acid biosynthetic process |
| CHEBI:24431 chemical entity | CHEBI:15956 biotin |
When a fuzzy search returns multiple results:
- Review all candidates
- Check term definitions with the `-O obo` flag
- Select the term that most precisely matches the biological context
- If no specific term exists, use the closest parent but note the limitation
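The too-general cases in the table above can be flagged mechanically. This is a minimal sketch that knows only the three root terms listed there. The `ROOT_TERMS` set and the `is_too_general` helper are hypothetical names, and a real check would also need to consider intermediate parents.

```python
# Root terms taken from the "Incorrect (too general)" column above.
ROOT_TERMS = {
    "HP:0000001",   # All
    "GO:0008150",   # biological_process
    "CHEBI:24431",  # chemical entity
}


def is_too_general(term_id: str) -> bool:
    """Return True when the ID is one of the known ontology root terms."""
    return term_id in ROOT_TERMS
```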
Validation
Validate Term Structure
just validate kb/nutrients/vitamins/biotin.yaml
Validate Term IDs and Labels
just validate-terms-file kb/nutrients/vitamins/biotin.yaml
This checks:
- Term IDs exist in the ontology
- Labels match the canonical ontology labels exactly
- Required fields are present
Fixing Label Mismatches
If validation reports a label mismatch:
LABEL MISMATCH: biotin.yaml
Term: HP:0000964
Expected: Eczema
Actual: Dermatitis
Update the label field to match the ontology's canonical label exactly.
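When several labels need correcting, the update can be applied to the loaded YAML data in one pass. The sketch below assumes you have already collected canonical labels from OAK into a dictionary keyed by term ID. The `fix_labels` helper is a hypothetical name, and it leaves `preferred_term` untouched.

```python
def fix_labels(node, canonical: dict[str, str]) -> int:
    """Set label to the canonical value wherever an id/label pair disagrees.

    Walks nested dicts and lists in place and returns the number of labels changed.
    """
    changed = 0
    if isinstance(node, dict):
        term_id = node.get("id")
        if (
            isinstance(term_id, str)
            and "label" in node
            and term_id in canonical
            and node["label"] != canonical[term_id]
        ):
            node["label"] = canonical[term_id]
            changed += 1
        for value in node.values():
            changed += fix_labels(value, canonical)
    elif isinstance(node, list):
        for item in node:
            changed += fix_labels(item, canonical)
    return changed
```

After the call, rerun `just validate-terms-file` on the file to confirm the mismatch is gone.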
Batch Term Population
To find entries missing term annotations:
import glob

import yaml

# Report nutrient files whose term bindings are missing.
for f in glob.glob("kb/nutrients/**/*.yaml", recursive=True):
    with open(f) as file:
        data = yaml.safe_load(file) or {}  # an empty file loads as None

    # Check nutrient_term (the key may be present with a null value)
    nt = data.get("nutrient_term") or {}
    if not nt.get("term"):
        print(f"{f}: missing nutrient_term.term")

    # Check phenotypes
    deficiency = data.get("deficiency") or {}
    for pheno in deficiency.get("phenotypes") or []:
        pt = pheno.get("phenotype_term") or {}
        if not pt.get("term"):
            print(f"{f}: {pheno.get('name')} - missing phenotype_term.term")
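The script above checks two fields. A recursive walk can also cover `disease_term`, `food_term`, and any other key ending in `_term` without hard-coding where they sit in the file. This is a sketch under that naming assumption, and `missing_terms` is a hypothetical helper.

```python
def missing_terms(node, path=""):
    """Yield the path of every *_term mapping that has no term.id.

    Assumes binding keys end in '_term'; 'preferred_term' is a plain string and is skipped.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key.endswith("_term") and key != "preferred_term":
                term = value.get("term") if isinstance(value, dict) else None
                if not (isinstance(term, dict) and term.get("id")):
                    yield here
            yield from missing_terms(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from missing_terms(item, f"{path}[{index}]")
```

Feeding it the result of `yaml.safe_load` for one file gives a list of dotted paths that can be printed alongside the file name.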
Common Patterns
Adding CHEBI to a Nutrient
- Look up term: `uv run runoak -i sqlite:obo:chebi info "l~<nutrient name>"`
- Verify: `uv run runoak -i sqlite:obo:chebi info <CHEBI:ID> -O obo`
- Add to YAML:

      nutrient_term:
        preferred_term: <Original Name>
        term:
          id: <CHEBI:ID>
          label: <Exact label from OAK>

- Validate: `just validate-terms-file kb/nutrients/...`
Adding HP to a Phenotype
- Look up term: `uv run runoak -i sqlite:obo:hp info "l~<phenotype>"`
- Verify specificity
- Add `phenotype_term:` block
- Validate
Adding GO to Biological Processes
- Look up: `uv run runoak -i sqlite:obo:go info "l~<process>"`
- Check it's under GO:0008150 (biological_process)
- Add `term:` block to the process
- Validate
Troubleshooting
"No results found"
- Try broader search terms
- Check spelling
- Try synonyms
- Use fuzzy search: `info "l~<term>"`
"Multiple results"
- Check definitions with `-O obo`
- Choose the most specific term
- Verify the term is from the correct branch (e.g., biological_process vs molecular_function)
"Term exists but label doesn't match"
The ontology canonical label must be used. Common issues:
- Case sensitivity: "Biotin" vs "biotin"
- Synonyms: "Vitamin B7" vs "biotin" (CHEBI uses "biotin")
- Outdated labels: Always fetch the current label from OAK
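The first two issues above can be told apart with a simple comparison, which helps decide whether a fix is a trivial case change or needs a fresh lookup. This is a rough sketch: `diagnose_label` is a hypothetical helper, and it cannot distinguish a synonym from an outdated label without querying the ontology.

```python
def diagnose_label(actual: str, canonical: str) -> str:
    """Classify how a stored label differs from the canonical ontology label."""
    if actual == canonical:
        return "ok"
    if actual.casefold() == canonical.casefold():
        return "case mismatch"
    return "different label (synonym or outdated)"
```

In every non-`ok` case the remedy is the same: copy the canonical label from OAK into the `label` field.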