name: bio-variant-calling-clinical-interpretation
description: Clinical variant interpretation using ClinVar, ACMG guidelines, and pathogenicity predictors. Prioritize variants for diagnostic and research applications. Use when interpreting clinical significance of variants.
tool_type: mixed
primary_tool: InterVar
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Clinical Variant Interpretation
Prioritize and interpret variants for clinical significance using databases and ACMG/AMP guidelines.
Core Capabilities
- Retrieve evidence before drafting: pull ClinVar, gnomAD, dbSNP, COSMIC, OMIM-style disease-gene context, disease-gene validity resources, and the literature before synthesizing any variant summary.
- Cite every claim: link each interpretive statement to its source record (ClinVar variation/accession, gnomAD population frequency, dbSNP identifier, COSMIC somatic evidence, or publication), and map each ACMG/AMP evidence code to the provenance that supports it.
- Separate fact from inference: keep database facts apart from model synthesis, keep the pathogenicity classification apart from explanatory prose, and label model speculation clearly.
- Report conflicts and uncertainty: detect contradictions between sources and stale assertions, label unsupported, discordant, or incomplete evidence as uncertain, and require manual review of conflicts before final report language is released.
- Run hallucination checks: audit the final text against the retrieved records and remove unsupported claims before anything enters a genetic variant report.
- Do not make or upgrade a pathogenicity claim unless it is directly supported by retrieved evidence and the applicable ACMG/AMP criteria.
- Preserve source-linked evidence tables with the final summary so the output stays auditable.
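The grounding requirements above can be sketched as a small data model in which every claim carries its own evidence. This is a minimal illustration, not a prescribed implementation: `Evidence`, `Claim`, `audit_claims`, and `conflicting_classifications` are hypothetical names, and any accession passed in is expected to come from a real retrieval step.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    source: str      # e.g. "ClinVar", "gnomAD", "ACMG/AMP"
    accession: str   # record identifier or criterion code, as retrieved
    assertion: str   # what the record states, copied from the source


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)
    is_model_inference: bool = False  # True for explicitly labeled speculation


def audit_claims(claims):
    """Split claims into (supported, labeled inference, unsupported)."""
    supported, inferred, unsupported = [], [], []
    for claim in claims:
        if claim.evidence:
            supported.append(claim)
        elif claim.is_model_inference:
            inferred.append(claim)
        else:
            unsupported.append(claim)  # candidates for removal before reporting
    return supported, inferred, unsupported


def conflicting_classifications(assertions):
    """Group sources by classification; return {} when all sources agree.

    `assertions` is an iterable of (source, classification) pairs.
    """
    by_class = {}
    for source, classification in assertions:
        by_class.setdefault(classification, []).append(source)
    return by_class if len(by_class) > 1 else {}
```

Claims that land in the unsupported list are the ones a hallucination check would strip, and a non-empty result from `conflicting_classifications` is the signal to route the variant to manual review.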
Interpretation Framework
Annotated VCF
│
├── Database Lookup
│ ├── ClinVar (clinical assertions)
│ ├── OMIM (disease associations)
│ └── gnomAD (population frequency)
│
├── Computational Predictions
│ ├── SIFT, PolyPhen-2
│ ├── CADD, REVEL
│ └── SpliceAI
│
├── ACMG Classification
│ └── Pathogenic → Likely Pathogenic → VUS → Likely Benign → Benign
│
└── Prioritized Variant List
ClinVar Annotation
Download ClinVar
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi
Annotate with bcftools
bcftools annotate \
-a clinvar.vcf.gz \
-c INFO/CLNSIG,INFO/CLNDN,INFO/CLNREVSTAT \
input.vcf.gz -Oz -o with_clinvar.vcf.gz
Filter Pathogenic Variants
# Pathogenic or Likely pathogenic
bcftools view -i 'INFO/CLNSIG~"Pathogenic" || INFO/CLNSIG~"Likely_pathogenic"' \
with_clinvar.vcf.gz -Oz -o pathogenic.vcf.gz
# Exclude benign
bcftools view -e 'INFO/CLNSIG~"Benign" || INFO/CLNSIG~"Likely_benign"' \
with_clinvar.vcf.gz -Oz -o not_benign.vcf.gz
ClinVar Significance Levels
| CLNSIG | Meaning | Action |
| --- | --- | --- |
| Pathogenic | Disease-causing | Report |
| Likely_pathogenic | Probably disease-causing | Report with caveat |
| Uncertain_significance | VUS | May report, needs follow-up |
| Likely_benign | Probably not disease-causing | Usually exclude |
| Benign | Not disease-causing | Exclude |
| Conflicting_classifications_of_pathogenicity (older releases: Conflicting_interpretations_of_pathogenicity) | Submitters disagree | Manual review |
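CLNSIG values are often composite (for example `Pathogenic/Likely_pathogenic`), so a lookup by exact string misses many records. The sketch below maps a raw CLNSIG string to the action column of the table above. `clnsig_action` is a hypothetical helper, and choosing the more cautious action for composite values is an assumption of this sketch, not a ClinVar rule.

```python
import re


def clnsig_action(clnsig):
    """Map a raw CLNSIG string to a reporting action."""
    if not clnsig or clnsig == ".":
        return "No ClinVar assertion"
    value = clnsig.lower()
    if value.startswith("conflicting"):
        return "Manual review"
    # Composite values are joined with "/", "|" or ","; modifiers start with "_".
    terms = {t.strip("_") for t in re.split(r"[/|,]", value) if t}
    pathogenic = bool(terms & {"pathogenic", "likely_pathogenic"})
    benign = bool(terms & {"benign", "likely_benign"})
    if pathogenic and benign:
        return "Manual review"
    if "likely_pathogenic" in terms:
        return "Report with caveat"
    if "pathogenic" in terms:
        return "Report"
    if "uncertain_significance" in terms:
        return "May report, needs follow-up"
    if "likely_benign" in terms:
        return "Usually exclude"
    if "benign" in terms:
        return "Exclude"
    return "Manual review"  # e.g. drug_response, risk_factor
```

Splitting into whole terms avoids the substring trap where `pathogenic` also matches `Conflicting_..._of_pathogenicity`.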
ClinVar Review Status
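| CLNREVSTAT | Stars | Meaning |
| --- | --- | --- |
| practice_guideline | 4 | Practice guideline |
| reviewed_by_expert_panel | 3 | Expert panel reviewed (e.g., ClinGen) |
| criteria_provided,_multiple_submitters,_no_conflicts | 2 | Multiple submitters with criteria, consistent assertions |
| criteria_provided,_conflicting_classifications | 1 | Submitters with criteria disagree |
| criteria_provided,_single_submitter | 1 | One submitter with criteria |
| no_assertion_criteria_provided | 0 | No criteria provided |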
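# Filter for high-confidence assertions (2+ stars).
# Keep the expression on one line: a backslash inside single quotes is not a
# line continuation and would be passed to bcftools literally.
bcftools view \
-i 'INFO/CLNREVSTAT~"multiple_submitters" || INFO/CLNREVSTAT~"expert_panel" || INFO/CLNREVSTAT~"practice_guideline"' \
with_clinvar.vcf.gz -Oz -o high_confidence.vcf.gz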
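When the review status is needed as a number (for sorting or thresholding in Python), the star ratings above can be kept in a lookup table. This is a sketch under the assumption that CLNREVSTAT strings match those written in ClinVar VCF releases; `REVIEW_STARS` and `review_stars` are hypothetical names, and unknown strings fall back to 0 stars.

```python
REVIEW_STARS = {
    "practice_guideline": 4,
    "reviewed_by_expert_panel": 3,
    "criteria_provided,_multiple_submitters,_no_conflicts": 2,
    "criteria_provided,_conflicting_classifications": 1,
    "criteria_provided,_conflicting_interpretations": 1,  # older releases
    "criteria_provided,_single_submitter": 1,
}


def review_stars(clnrevstat):
    """Return the ClinVar star rating for a CLNREVSTAT value (0 if unknown)."""
    # Some parsers split the value on commas; rejoin it before the lookup.
    if isinstance(clnrevstat, (list, tuple)):
        clnrevstat = ",".join(clnrevstat)
    return REVIEW_STARS.get(clnrevstat or "", 0)


def is_high_confidence(clnrevstat, min_stars=2):
    """Mirror the 2+ star filter as a Python predicate."""
    return review_stars(clnrevstat) >= min_stars
```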
InterVar (ACMG Classification)
Automated ACMG/AMP variant classification.
Installation
git clone https://github.com/WGLab/InterVar.git
cd InterVar
# Download databases per documentation
Run InterVar
python Intervar.py \
-i input.avinput \
-o output \
-b hg38 \
-d humandb/ \
--input_type=AVinput
From VCF
# Convert VCF to ANNOVAR format
convert2annovar.pl -format vcf4 input.vcf > input.avinput
# Run InterVar
python Intervar.py -i input.avinput -o intervar_results -b hg38
ACMG/AMP Criteria
Pathogenic Criteria
| Code | Type | Description |
| --- | --- | --- |
| PVS1 | Very Strong | Null variant in gene where LOF is disease mechanism |
| PS1-4 | Strong | Same AA change, functional studies, etc. |
| PM1-6 | Moderate | Hot spot, absent from controls, etc. |
| PP1-5 | Supporting | Co-segregation, computational evidence |
Benign Criteria
| Code | Type | Description |
| --- | --- | --- |
| BA1 | Stand-alone | AF >5% in gnomAD |
| BS1-4 | Strong | AF greater than expected, functional studies |
| BP1-7 | Supporting | Missense in gene with truncating mechanism |
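The criteria in the two tables are combined into a five-tier call by counting how many codes of each strength are met. The sketch below encodes the combining rules from the 2015 ACMG/AMP guidelines. `combine_acmg` is a hypothetical helper, it assumes each code is used at its default strength (no strength modifiers such as `PVS1_Moderate`), and it returns Uncertain significance when pathogenic and benign evidence both reach a call.

```python
from collections import Counter

PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def combine_acmg(codes):
    """Combine met ACMG/AMP criteria codes (e.g. ["PVS1", "PM2"]) into a call."""
    n = Counter()
    for code in codes:
        prefix = next((p for p in PREFIXES if code.startswith(p)), None)
        if prefix is None:
            raise ValueError(f"Unrecognized criterion code: {code}")
        n[prefix] += 1
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = "Pathogenic" if pathogenic else "Likely pathogenic" if likely_pathogenic else None
    benign_call = "Benign" if benign else "Likely benign" if likely_benign else None
    if path_call and benign_call:
        return "Uncertain significance"  # contradictory evidence
    return path_call or benign_call or "Uncertain significance"
```

A single supporting code such as PP3 (computational evidence) never reaches a call on its own, which is why score-based filters alone cannot establish pathogenicity.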
Population Frequency Filtering
# Rare variants only (gnomAD AF < 0.01)
bcftools view -i 'INFO/gnomAD_AF<0.01 || INFO/gnomAD_AF="."' \
input.vcf.gz -Oz -o rare.vcf.gz
# Ultra-rare for dominant diseases (AF < 0.0001)
bcftools view -i 'INFO/gnomAD_AF<0.0001 || INFO/gnomAD_AF="."' \
input.vcf.gz -Oz -o ultrarare.vcf.gz
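The same cutoffs can be expressed as a small tiering function when frequencies are handled in Python. This is a sketch: `frequency_tier` and its tier names are hypothetical. The thresholds are the ones used in this document (5% for BA1, 1% for rare, 0.01% for ultra-rare), and a missing frequency is reported as absent instead of being dropped, matching the `="."` clause in the filters above.

```python
def frequency_tier(af, common=0.05, rare=0.01, ultra_rare=0.0001):
    """Bucket a population allele frequency; None means no gnomAD record."""
    if af is None:
        return "absent"
    if af > common:
        return "common"          # above the BA1 stand-alone benign cutoff
    if af < ultra_rare:
        return "ultra_rare"      # candidate for dominant disorders
    if af < rare:
        return "rare"
    return "low_frequency"       # between 1% and 5%


def passes_rarity_filter(af, threshold=0.01):
    """Keep variants below the threshold or with no frequency annotation."""
    return af is None or af < threshold
```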
Pathogenicity Score Filtering
CADD Scores
# CADD > 20 (top 1% deleterious)
bcftools view -i 'INFO/CADD_PHRED>20' input.vcf.gz -Oz -o cadd_filtered.vcf.gz
# CADD > 30 (top 0.1%)
bcftools view -i 'INFO/CADD_PHRED>30' input.vcf.gz -Oz -o highly_deleterious.vcf.gz
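The "top 1%" and "top 0.1%" labels follow from the PHRED-like scaling of CADD, where the scaled score is -10 * log10 of a variant's rank fraction among all possible substitutions. A short sketch of the conversion in both directions (the function names are hypothetical):

```python
import math


def cadd_top_fraction(phred):
    """Fraction of all possible substitutions scoring at or above this PHRED value."""
    return 10 ** (-phred / 10)


def cadd_phred_for_fraction(fraction):
    """PHRED-scaled CADD cutoff corresponding to a top fraction (0 < fraction <= 1)."""
    return -10 * math.log10(fraction)
```

For example, `cadd_top_fraction(20)` evaluates to about 0.01 and `cadd_top_fraction(30)` to about 0.001, which is where the two cutoffs above come from.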
REVEL Scores
# REVEL > 0.5 (commonly used cutoff for possibly damaging missense; supporting evidence only)
bcftools view -i 'INFO/REVEL>0.5' input.vcf.gz -Oz -o revel_filtered.vcf.gz
Combined Filtering
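# Match "Likely_pathogenic" in full: a bare "Likely" would also keep Likely_benign.
# The expression stays on one line because a backslash inside single quotes is literal.
bcftools view \
-i '(INFO/CADD_PHRED>20 || INFO/REVEL>0.5) && (INFO/CLNSIG~"Pathogenic" || INFO/CLNSIG~"Likely_pathogenic" || INFO/CLNSIG=".")' \
input.vcf.gz -Oz -o prioritized.vcf.gz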
Python: Clinical Prioritization
from cyvcf2 import VCF

# Assumes gnomAD_AF, CADD_PHRED, REVEL, SYMBOL and Consequence are present as
# standalone INFO tags (for VEP output, e.g. after bcftools +split-vep).

def classify_variant(variant):
    """Assign a triage label. This is not a formal ACMG/AMP classification."""
    clnsig = str(variant.INFO.get('CLNSIG', ''))
    af = variant.INFO.get('gnomAD_AF', 0) or 0
    cadd = variant.INFO.get('CADD_PHRED', 0) or 0
    revel = variant.INFO.get('REVEL', 0) or 0

    # Known pathogenic. The match is case-sensitive, so
    # "Conflicting_..._of_pathogenicity" records are not caught here.
    if 'Pathogenic' in clnsig:
        return 'PATHOGENIC'
    if 'Likely_pathogenic' in clnsig:
        return 'LIKELY_PATHOGENIC'

    # Known benign (ClinVar assertion, or AF above the BA1 cutoff)
    if 'Benign' in clnsig or af > 0.05:
        return 'BENIGN'
    if 'Likely_benign' in clnsig:
        return 'LIKELY_BENIGN'

    # Computational prediction. Scores alone are supporting evidence only,
    # so these labels flag variants for review; they do not settle a call.
    if cadd > 25 or revel > 0.7:
        if af < 0.0001:
            return 'LIKELY_PATHOGENIC'
        elif af < 0.01:
            return 'VUS_FAVOR_PATH'
    if cadd < 10 and revel < 0.3:
        return 'LIKELY_BENIGN'
    return 'VUS'

vcf = VCF('annotated.vcf.gz')
results = []
for variant in vcf:
    classification = classify_variant(variant)
    if classification in ('PATHOGENIC', 'LIKELY_PATHOGENIC', 'VUS_FAVOR_PATH'):
        gene = variant.INFO.get('SYMBOL', 'Unknown')
        consequence = variant.INFO.get('Consequence', 'Unknown')
        results.append({
            'chrom': variant.CHROM,
            'pos': variant.POS,
            'ref': variant.REF,
            # Guard against records without an ALT allele
            'alt': variant.ALT[0] if variant.ALT else '.',
            'gene': gene,
            'consequence': consequence,
            'classification': classification,
            'clnsig': variant.INFO.get('CLNSIG', '.'),
            'cadd': variant.INFO.get('CADD_PHRED', '.'),
            'af': variant.INFO.get('gnomAD_AF', '.')
        })

# Output prioritized variants
for r in results:
    print(f"{r['gene']}\t{r['chrom']}:{r['pos']}\t{r['consequence']}\t{r['classification']}")
Gene Panel Filtering
# Filter to gene panel
bcftools view -R gene_panel.bed input.vcf.gz -Oz -o panel_variants.vcf.gz
# Or by gene symbol (requires VEP annotation)
bcftools view -i 'INFO/CSQ~"BRCA1" || INFO/CSQ~"BRCA2"' \
input.vcf.gz -Oz -o brca_variants.vcf.gz
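A regex match on the whole CSQ string also hits gene symbols that merely contain the panel name (a pseudogene whose symbol starts with `BRCA1`, for instance). Where exact matching matters, the CSQ field can be split per transcript and compared on the SYMBOL column only. This is a sketch: `csq_symbols` and `in_panel` are hypothetical helpers, and `csq_fields` is assumed to be the field order taken from the `Format:` part of the CSQ header line in the VCF.

```python
def csq_symbols(csq_value, csq_fields):
    """Return the set of gene symbols found in a VEP CSQ INFO value."""
    idx = csq_fields.index("SYMBOL")
    symbols = set()
    for transcript in csq_value.split(","):   # one entry per transcript
        parts = transcript.split("|")         # fields within an entry
        if idx < len(parts) and parts[idx]:
            symbols.add(parts[idx])
    return symbols


def in_panel(csq_value, csq_fields, panel):
    """True if any annotated transcript belongs to a gene in the panel."""
    return bool(csq_symbols(csq_value, csq_fields) & set(panel))
```

With a panel of `{"BRCA1", "BRCA2"}`, a record annotated only with `BRCA1P1` is rejected, whereas the regex filter above would keep it.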
Disease-Specific Resources
| Resource | Content | Use |
| --- | --- | --- |
| ClinVar | Clinical assertions | Primary lookup |
| OMIM | Gene-disease relationships | Gene prioritization |
| HGMD | Published mutations | Literature evidence |
| gnomAD | Population frequencies | Rarity filtering |
| ClinGen | Gene validity/dosage | LOF interpretation |
Reporting Template
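# Assumes SYMBOL and Consequence are standalone INFO tags (e.g. after bcftools +split-vep).
# The format string stays on one line: a backslash inside single quotes is literal.
bcftools query \
-f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/SYMBOL\t%INFO/Consequence\t%INFO/CLNSIG\t%INFO/CLNDN\t%INFO/gnomAD_AF\t%INFO/CADD_PHRED\n' \
prioritized.vcf.gz > clinical_report.tsv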
Complete Workflow
#!/bin/bash
set -euo pipefail
# Arguments: annotated input VCF, ClinVar VCF, output prefix
INPUT=$1
CLINVAR=$2
OUTPUT_PREFIX=$3
echo "=== Add ClinVar annotations ==="
bcftools annotate -a "$CLINVAR" \
  -c INFO/CLNSIG,INFO/CLNDN,INFO/CLNREVSTAT,INFO/CLNVC \
  "$INPUT" -Oz -o "${OUTPUT_PREFIX}_clinvar.vcf.gz"
echo "=== Filter rare variants ==="
bcftools view -i 'INFO/gnomAD_AF<0.01 || INFO/gnomAD_AF="."' \
  "${OUTPUT_PREFIX}_clinvar.vcf.gz" -Oz -o "${OUTPUT_PREFIX}_rare.vcf.gz"
echo "=== Extract pathogenic/likely pathogenic ==="
# Match the full terms: a bare "athogenic" would also pull in
# "Conflicting_..._of_pathogenicity" records.
bcftools view -i 'INFO/CLNSIG~"Pathogenic" || INFO/CLNSIG~"Likely_pathogenic"' \
  "${OUTPUT_PREFIX}_rare.vcf.gz" -Oz -o "${OUTPUT_PREFIX}_pathogenic.vcf.gz"
echo "=== Extract high-impact VUS ==="
bcftools view -i 'INFO/CLNSIG~"Uncertain" && INFO/CADD_PHRED>20' \
  "${OUTPUT_PREFIX}_rare.vcf.gz" -Oz -o "${OUTPUT_PREFIX}_vus_review.vcf.gz"
echo "=== Generate report ==="
# Format string on one line: a backslash inside single quotes is literal.
bcftools query -H \
  -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/SYMBOL\t%INFO/Consequence\t%INFO/CLNSIG\t%INFO/CLNDN\t%INFO/gnomAD_AF\t%INFO/CADD_PHRED\n' \
  "${OUTPUT_PREFIX}_pathogenic.vcf.gz" > "${OUTPUT_PREFIX}_report.tsv"
echo "=== Complete ==="
echo "Pathogenic: ${OUTPUT_PREFIX}_pathogenic.vcf.gz"
echo "VUS for review: ${OUTPUT_PREFIX}_vus_review.vcf.gz"
echo "Report: ${OUTPUT_PREFIX}_report.tsv"
Related Skills
- variant-calling/variant-annotation - VEP/SnpEff annotation
- variant-calling/filtering-best-practices - Quality filtering
- database-access/entrez-fetch - Download ClinVar/OMIM data
- pathway-analysis/go-enrichment - Gene set analysis
References