bio-variant-calling-clinical-interpretation

Clinical variant interpretation using ClinVar, ACMG guidelines, and pathogenicity predictors. Prioritize variants for diagnostic and research applications. Use when interpreting clinical significance of variants.

By mdbabumiamssm, updated 2/4/2026

---
name: bio-variant-calling-clinical-interpretation
description: Clinical variant interpretation using ClinVar, ACMG guidelines, and pathogenicity predictors. Prioritize variants for diagnostic and research applications. Use when interpreting clinical significance of variants.
tool_type: mixed
primary_tool: InterVar
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
---

Clinical Variant Interpretation

Prioritize and interpret variants for clinical significance using databases and ACMG/AMP guidelines.

Core Capabilities

  • Retrieve and reconcile evidence before drafting any variant summary: ClinVar clinical assertions (variation/accession records), gnomAD population frequencies, dbSNP identifiers, COSMIC somatic evidence, OMIM-style disease-gene context, disease-gene validity resources, ACMG/AMP criteria, and literature provenance.
  • Cite every clinical claim: attach a citation or database accession link to each interpretive statement, and map each ACMG/AMP evidence code to its supporting provenance.
  • Keep database facts separate from model synthesis: keep the pathogenicity classification apart from explanatory prose, label model inference or speculation clearly, and label unsupported, discordant, or incomplete evidence as uncertain.
  • Detect and report conflicts between sources, including stale assertions, and require manual review of conflicts before final report language is released.
  • Do not make or upgrade pathogenicity claims unless retrieved evidence and the applicable ACMG/AMP criteria directly support them.
  • For LLM-assisted summaries, run hallucination checks against the retrieved records and remove unsupported content before it enters a report.
  • Preserve source-linked evidence tables alongside the final summary so the output is auditable.
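The grounding rules above can be sketched as a small data model. This is a minimal illustration only: the `Evidence` and `Claim` classes, the `audit_claims` and `find_conflicts` helpers, and the placeholder accessions are hypothetical names introduced here, not part of any ClinVar or gnomAD client.

```python
from dataclasses import dataclass, field

PATHOGENIC_SIDE = {"pathogenic", "likely_pathogenic"}
BENIGN_SIDE = {"benign", "likely_benign"}


@dataclass(frozen=True)
class Evidence:
    source: str      # database name, e.g. "ClinVar"
    accession: str   # record identifier used as the citation (placeholders below)
    assertion: str   # normalized finding, e.g. "pathogenic" or "benign"


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)
    model_inference: bool = False   # True when the statement is model synthesis


def audit_claims(claims):
    """Sort claims into supported, labeled speculation, and unsupported."""
    report = {"supported": [], "speculation": [], "unsupported": []}
    for claim in claims:
        if claim.evidence:
            report["supported"].append(claim)
        elif claim.model_inference:
            report["speculation"].append(claim)
        else:
            report["unsupported"].append(claim)   # remove before reporting
    return report


def find_conflicts(evidence):
    """Return (pathogenic-side, benign-side) records when both sides are present."""
    path = [e for e in evidence if e.assertion in PATHOGENIC_SIDE]
    ben = [e for e in evidence if e.assertion in BENIGN_SIDE]
    return (path, ben) if path and ben else None


# Hypothetical records; the accessions are placeholders, not real identifiers.
records = [
    Evidence("ClinVar", "ACCESSION_PLACEHOLDER_1", "pathogenic"),
    Evidence("ClinVar", "ACCESSION_PLACEHOLDER_2", "likely_benign"),
]
claims = [
    Claim("Submitters disagree on clinical significance.", evidence=records),
    Claim("The variant may disrupt protein folding.", model_inference=True),
    Claim("The variant is a founder mutation."),
]
report = audit_claims(claims)
conflict = find_conflicts(records)
```

In this sketch the third claim lands in `unsupported` and would be dropped, while the non-empty `conflict` result would route the variant to manual review.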

Interpretation Framework

Annotated VCF
    │
    ├── Database Lookup
    │   ├── ClinVar (clinical assertions)
    │   ├── OMIM (disease associations)
    │   └── gnomAD (population frequency)
    │
    ├── Computational Predictions
    │   ├── SIFT, PolyPhen-2
    │   ├── CADD, REVEL
    │   └── SpliceAI
    │
    ├── ACMG Classification
    │   └── Pathogenic → Likely Pathogenic → VUS → Likely Benign → Benign
    │
    └── Prioritized Variant List

ClinVar Annotation

Download ClinVar

wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi

Annotate with bcftools

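# clinvar.vcf.gz must be bgzip-compressed and indexed (.tbi).
# ClinVar VCF contigs have no "chr" prefix (1, 2, ... X, Y, MT); rename contigs first
# (bcftools annotate --rename-chrs) if input.vcf.gz uses chr1-style names.
bcftools annotate \
    -a clinvar.vcf.gz \
    -c INFO/CLNSIG,INFO/CLNDN,INFO/CLNREVSTAT \
    input.vcf.gz -Oz -o with_clinvar.vcf.gz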

Filter Pathogenic Variants

# Pathogenic or Likely pathogenic
bcftools view -i 'INFO/CLNSIG~"Pathogenic" || INFO/CLNSIG~"Likely_pathogenic"' \
    with_clinvar.vcf.gz -Oz -o pathogenic.vcf.gz

# Exclude benign
bcftools view -e 'INFO/CLNSIG~"Benign" || INFO/CLNSIG~"Likely_benign"' \
    with_clinvar.vcf.gz -Oz -o not_benign.vcf.gz

ClinVar Significance Levels

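| CLNSIG | Meaning | Action |
|---|---|---|
| Pathogenic | Disease-causing | Report |
| Likely_pathogenic | Probably disease-causing | Report with caveat |
| Uncertain_significance | VUS | May report, needs follow-up |
| Likely_benign | Probably not disease-causing | Usually exclude |
| Benign | Not disease-causing | Exclude |
| Conflicting_interpretations_of_pathogenicity (Conflicting_classifications_of_pathogenicity in newer releases) | Multiple interpretations | Manual review |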
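As a sketch of how the table above could be applied in code, the function below maps a raw CLNSIG string to an action. ClinVar often emits compound values such as `Pathogenic/Likely_pathogenic`; the choice here to take the weaker of two same-side terms, and the `clnsig_action` name itself, are assumptions made for illustration.

```python
import re

# Signed strength of each term: positive = pathogenic side, negative = benign side.
RANK = {
    "Pathogenic": 2,
    "Likely_pathogenic": 1,
    "Uncertain_significance": 0,
    "Likely_benign": -1,
    "Benign": -2,
}
ACTION = {
    "Pathogenic": "Report",
    "Likely_pathogenic": "Report with caveat",
    "Uncertain_significance": "May report, needs follow-up",
    "Likely_benign": "Usually exclude",
    "Benign": "Exclude",
}


def clnsig_action(clnsig):
    """Map a CLNSIG value to the action column of the table."""
    if not clnsig or clnsig == ".":
        return "No ClinVar assertion"
    if "Conflicting" in clnsig:
        return "Manual review"
    # Compound values are separated by "/", "|" or ","
    terms = [t for t in re.split(r"[/|,]", clnsig) if t in RANK]
    if not terms:
        return "Manual review"          # e.g. drug_response, risk_factor
    ranks = {RANK[t] for t in terms}
    if min(ranks) < 0 < max(ranks):
        return "Manual review"          # terms on both sides of VUS
    weakest = min(terms, key=lambda t: abs(RANK[t]))
    return ACTION[weakest]
```

For example, `clnsig_action("Pathogenic/Likely_pathogenic")` falls back to the more cautious "Report with caveat" rather than "Report".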

ClinVar Review Status

| CLNREVSTAT | Stars | Meaning |
|---|---|---|
| practice_guideline | 4 | Practice guideline |
| reviewed_by_expert_panel | 3 | Reviewed by an expert panel (e.g. ClinGen) |
| criteria_provided,_multiple_submitters,_no_conflicts | 2 | Consistent assertions from multiple submitters |
| criteria_provided,_single_submitter | 1 | One submitter with criteria |
| no_assertion_criteria_provided | 0 | No criteria provided |

# Filter for high-confidence assertions (2+ stars)
# The expression stays on one line: a backslash inside single quotes is passed to bcftools literally.
bcftools view -i 'INFO/CLNREVSTAT~"multiple_submitters" || INFO/CLNREVSTAT~"expert_panel" || INFO/CLNREVSTAT~"practice_guideline"' \
    with_clinvar.vcf.gz -Oz -o high_confidence.vcf.gz
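The same star mapping can be expressed in Python when review status needs to be a numeric column. This is a minimal sketch; `review_stars` is a hypothetical helper, and it assumes the CLNREVSTAT value arrives as a single string.

```python
def review_stars(clnrevstat):
    """Convert a CLNREVSTAT string into its ClinVar star rating (0-4)."""
    status = clnrevstat or ""
    if "practice_guideline" in status:
        return 4
    if "reviewed_by_expert_panel" in status:
        return 3
    if status.startswith("criteria_provided"):
        # Multiple concordant submitters earn two stars; a single submitter
        # or conflicting submissions earn one.
        return 2 if "multiple_submitters" in status else 1
    # startswith() matters: "no_assertion_criteria_provided" also contains
    # the substring "criteria_provided" but carries zero stars.
    return 0


def is_high_confidence(clnrevstat, min_stars=2):
    """True when the assertion meets the 2+ star bar used in the filter above."""
    return review_stars(clnrevstat) >= min_stars
```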

InterVar (ACMG Classification)

Automated ACMG/AMP variant classification.

Installation

git clone https://github.com/WGLab/InterVar.git
cd InterVar
# Download databases per documentation

Run InterVar

python Intervar.py \
    -i input.avinput \
    -o output \
    -b hg38 \
    -d humandb/ \
    --input_type=AVinput

From VCF

# Convert VCF to ANNOVAR format
convert2annovar.pl -format vcf4 input.vcf > input.avinput

# Run InterVar
python Intervar.py -i input.avinput -o intervar_results -b hg38

ACMG/AMP Criteria

Pathogenic Criteria

| Code | Type | Description |
|---|---|---|
| PVS1 | Very Strong | Null variant in gene where LOF is disease mechanism |
| PS1-4 | Strong | Same AA change, functional studies, etc. |
| PM1-6 | Moderate | Hot spot, absent from controls, etc. |
| PP1-5 | Supporting | Co-segregation, computational evidence |

Benign Criteria

| Code | Type | Description |
|---|---|---|
| BA1 | Stand-alone | AF >5% in gnomAD |
| BS1-4 | Strong | AF greater than expected, functional studies |
| BP1-7 | Supporting | Missense in gene with truncating mechanism |
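Once criteria have been assigned, the 2015 ACMG/AMP guidelines combine them into a five-tier classification. The sketch below encodes those published combining rules; `combine_acmg` is a hypothetical helper name, and it assumes each code is applied at its default strength (modified strengths such as a downgraded PM2 are not handled).

```python
import re
from collections import Counter

_CODE = re.compile(r"^(PVS1|PS[1-4]|PM[1-6]|PP[1-5]|BA1|BS[1-4]|BP[1-7])$")


def combine_acmg(codes):
    """Combine ACMG/AMP evidence codes (e.g. ["PVS1", "PM2"]) into a class."""
    n = Counter()
    for code in set(codes):                      # each criterion counts once
        if not _CODE.match(code):
            raise ValueError(f"unrecognized ACMG/AMP code: {code}")
        n[code.rstrip("0123456789")] += 1        # "PM2" -> "PM"
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp == 1) or bp >= 2

    path_call = "Pathogenic" if pathogenic else (
        "Likely pathogenic" if likely_pathogenic else None)
    benign_call = "Benign" if benign else (
        "Likely benign" if likely_benign else None)

    # Contradictory evidence, or criteria not met, both resolve to VUS.
    if path_call and benign_call:
        return "Uncertain significance"
    return path_call or benign_call or "Uncertain significance"
```

For instance, `combine_acmg(["PVS1", "PM2"])` yields "Likely pathogenic", while a lone `PP3` stays "Uncertain significance", which is why computational scores alone cannot establish pathogenicity.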

Population Frequency Filtering

# Rare variants only (gnomAD AF < 0.01)
bcftools view -i 'INFO/gnomAD_AF<0.01 || INFO/gnomAD_AF="."' \
    input.vcf.gz -Oz -o rare.vcf.gz

# Ultra-rare for dominant diseases (AF < 0.0001)
bcftools view -i 'INFO/gnomAD_AF<0.0001 || INFO/gnomAD_AF="."' \
    input.vcf.gz -Oz -o ultrarare.vcf.gz
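The same cutoffs can be expressed as a labeling function when frequency should be recorded as evidence instead of used as a hard filter. This is a sketch under stated assumptions: `frequency_tier` and its tier labels are illustrative names, the 0.05 bound mirrors the BA1 cutoff listed earlier, and a missing value is treated as "absent from gnomAD", matching the `="."` branch of the bcftools filters.

```python
def frequency_tier(af, ultra_rare=0.0001, rare=0.01, ba1=0.05):
    """Label an allele frequency using the thresholds from this section."""
    if af is None:
        return "absent"            # no gnomAD record for this allele
    if af > ba1:
        return "common"            # above the BA1 stand-alone benign cutoff
    if af < ultra_rare:
        return "ultra-rare"        # candidate for dominant disorders
    if af < rare:
        return "rare"
    return "intermediate"          # between the rare and BA1 cutoffs


def passes_rare_filter(af, threshold=0.01):
    """Equivalent of: INFO/gnomAD_AF<threshold || INFO/gnomAD_AF="." """
    return af is None or af < threshold
```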

Pathogenicity Score Filtering

CADD Scores

# CADD > 20 (top 1% deleterious)
bcftools view -i 'INFO/CADD_PHRED>20' input.vcf.gz -Oz -o cadd_filtered.vcf.gz

# CADD > 30 (top 0.1%)
bcftools view -i 'INFO/CADD_PHRED>30' input.vcf.gz -Oz -o highly_deleterious.vcf.gz
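The "top 1%" and "top 0.1%" notes follow from the PHRED-like scaling of CADD scores, where a scaled score of S corresponds to the top 10^(-S/10) fraction of ranked variants. A small worked example (the function names are illustrative):

```python
import math


def phred_to_top_fraction(phred):
    """Fraction of ranked variants at or above a PHRED-scaled CADD score."""
    return 10 ** (-phred / 10)


def top_fraction_to_phred(fraction):
    """PHRED-scaled score corresponding to a top-ranked fraction."""
    return -10 * math.log10(fraction)


for score in (10, 20, 30):
    print(f"CADD PHRED {score}: top {phred_to_top_fraction(score):.1%}")
```

Each step of 10 in the scaled score therefore narrows the selection by a factor of ten.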

REVEL Scores

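# REVEL > 0.5 (commonly used screening cutoff; REVEL scores missense variants only,
# and a high score is supporting evidence, not a classification)
bcftools view -i 'INFO/REVEL>0.5' input.vcf.gz -Oz -o revel_filtered.vcf.gz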

Combined Filtering

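# High predicted impact AND (ClinVar pathogenic/likely pathogenic OR no ClinVar record).
# "Likely_pathogenic" is spelled out because "Likely" alone would also match Likely_benign.
bcftools view -i '(INFO/CADD_PHRED>20 || INFO/REVEL>0.5) && (INFO/CLNSIG~"Pathogenic" || INFO/CLNSIG~"Likely_pathogenic" || INFO/CLNSIG=".")' \
    input.vcf.gz -Oz -o prioritized.vcf.gz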

Python: Clinical Prioritization

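import re

from cyvcf2 import VCF


def _as_float(value):
    """Return an INFO value as a float, or None when missing or unparseable."""
    if isinstance(value, (tuple, list)):  # multi-valued field: use the first ALT's value
        value = value[0] if value else None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def classify_variant(variant):
    """Assign a heuristic triage tier. This is not a formal ACMG/AMP classification."""
    clnsig = str(variant.INFO.get('CLNSIG') or '')
    terms = set(re.split(r'[/|,]', clnsig))  # e.g. "Pathogenic/Likely_pathogenic"
    af = _as_float(variant.INFO.get('gnomAD_AF')) or 0.0  # absent from gnomAD counts as AF 0
    cadd = _as_float(variant.INFO.get('CADD_PHRED'))
    revel = _as_float(variant.INFO.get('REVEL'))

    # Known pathogenic
    if 'Pathogenic' in terms:
        return 'PATHOGENIC'
    if 'Likely_pathogenic' in terms:
        return 'LIKELY_PATHOGENIC'

    # Known benign, or too common to be disease-causing (BA1)
    if 'Benign' in terms or af > 0.05:
        return 'BENIGN'

    # Computational prediction (prediction-only tiers need manual confirmation)
    if (cadd is not None and cadd > 25) or (revel is not None and revel > 0.7):
        if af < 0.0001:
            return 'LIKELY_PATHOGENIC'
        elif af < 0.01:
            return 'VUS_FAVOR_PATH'

    # A missing score is not benign evidence: require at least one score,
    # and every available score must be low.
    low_cadd = cadd is None or cadd < 10
    low_revel = revel is None or revel < 0.3
    if (cadd is not None or revel is not None) and low_cadd and low_revel:
        return 'LIKELY_BENIGN'

    return 'VUS'


vcf = VCF('annotated.vcf.gz')
results = []

for variant in vcf:
    classification = classify_variant(variant)
    if classification in ('PATHOGENIC', 'LIKELY_PATHOGENIC', 'VUS_FAVOR_PATH'):
        # SYMBOL and Consequence are assumed to be flat INFO tags
        # (for example, split out of the VEP CSQ field beforehand).
        gene = variant.INFO.get('SYMBOL', 'Unknown')
        consequence = variant.INFO.get('Consequence', 'Unknown')
        results.append({
            'chrom': variant.CHROM,
            'pos': variant.POS,
            'ref': variant.REF,
            'alt': variant.ALT[0] if variant.ALT else '.',
            'gene': gene,
            'consequence': consequence,
            'classification': classification,
            'clnsig': variant.INFO.get('CLNSIG', '.'),
            'cadd': variant.INFO.get('CADD_PHRED', '.'),
            'af': variant.INFO.get('gnomAD_AF', '.')
        })

# Output prioritized variants
for r in results:
    print(f"{r['gene']}\t{r['chrom']}:{r['pos']}\t{r['consequence']}\t{r['classification']}")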

Gene Panel Filtering

# Filter to gene panel
bcftools view -R gene_panel.bed input.vcf.gz -Oz -o panel_variants.vcf.gz

# Or by gene symbol (requires VEP annotation)
bcftools view -i 'INFO/CSQ~"BRCA1" || INFO/CSQ~"BRCA2"' \
    input.vcf.gz -Oz -o brca_variants.vcf.gz
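The regex match on `INFO/CSQ` is a substring test, so `"BRCA1"` would also hit symbols such as the pseudogene BRCA1P1. A sketch of an exact-symbol check follows; it assumes the standard VEP layout in which the CSQ header description ends with `Format: field1|field2|...`, and the function names and example strings are illustrative.

```python
def csq_field_names(description):
    """Extract the ordered CSQ field names from the VEP header description."""
    return description.split("Format:")[1].strip().strip('"').split("|")


def genes_in_csq(csq, fields):
    """Return the set of gene symbols across all transcript entries."""
    idx = fields.index("SYMBOL")
    return {entry.split("|")[idx] for entry in csq.split(",") if entry}


def in_panel(csq, fields, panel):
    """True when any annotated gene symbol is exactly in the panel."""
    return bool(genes_in_csq(csq, fields) & set(panel))


# Hypothetical, shortened header and annotations for illustration.
description = "Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL"
fields = csq_field_names(description)
panel = {"BRCA1", "BRCA2"}

hit = in_panel("T|missense_variant|MODERATE|BRCA1", fields, panel)
miss = in_panel("A|non_coding_transcript_exon_variant|MODIFIER|BRCA1P1", fields, panel)
```

With cyvcf2, the description string should be obtainable from the header (for example via `vcf.get_header_type("CSQ")["Description"]`); treat that call as an assumption to verify against the installed version.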

Disease-Specific Resources

| Resource | Content | Use |
|---|---|---|
| ClinVar | Clinical assertions | Primary lookup |
| OMIM | Gene-disease relationships | Gene prioritization |
| HGMD | Published mutations | Literature evidence |
| gnomAD | Population frequencies | Rarity filtering |
| ClinGen | Gene validity/dosage | LOF interpretation |

Reporting Template

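# The format string stays on one line: a backslash-newline inside single quotes would be passed through literally.
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/SYMBOL\t%INFO/Consequence\t%INFO/CLNSIG\t%INFO/CLNDN\t%INFO/gnomAD_AF\t%INFO/CADD_PHRED\n' \
    prioritized.vcf.gz > clinical_report.tsv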

Complete Workflow

#!/bin/bash
set -euo pipefail

if [[ $# -ne 3 ]]; then
    echo "Usage: $0 <input.vcf.gz> <clinvar.vcf.gz> <output_prefix>" >&2
    exit 1
fi

INPUT=$1
CLINVAR=$2
OUTPUT_PREFIX=$3

# Assumes INPUT already carries INFO/gnomAD_AF, INFO/CADD_PHRED, INFO/SYMBOL and
# INFO/Consequence, and that CLINVAR is bgzip-compressed and indexed.
REPORT_FORMAT='%CHROM\t%POS\t%REF\t%ALT\t%INFO/SYMBOL\t%INFO/Consequence\t%INFO/CLNSIG\t%INFO/CLNDN\t%INFO/gnomAD_AF\t%INFO/CADD_PHRED\n'

echo "=== Add ClinVar annotations ==="
bcftools annotate -a "$CLINVAR" \
    -c INFO/CLNSIG,INFO/CLNDN,INFO/CLNREVSTAT,INFO/CLNVC \
    "$INPUT" -Oz -o "${OUTPUT_PREFIX}_clinvar.vcf.gz"

echo "=== Filter rare variants ==="
bcftools view -i 'INFO/gnomAD_AF<0.01 || INFO/gnomAD_AF="."' \
    "${OUTPUT_PREFIX}_clinvar.vcf.gz" -Oz -o "${OUTPUT_PREFIX}_rare.vcf.gz"

echo "=== Extract pathogenic/likely pathogenic ==="
# "athogenic" matches Pathogenic and Likely_pathogenic; conflicting records
# ("..._of_pathogenicity") also contain it, so exclude them explicitly.
bcftools view -i 'INFO/CLNSIG~"athogenic" && INFO/CLNSIG!~"Conflicting"' \
    "${OUTPUT_PREFIX}_rare.vcf.gz" -Oz -o "${OUTPUT_PREFIX}_pathogenic.vcf.gz"

echo "=== Extract high-impact VUS ==="
bcftools view -i 'INFO/CLNSIG~"Uncertain" && INFO/CADD_PHRED>20' \
    "${OUTPUT_PREFIX}_rare.vcf.gz" -Oz -o "${OUTPUT_PREFIX}_vus_review.vcf.gz"

echo "=== Generate report ==="
bcftools query -H -f "$REPORT_FORMAT" \
    "${OUTPUT_PREFIX}_pathogenic.vcf.gz" > "${OUTPUT_PREFIX}_report.tsv"

echo "=== Complete ==="
echo "Pathogenic: ${OUTPUT_PREFIX}_pathogenic.vcf.gz"
echo "VUS for review: ${OUTPUT_PREFIX}_vus_review.vcf.gz"
echo "Report: ${OUTPUT_PREFIX}_report.tsv"

Related Skills

  • variant-calling/variant-annotation - VEP/SnpEff annotation
  • variant-calling/filtering-best-practices - Quality filtering
  • database-access/entrez-fetch - Download ClinVar/OMIM data
  • pathway-analysis/go-enrichment - Gene set analysis
