gnomad-database

star 54

gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database.

BioTender-max By BioTender-max schedule Updated 5/25/2026

name: "gnomad-database" description: "gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database." license: "ODbL-1.0"

gnomAD Database

Overview

The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at https://gnomad.broadinstitute.org/api — no authentication required, no official SDK.

When to Use

  • Checking whether a candidate variant is rare enough to be clinically relevant (AF < 0.1% in all populations)
  • Retrieving allele frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID) for a variant
  • Identifying all rare loss-of-function variants in a gene for burden testing or candidate prioritization
  • Getting gene constraint metrics (pLI, LOEUF) to assess tolerance to loss-of-function variants
  • Checking read depth coverage for a region to evaluate if low variant frequency reflects low sequencing coverage
  • Filtering a VCF by population frequency — query gnomAD AF to discard common variants before clinical interpretation
  • For clinical pathogenicity classifications use clinvar-database; gnomAD provides frequency evidence but does not classify pathogenicity
  • For GWAS associations at the study level use gwas-database; gnomAD is for population frequency lookups

Prerequisites

  • Python packages: requests, pandas, matplotlib
  • Data requirements: gene symbols (e.g., BRCA1), variant IDs (1-69511-A-G format, or rsIDs)
  • Environment: internet connection; no API key required
  • Rate limits: no official published limits; use time.sleep(0.5) between requests for polite access; avoid bursts over 10 requests/second
pip install requests pandas matplotlib

Quick Start

import requests
import time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query: str, variables: dict = None) -> dict:
    """Execute a gnomAD GraphQL query and return the data payload."""
    payload = {"query": query, "variables": variables or {}}
    r = requests.post(GNOMAD_API, json=payload, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

# Quick check: get pLI / LOEUF for BRCA1
# GnomadConstraint fields are FLAT (no nested `lof { oe_ci { upper } }` type).
# `pli` is the current field; `pLI` is preserved as a deprecated alias.
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gnomad_constraint { pli oe_lof_upper }
  }
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
print(f"BRCA1 pLI:   {constraint['pli']:.3e}")        # ~5.5e-38 (very high LoF-intolerant)
print(f"BRCA1 LOEUF: {constraint['oe_lof_upper']:.3f}") # 0.928

Core API

Query 1: Gene Variant Query

Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.

import requests, time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    symbol
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
      }
    }
  }
}
"""

data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": "PCSK9",
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['symbol']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")
# Filter to rare variants (AF < 0.001)
rare = [v for v in variants if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
    print(f"  {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")

Query 2: Variant Lookup

Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.

VARIANT_QUERY = """
query VariantDetails($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    rsids
    chrom
    pos
    ref
    alt
    transcript_consequences {
      gene_symbol
      transcript_id
      is_canonical
      major_consequence
      lof
      lof_filter
      lof_flags
    }
    genome {
      an
      ac
      af
      faf95 { popmax popmax_population }
      populations { id ac an homozygote_count }
    }
  }
}
"""

# Query.variant() arg is `variantId` (camelCase). The top-level deprecated
# `consequence`/`lof`/`lof_filter`/`lof_flags` fields on VariantDetails were
# removed — read them from `transcript_consequences` (plural list; pick the
# canonical transcript with is_canonical=True).
data = gnomad_query(VARIANT_QUERY, {
    "variantId": "1-55039974-G-T",    # PCSK9 p.Tyr142Ter (LoF)
    "dataset": "gnomad_r4"
})
v = data["variant"]
canon = next((t for t in (v.get("transcript_consequences") or []) if t.get("is_canonical")),
             (v.get("transcript_consequences") or [{}])[0])
print(f"Variant     : {v['variant_id']}")
print(f"rsIDs       : {v['rsids']}")
print(f"Gene        : {canon.get('gene_symbol')}")
print(f"Consequence : {canon.get('major_consequence')}  |  LoF: {canon.get('lof')}")
g = v["genome"]
print(f"Genome AF   : {g['af']:.2e}  (AC={g['ac']}, AN={g['an']})")
print(f"FAF95 popmax: {g['faf95']['popmax']:.2e} in {g['faf95']['popmax_population']}")

Query 3: Population Frequencies

Retrieve allele frequency broken down by ancestry group for a specific variant.

import pandas as pd

POPULATION_FREQ_QUERY = """
query PopFreqs($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    genome {
      populations {
        id
        ac
        an
        homozygote_count
      }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "African/African American",
    "amr": "Admixed American",
    "eas": "East Asian",
    "fin": "Finnish",
    "nfe": "Non-Finnish European",
    "sas": "South Asian",
    "asj": "Ashkenazi Jewish",
    "mid": "Middle Eastern",
    "oth": "Other",
}

data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variantId": "1-55039974-G-T",
    "dataset": "gnomad_r4"
})
pops = data["variant"]["genome"]["populations"]

# VariantPopulation no longer exposes `af` directly — compute from ac/an.
main_pops = [p for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(main_pops)
df["af"] = df["ac"] / df["an"]
df["label"] = df["id"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
print(df[["label", "ac", "an", "af", "homozygote_count"]].to_string(index=False))

Query 4: Coverage Query

Retrieve per-base read depth coverage for a gene region to assess data completeness.

COVERAGE_QUERY = """
query Coverage($chrom: String!, $start: Int!, $stop: Int!,
               $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  region(chrom: $chrom, start: $start, stop: $stop, reference_genome: $reference_genome) {
    coverage(dataset: $dataset) {
      exome  { mean median over_1 over_10 over_20 over_30 over_100 }
      genome { mean median over_1 over_10 over_20 over_30 over_100 }
    }
  }
}
"""

# Coverage is no longer a top-level Query field — it lives under Region, and the
# `RegionCoverage` shape returns parallel `exome` / `genome` arrays (one entry per
# position; there is no per-row `pos` — the array index is implicit position).
data = gnomad_query(COVERAGE_QUERY, {
    "chrom": "1",
    "start": 55039700,
    "stop": 55040200,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4",
})
cov = data["region"]["coverage"]
exome  = cov.get("exome")  or []
genome = cov.get("genome") or []
print(f"Coverage positions: exome={len(exome)}, genome={len(genome)}")
if exome:
    avg_mean = sum(c["mean"] for c in exome) / len(exome)
    pct_20x = sum(1 for c in exome if (c.get("over_20") or 0) > 0.9) / len(exome) * 100
    print(f"Exome  — average mean depth: {avg_mean:.1f}x; {pct_20x:.1f}% positions >=90% at >=20x")
    c = exome[0]
    print(f"  Sample position: mean={c['mean']:.1f}x, median={c['median']}x, "
          f">=10x:{c['over_10']:.3f} >=20x:{c['over_20']:.3f} >=30x:{c['over_30']:.3f}")

Query 5: Gene Constraint

Retrieve gene-level constraint scores: pLI (probability of loss-of-function intolerance), LOEUF (LoF observed/expected upper bound fraction), and missense z-score.

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    symbol
    name
    gnomad_constraint {
      pli
      mis_z
      lof_z
      obs_lof
      exp_lof
      oe_lof
      oe_lof_lower
      oe_lof_upper
    }
  }
}
"""

# GnomadConstraint fields are flat (no nested `lof { obs exp oe oe_ci }`).
# `gene_name` was removed from Gene — use `name` or `symbol`. `pNull`/`pRec` are gone.
genes = ["PCSK9", "BRCA1", "TP53", "TTN"]
print(f"{'Gene':<10} {'pLI':>10} {'LOEUF':>7} {'mis_z':>7}")
print("-" * 38)
for gene in genes:
    data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
    c = data["gene"]["gnomad_constraint"]
    print(f"{gene:<10} {c['pli']:>10.3e} {c['oe_lof_upper']:>7.3f} {c['mis_z']:>7.2f}")
    time.sleep(0.5)
# Gene             pLI   LOEUF   mis_z
# PCSK9      4.27e-04   0.456    1.41
# BRCA1      5.52e-38   0.928    1.73
# TP53       8.86e-22   1.020    1.93
# TTN        1.00e+00   0.871    1.30

Query 6: Variant Search by Region

Fetch all variants in a chromosomal region, useful for targeted panels and regional analyses.

REGION_VARIANTS_QUERY = """
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!,
                     $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  region(chrom: $chrom, start: $start, stop: $stop,
         reference_genome: $reference_genome) {
    variants(dataset: $dataset) {
      variant_id
      rsids
      pos
      consequence
      lof
      genome {
        af
        ac
        an
        faf95 { popmax }
      }
    }
  }
}
"""

data = gnomad_query(REGION_VARIANTS_QUERY, {
    "chrom": "1",
    "start": 55039974,
    "stop": 55064852,   # PCSK9 coding region
    "dataset": "gnomad_r4",
    "reference_genome": "GRCh38"
})
variants = data["region"]["variants"]
print(f"Variants in region: {len(variants)}")

# Summarize by consequence
from collections import Counter
conseq_counts = Counter(v["consequence"] for v in variants if v["consequence"])
for c, n in conseq_counts.most_common(5):
    print(f"  {c}: {n}")

# Loss-of-function variants
lof_vars = [v for v in variants if v["lof"] == "HC"]
print(f"\nHigh-confidence LoF variants: {len(lof_vars)}")
for v in lof_vars[:3]:
    af = v["genome"]["af"] if v["genome"] else None
    print(f"  {v['variant_id']} | AF={af:.2e}" if af else f"  {v['variant_id']} | AF=NA")

Key Concepts

gnomAD Data Model

gnomAD v4 has two datasets: gnomad_r4 (exomes + genomes, GRCh38, 730K+ individuals) and gnomad_r2_1 (GRCh37, 141K individuals). The API uses a GraphQL schema where variants are accessed either through gene(), region(), or direct variant() lookups. Each variant has separate exome and genome frequency objects; the genome object is preferred for population frequency comparisons.

Ancestry Groups

gnomAD v4 reports frequencies for 9 top-level ancestry groups identified by genetic ancestry (not self-reported):

Code Population Dataset size (approx)
afr African/African American 76,000+
amr Admixed American 45,000+
eas East Asian 50,000+
fin Finnish 24,000+
nfe Non-Finnish European 400,000+
sas South Asian 80,000+
asj Ashkenazi Jewish 10,000+
mid Middle Eastern 5,000+
oth Other/Unknown varies

Filtering Allele Frequency (FAF95)

The faf95 field provides a one-sided 95% confidence interval lower bound on the allele frequency in the population where the variant is most common. Use this for conservative variant filtering in clinical pipelines — a variant with faf95.popmax < 0.001 is likely rare enough to warrant clinical investigation.

Constraint Scores

Score Interpretation
pLI > 0.9 Gene is intolerant to LoF — likely essential
LOEUF < 0.35 Strong LoF constraint (upper CI of oe ratio)
mis_z > 3.09 Gene shows significant missense constraint
pLI < 0.1 Gene tolerates LoF — homozygous LoF variants exist

Common Workflows

Workflow 1: Rare Variant Frequency Report for a Gene

Goal: Retrieve all rare (AF < 1%) variants in a gene, stratified by consequence, exported to CSV.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id symbol name
    variants(dataset: $dataset) {
      variant_id rsids chrom pos ref alt consequence lof lof_filter
      genome {
        an ac af
        faf95 { popmax popmax_population }
        populations { id ac an homozygote_count }
      }
    }
  }
}
"""

gene = "LDLR"
data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": gene,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})

variants = data["gene"]["variants"]
rows = []
for v in variants:
    g = v.get("genome") or {}
    af = g.get("af")
    if af is None or af >= 0.01:   # keep only rare variants
        continue
    rows.append({
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v.get("consequence"),
        "lof": v.get("lof"),
        "af_genome": af,
        "ac": g.get("ac"),
        "an": g.get("an"),
        "faf95_popmax": g.get("faf95", {}).get("popmax"),
        "faf95_pop": g.get("faf95", {}).get("popmax_population"),
    })

df = pd.DataFrame(rows)
df = df.sort_values("af_genome")
df.to_csv(f"{gene}_rare_variants.csv", index=False)
print(f"{gene}: {len(variants)} total variants, {len(df)} rare (AF<1%)")
print(df.groupby("consequence")["variant_id"].count().sort_values(ascending=False).head(6))
# LDLR: 2847 total variants, 2631 rare (AF<1%)
# consequence
# missense_variant              1423
# synonymous_variant             512
# splice_region_variant          231
# stop_gained                    198

Workflow 2: Ancestry-Stratified Frequency Visualization

Goal: Query a list of variants and produce a barplot of allele frequencies by ancestry group.

import requests, time
import pandas as pd
import matplotlib.pyplot as plt

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

POPULATION_FREQ_QUERY = """
query PopFreqs($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    genome {
      populations { id ac an homozygote_count }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "AFR", "amr": "AMR", "eas": "EAS", "fin": "FIN",
    "nfe": "NFE", "sas": "SAS", "asj": "ASJ", "mid": "MID",
}

variant_id = "1-55039974-G-T"   # PCSK9 p.Tyr142Ter
data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variantId": variant_id,
    "dataset": "gnomad_r4"
})

pops = data["variant"]["genome"]["populations"]
rows = [{"code": p["id"], "af": p["ac"] / p["an"], "ac": p["ac"], "an": p["an"]}
        for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(rows)
df["label"] = df["code"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(df["label"], df["af"] * 100, color="#4472C4", edgecolor="white")
ax.bar_label(bars, fmt="%.3f%%", fontsize=8, padding=2)
ax.set_xlabel("Ancestry Group")
ax.set_ylabel("Allele Frequency (%)")
ax.set_title(f"gnomAD v4 Population Frequencies\n{variant_id}")
ax.set_ylim(0, df["af"].max() * 150)
plt.tight_layout()
plt.savefig("gnomad_pop_frequencies.png", dpi=150, bbox_inches="tight")
print(f"Saved gnomad_pop_frequencies.png  (n={len(df)} ancestry groups)")
print(df[["label", "af", "ac", "an"]].to_string(index=False))

Workflow 3: Constraint-Guided Gene Prioritization

Goal: Score a gene list by constraint metrics and flag LoF-intolerant genes.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id symbol
    gnomad_constraint {
      pli mis_z lof_z
      obs_lof exp_lof oe_lof oe_lof_lower oe_lof_upper
    }
  }
}
"""

# GnomadConstraint is flat (no nested `lof` type); `pNull` / `pRec` are gone;
# `gene_name` is gone — use `symbol`. Filter LoF-intolerant by LOEUF < 0.35
# (the gnomAD recommendation), which is more robust than pLI > 0.9.
gene_list = ["BRCA1", "BRCA2", "PCSK9", "LDLR", "TTN", "CFTR", "HTT"]
records = []
for gene in gene_list:
    try:
        data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
        c = data["gene"]["gnomad_constraint"]
        records.append({
            "gene": gene,
            "pli": c["pli"],
            "LOEUF": c["oe_lof_upper"],
            "mis_z": c["mis_z"],
            "lof_obs": c["obs_lof"],
            "lof_exp": c["exp_lof"],
            "lof_oe":  c["oe_lof"],
        })
    except Exception as e:
        print(f"Warning: {gene} failed — {e}")
    time.sleep(0.5)

df = pd.DataFrame(records).sort_values("LOEUF")
df["lof_intolerant"] = df["LOEUF"] < 0.35
print(df[["gene", "pli", "LOEUF", "mis_z", "lof_intolerant"]].to_string(index=False))
df.to_csv("constraint_scores.csv", index=False)
print(f"\nLoF-intolerant genes (LOEUF < 0.35): {df['lof_intolerant'].sum()}/{len(df)}")

Key Parameters

Parameter Function/Endpoint Default Range / Options Effect
dataset All variant queries gnomad_r4, gnomad_r2_1, gnomad_r3 Dataset version (GRCh38 for r4/r3, GRCh37 for r2_1)
reference_genome gene(), region() GRCh38, GRCh37 Coordinate system; must match dataset
variant_id variant() CHROM-POS-REF-ALT string Identifies the specific variant to query
gene_symbol gene() HGNC symbol string Gene to retrieve; case-insensitive
chrom, start, stop region() valid genomic coordinates Region boundaries for region queries
faf95.popmax variant() genome float 0–1 Filtering allele frequency (95% CI upper bound); use < 0.001 for rare
lof filter field gene() variants "HC" (high-confidence), "LC" LoF confidence level
populations.id genome.populations afr, amr, eas, fin, nfe, sas, asj, mid, oth Per-ancestry frequency

Best Practices

  1. Use gnomad_r4 for GRCh38 analyses: gnomAD v4 is the most current dataset with 730K+ individuals. Use gnomad_r2_1 only when comparing to GRCh37-based variant calls.

  2. Use faf95.popmax for clinical filtering, not overall AF: The filtering allele frequency accounts for maximum population stratification and provides a more conservative rarity estimate than the global AF.

  3. Add time.sleep(0.5) in batch loops: gnomAD has no published rate limits but the API is shared infrastructure. Polite delays prevent server-side throttling.

  4. Filter lof == "HC" for LoF burden analyses: Low-confidence LoF ("LC") annotations are often in repetitive regions or may be sequencing artifacts. High-confidence ("HC") calls are filtered by LOFTEE.

  5. Check AN before interpreting AF: Low allele number (AN) means poor coverage in that population. A zero or near-zero AF may reflect absent data, not true rarity. Cross-reference with the coverage query when AN is unexpectedly low.

Common Recipes

Recipe: Check if a Variant Is Common in Any Population

When to use: Quick check before clinical interpretation — confirm no ancestry group has AF > 1%.

import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def is_common_in_any_population(variant_id, threshold=0.01, dataset="gnomad_r4"):
    query = """
    query($variantId: String!, $dataset: DatasetId!) {
      variant(variantId: $variantId, dataset: $dataset) {
        genome { faf95 { popmax popmax_population } af }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                                         "variables": {"variantId": variant_id, "dataset": dataset}},
                      timeout=15)
    data = r.json()["data"]["variant"]
    if not data or not data["genome"]:
        return None, "Variant not found in gnomAD"
    af = data["genome"]["af"]
    popmax = data["genome"]["faf95"]["popmax"]
    pop = data["genome"]["faf95"]["popmax_population"]
    is_common = (popmax or 0) >= threshold
    return is_common, f"overall AF={af:.2e}, FAF95 popmax={popmax:.2e} in {pop}"

common, info = is_common_in_any_population("1-55039974-G-T")
print(f"Common: {common}  |  {info}")
# Common: False  |  overall AF=3.2e-05, FAF95 popmax=6.4e-05 in nfe

Recipe: Batch Constraint Lookup

When to use: Score multiple genes from a differential expression or GWAS gene list.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_constraint(gene_symbol, reference_genome="GRCh38"):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gnomad_constraint { pli mis_z oe_lof_upper }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}},
                      timeout=15)
    data = r.json().get("data", {}).get("gene", {})
    if not data or not data.get("gnomad_constraint"):
        return None
    c = data["gnomad_constraint"]
    return {"gene": gene_symbol, "pli": c["pli"], "LOEUF": c["oe_lof_upper"], "mis_z": c["mis_z"]}

genes = ["BRCA1", "BRCA2", "ATM", "CHEK2", "PALB2"]
rows = [r for g in genes for r in [get_constraint(g)] if r]
time.sleep(0.5)   # polite delay per gene in real loop

df = pd.DataFrame(rows)
print(df.to_string(index=False))
# gene        pli    LOEUF  mis_z
# BRCA1  5.52e-38   0.928   1.73
# BRCA2  6.34e-09   0.521   0.97

Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference

When to use: Get high-confidence LoF variants from gnomAD for downstream annotation.

import requests, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_lof_variants(gene_symbol, dataset="gnomad_r4", max_af=0.001):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        variants(dataset: $dataset) {
          variant_id rsids chrom pos ref alt consequence lof
          genome { af ac an }
        }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": "GRCh38", "dataset": dataset}},
                      timeout=60)
    variants = r.json()["data"]["gene"]["variants"]
    lof = [v for v in variants
           if v.get("lof") == "HC"
           and v.get("genome") and v["genome"].get("af") is not None
           and v["genome"]["af"] < max_af]
    return pd.DataFrame([{
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v["consequence"],
        "af": v["genome"]["af"],
        "ac": v["genome"]["ac"],
    } for v in lof])

df = get_lof_variants("CFTR", max_af=0.001)
print(f"High-confidence LoF variants in CFTR (AF<0.1%): {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("CFTR_HC_lof_variants.csv", index=False)

Troubleshooting

Problem Cause Solution
{"errors": [...]} from GraphQL Invalid field name, wrong dataset ID, or null gene Check field names match gnomAD v4 schema; use gnomad_r4 not gnomad_v4
Variant returns None genome object Variant only in exome data, not genome Try accessing exome field instead of genome; genome is absent for exome-only variants
Gene query returns empty variants list Gene symbol not found or mismatch Verify HGNC symbol (case-sensitive); use gene_id (ENSG ID) as fallback
faf95 returns null Variant is absent or monomorphic in all populations Check ac and an — variant may have AC=0 or be filtered
requests.exceptions.Timeout Large gene (e.g., TTN) takes >30s Increase timeout=120; for very large genes use region queries instead
Population AF is None for some groups Variant not observed in that ancestry Treat None AF as 0 for filtering; check an to confirm the group was sequenced
reference_genome mismatch error Using GRCh37 coords with gnomad_r4 Use GRCh38 for gnomad_r4/gnomad_r3; use GRCh37 only for gnomad_r2_1

Related Skills

  • clinvar-database — ClinVar pathogenicity classifications (complement to gnomAD population frequency data)
  • gwas-database — GWAS Catalog for SNP-trait associations from published GWAS studies
  • ensembl-database — Ensembl VEP for variant consequence prediction and gene annotation
  • dbsnp-database — dbSNP for rsID lookup, variant classes, and cross-database ID mapping

References

Install via CLI
npx skills add https://github.com/BioTender-max/awesome-bio-agent-skills --skill gnomad-database
Repository Details
star Stars 54
call_split Forks 14
navigation Branch main
article Path SKILL.md
More from Creator
BioTender-max
BioTender-max Explore all skills →