bio-entrez-fetch

Retrieve records from NCBI databases using Biopython Bio.Entrez. Use when downloading sequences, fetching GenBank records, getting document summaries, or parsing NCBI data into Biopython objects.

By mdbabumiamssm. Updated 2/4/2026

name: bio-entrez-fetch
description: Retrieve records from NCBI databases using Biopython Bio.Entrez. Use when downloading sequences, fetching GenBank records, getting document summaries, or parsing NCBI data into Biopython objects.
tool_type: python
primary_tool: Bio.Entrez
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command

Entrez Fetch

Retrieve records from NCBI databases using Biopython's Entrez module (EFetch, ESummary utilities).

Required Setup

from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional, raises rate limit 3->10 req/sec
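
To keep credentials out of source files, the same two settings can be read from the environment. This is a minimal sketch, not part of Bio.Entrez: the helper `configure_entrez` and the variable names `NCBI_EMAIL` and `NCBI_API_KEY` are assumptions chosen for illustration.

```python
import os

def configure_entrez(entrez, env=os.environ):
    """Set email and, if available, api_key on an Entrez-like module.

    Hypothetical helper: NCBI_EMAIL and NCBI_API_KEY are assumed names.
    """
    email = env.get('NCBI_EMAIL')
    if not email:
        raise RuntimeError('Set NCBI_EMAIL; NCBI requires a contact address')
    entrez.email = email
    api_key = env.get('NCBI_API_KEY')
    if api_key:  # the key is optional, so only set it when present
        entrez.api_key = api_key
    return entrez
```

Typical use would be `from Bio import Entrez` followed by `configure_entrez(Entrez)` once at program start.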

Core Functions

Entrez.efetch() - Retrieve Full Records

Fetch complete records in various formats from any NCBI database.

# Fetch GenBank record by ID
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text')
genbank_text = handle.read()
handle.close()

# Fetch FASTA sequence
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text')
fasta_text = handle.read()
handle.close()

# Fetch multiple records (comma-separated IDs)
handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059', rettype='fasta', retmode='text')
multi_fasta_text = handle.read()
handle.close()

Key Parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| db | Database name | 'nucleotide', 'protein', 'pubmed' |
| id | Record ID(s) | 'NM_007294' or '123,456,789' |
| rettype | Return type | 'fasta', 'gb', 'abstract' |
| retmode | Return mode | 'text', 'xml' |
| retstart | Start index | 0 |
| retmax | Max records | 20 |
| WebEnv | History server session | From esearch |
| query_key | History server query | From esearch |
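
The retstart, retmax, WebEnv, and query_key parameters are typically combined to page through a large search result held on the history server. The sketch below assumes `search_results` came from `Entrez.read()` on an `Entrez.esearch(..., usehistory='y')` call, so it has 'Count', 'WebEnv', and 'QueryKey' entries. The helper names `page_windows` and `fetch_from_history` are hypothetical, and `efetch` is passed in so that `Entrez.efetch` can be supplied by the caller.

```python
def page_windows(total, batch_size):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)

def fetch_from_history(search_results, efetch, db='nucleotide', batch_size=100):
    """Return FASTA text chunks for every record of a history-server search.

    `efetch` is expected to behave like Bio.Entrez.efetch.
    """
    total = int(search_results['Count'])
    chunks = []
    for retstart, retmax in page_windows(total, batch_size):
        handle = efetch(db=db, rettype='fasta', retmode='text',
                        retstart=retstart, retmax=retmax,
                        webenv=search_results['WebEnv'],
                        query_key=search_results['QueryKey'])
        try:
            chunks.append(handle.read())
        finally:
            handle.close()
    return chunks

# Usage (needs network access, so it is left commented out):
# handle = Entrez.esearch(db='nucleotide', term='...', usehistory='y')
# results = Entrez.read(handle)
# handle.close()
# chunks = fetch_from_history(results, Entrez.efetch)
```

Passing `efetch` as an argument keeps the paging logic testable without contacting NCBI.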

Common Return Types by Database

Nucleotide/Protein:

| rettype | retmode | Description |
| --- | --- | --- |
| 'fasta' | 'text' | FASTA sequence |
| 'gb' | 'text' | GenBank flat file |
| 'gp' | 'text' | GenPept flat file (protein) |
| 'gbwithparts' | 'text' | GenBank with contig sequences |
| 'seqid' | 'text' | Seq-id only |
| 'acc' | 'text' | Accession only |
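
When a fetched record is passed on to Bio.SeqIO, the rettype used for efetch has to be translated into the format name SeqIO expects ('gb' becomes 'genbank', for example). A small lookup table makes that explicit. This is a sketch: `RETTYPE_TO_SEQIO` and `seqio_format` are hypothetical names, and only the sequence rettypes listed above that SeqIO can parse are covered.

```python
# Map efetch rettype values to the format name Bio.SeqIO expects.
RETTYPE_TO_SEQIO = {
    'fasta': 'fasta',
    'gb': 'genbank',
    'gp': 'genbank',           # GenPept uses the GenBank flat-file layout
    'gbwithparts': 'genbank',
}

def seqio_format(rettype):
    """Return the SeqIO format for a rettype, or raise ValueError if unmapped."""
    try:
        return RETTYPE_TO_SEQIO[rettype]
    except KeyError:
        raise ValueError(f"rettype {rettype!r} has no SeqIO format in this table") from None
```

A caller could then write `SeqIO.read(handle, seqio_format(rettype))` so the two arguments cannot drift apart.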

PubMed:

| rettype | retmode | Description |
| --- | --- | --- |
| 'abstract' | 'text' | Abstract text |
| 'medline' | 'text' | MEDLINE format |
| (omit) | 'xml' | Full PubMed XML |

Gene:

| rettype | retmode | Description |
| --- | --- | --- |
| 'gene_table' | 'text' | Gene table format |
| (omit) | 'xml' | Full gene XML |

Entrez.esummary() - Document Summaries

Get brief document summaries without downloading full records. This is usually faster than efetch because the responses are much smaller.

# Get summary for nucleotide record
handle = Entrez.esummary(db='nucleotide', id='NM_007294')
record = Entrez.read(handle)
handle.close()

summary = record[0]  # First (only) record
print(f"Title: {summary['Title']}")
print(f"Length: {summary['Length']}")
print(f"TaxId: {summary['TaxId']}")  # default summaries carry the taxonomy ID rather than an organism name

Common Summary Fields:

# Nucleotide/Protein
summary['Title']          # Record title/description
summary['Caption']        # Short identifier
summary['Length']         # Sequence length
summary.get('Organism')   # Source organism; may be absent from default summaries, so use .get()
summary['TaxId']          # Taxonomy ID
summary['AccessionVersion']  # Full accession.version

# PubMed
summary['Title']          # Article title
summary['AuthorList']     # Authors
summary['Source']         # Journal
summary['PubDate']        # Publication date
summary.get('DOI')        # Digital Object Identifier; may be absent for some articles
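
Not every field is present for every record or database, so direct indexing can raise KeyError. The sketch below pulls a chosen set of fields from one summary and substitutes a placeholder for anything missing. The helper name `pick_fields` and its defaults are assumptions. It relies only on the summary behaving like a dict, which the records returned by Entrez.read() do.

```python
def pick_fields(summary, fields=('Caption', 'Title', 'Length', 'TaxId'), missing='n/a'):
    """Return {field: value} for one esummary record, tolerating absent keys."""
    return {name: summary.get(name, missing) for name in fields}

# Example with a plain dict standing in for one parsed summary record
example = {'Caption': 'NM_007294', 'Title': 'example title', 'Length': 100}
row = pick_fields(example)
print(row['TaxId'])  # falls back to the placeholder because the key is absent
```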

Parsing with Biopython

Parse into SeqRecord Objects

from Bio import Entrez, SeqIO

Entrez.email = 'your.email@example.com'

# Parse GenBank into SeqRecord
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text')
record = SeqIO.read(handle, 'genbank')
handle.close()

print(f"ID: {record.id}")
print(f"Length: {len(record.seq)}")
print(f"Features: {len(record.features)}")

# Parse FASTA into SeqRecord
handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text')
record = SeqIO.read(handle, 'fasta')
handle.close()

Parse Multiple Records

# Fetch multiple as FASTA
handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059,NM_000546', rettype='fasta', retmode='text')
records = list(SeqIO.parse(handle, 'fasta'))
handle.close()

for record in records:
    print(f"{record.id}: {len(record.seq)} bp")

Parse XML with Entrez.read()

# For structured data, use XML mode
handle = Entrez.efetch(db='gene', id='672', retmode='xml')
records = Entrez.read(handle)
handle.close()

# Navigate nested structure
gene = records[0]
print(f"Gene: {gene['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']}")

Code Patterns

Fetch Sequence by Accession

from Bio import Entrez, SeqIO

Entrez.email = 'your.email@example.com'

def fetch_sequence(accession, db='nucleotide'):
    handle = Entrez.efetch(db=db, id=accession, rettype='fasta', retmode='text')
    record = SeqIO.read(handle, 'fasta')
    handle.close()
    return record

seq = fetch_sequence('NM_007294')
print(f"{seq.id}: {seq.seq[:50]}...")

Fetch GenBank with Features

def fetch_genbank(accession):
    handle = Entrez.efetch(db='nucleotide', id=accession, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    return record

gb = fetch_genbank('NM_007294')
for feature in gb.features:
    if feature.type == 'CDS':
        print(f"CDS: {feature.location}")
        print(f"Product: {feature.qualifiers.get('product', ['?'])[0]}")

Fetch PubMed Abstract

def fetch_abstract(pmid):
    handle = Entrez.efetch(db='pubmed', id=pmid, rettype='abstract', retmode='text')
    abstract = handle.read()
    handle.close()
    return abstract

abstract = fetch_abstract('35412348')
print(abstract)

Get Record Summaries

def get_summaries(db, ids):
    if isinstance(ids, list):
        ids = ','.join(ids)
    handle = Entrez.esummary(db=db, id=ids)
    records = Entrez.read(handle)
    handle.close()
    return records

summaries = get_summaries('nucleotide', ['NM_007294', 'NM_000059'])
for s in summaries:
    print(f"{s['Caption']}: {s['Title'][:50]}... ({s['Length']} bp)")

Search Then Fetch

# Search for records
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND insulin[gene] AND mRNA[fkey]', retmax=5)
search_results = Entrez.read(handle)
handle.close()

ids = search_results['IdList']

# Fetch the sequences
handle = Entrez.efetch(db='nucleotide', id=','.join(ids), rettype='fasta', retmode='text')
records = list(SeqIO.parse(handle, 'fasta'))
handle.close()

for record in records:
    print(f"{record.id}: {len(record.seq)} bp")

Fetch Protein by Gene ID

# Search gene database
handle = Entrez.esearch(db='gene', term='BRCA1[sym] AND human[orgn]')
result = Entrez.read(handle)
handle.close()
gene_id = result['IdList'][0]

# Get linked protein IDs
handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id)
links = Entrez.read(handle)
handle.close()

protein_ids = [link['Id'] for link in links[0]['LinkSetDb'][0]['Link'][:3]]

# Fetch proteins
handle = Entrez.efetch(db='protein', id=','.join(protein_ids), rettype='fasta', retmode='text')
proteins = list(SeqIO.parse(handle, 'fasta'))
handle.close()
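
The indexing `links[0]['LinkSetDb'][0]` above raises IndexError when the gene has no linked proteins, and it looks only at the first link set. A more defensive extraction could look like the following sketch. `linked_ids` is a hypothetical helper, and it assumes the nested structure that Entrez.read() returns for elink (a list of link sets, each with a 'LinkSetDb' list whose entries have 'LinkName' and 'Link').

```python
def linked_ids(linksets, linkname=None, limit=None):
    """Collect unique target IDs from a parsed elink result, in first-seen order.

    Returns an empty list when nothing is linked. If `linkname` is given,
    only LinkSetDb entries with that LinkName are used.
    """
    seen = {}  # dict keys preserve insertion order and drop duplicates
    for linkset in linksets:
        for linksetdb in linkset.get('LinkSetDb', []):
            if linkname is not None and linksetdb.get('LinkName') != linkname:
                continue
            for link in linksetdb.get('Link', []):
                seen.setdefault(link['Id'], None)
    ids = list(seen)
    return ids if limit is None else ids[:limit]
```

With this helper, `protein_ids = linked_ids(links, limit=3)` yields an empty list instead of an exception when no links exist, so the caller can skip the efetch step.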

Save Fetched Records to File

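def download_sequences(ids, output_file, db='nucleotide', rettype='fasta'):
    # ids is a list of accessions or UIDs; efetch expects them comma-separated
    handle = Entrez.efetch(db=db, id=','.join(ids), rettype=rettype, retmode='text')
    try:
        with open(output_file, 'w') as out:
            out.write(handle.read())
    finally:
        handle.close()  # release the connection even if writing fails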

download_sequences(['NM_007294', 'NM_000059'], 'brca_genes.fasta')

Common Errors

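| Error | Cause | Solution |
| --- | --- | --- |
| HTTPError 400 | Invalid ID or parameters | Verify the ID exists and check rettype |
| HTTPError 429 | Rate limit exceeded | Add delays or use an API key |
| Empty result | Record doesn't exist | Verify the accession in a web browser |
| ValueError in SeqIO | Wrong format specified, or SeqIO.read() given zero or several records | Match rettype with the SeqIO format; use SeqIO.parse() for multiple records |
| ExpatError | Response passed to Entrez.read() was not valid XML | Request retmode='xml' when using Entrez.read(); read text responses with handle.read() |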
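
For the HTTPError 429 row, "add delays" can be implemented as retry with exponential backoff. The following is a minimal sketch rather than anything provided by Bio.Entrez: `with_retries` is a hypothetical wrapper, and the attempt count and delays are arbitrary starting values. Other HTTP errors, such as 400, are re-raised immediately because retrying a bad request will not help.

```python
import time
from urllib.error import HTTPError

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on HTTP 429 wait base_delay * 2**n seconds and try again.

    The last failure (or any non-429 HTTPError) is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return call()
        except HTTPError as err:
            if err.code != 429 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Usage (needs network access, so it is left commented out):
# handle = with_retries(lambda: Entrez.efetch(db='nucleotide', id='NM_007294',
#                                             rettype='fasta', retmode='text'))
```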

Decision Tree

Need to retrieve NCBI records?
├── Need full sequence?
│   └── Use efetch with rettype='fasta'
├── Need sequence + annotations?
│   └── Use efetch with rettype='gb' (GenBank)
├── Just need metadata (length, organism)?
│   └── Use esummary (faster)
├── Need PubMed abstract?
│   └── Use efetch with rettype='abstract'
├── Need structured data for parsing?
│   └── Use efetch with retmode='xml' + Entrez.read()
├── Downloading many records?
│   └── See batch-downloads skill
└── Need records from multiple databases?
    └── See entrez-link skill first

Related Skills

  • entrez-search - Find record IDs before fetching
  • entrez-link - Find related records in other databases
  • batch-downloads - Download large numbers of records efficiently
  • sequence-io/read-sequences - Parse downloaded sequences with SeqIO
Install via CLI
npx skills add https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- --skill bio-entrez-fetch