name: bio-entrez-link
description: Find cross-references between NCBI databases using Biopython Bio.Entrez. Use when navigating from genes to proteins, sequences to publications, finding related records, or discovering database relationships.
tool_type: python
primary_tool: Bio.Entrez
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Entrez Link
Navigate between NCBI databases using Biopython's Entrez module (ELink utility).
Required Setup
from Bio import Entrez
Entrez.email = 'your.email@example.com' # Required by NCBI
Entrez.api_key = 'your_api_key' # Optional, raises rate limit
Core Function
Entrez.elink() - Cross-Database Links
Find related records in the same or different databases.
# Find proteins linked to a gene
handle = Entrez.elink(dbfrom='gene', db='protein', id='672')
record = Entrez.read(handle)
handle.close()
# Extract linked IDs
linkset = record[0]
if linkset['LinkSetDb']:
links = linkset['LinkSetDb'][0]['Link']
protein_ids = [link['Id'] for link in links]
print(f"Found {len(protein_ids)} linked proteins")
Key Parameters:
| Parameter |
Description |
Example |
dbfrom |
Source database |
'gene' |
db |
Target database |
'protein' |
id |
Source record ID(s) |
'672' or '672,675' |
linkname |
Specific link type |
'gene_protein_refseq' |
cmd |
Link command |
'neighbor', 'neighbor_score' |
ELink Result Structure
record[0] # First linkset
record[0]['DbFrom'] # Source database
record[0]['IdList'] # Input IDs
record[0]['LinkSetDb'] # List of link results
record[0]['LinkSetDb'][0]['DbTo'] # Target database
record[0]['LinkSetDb'][0]['LinkName'] # Link name
record[0]['LinkSetDb'][0]['Link'] # List of linked records
record[0]['LinkSetDb'][0]['Link'][0]['Id'] # Linked ID
Common Link Paths
Gene to Other Databases
| From |
To |
Link Name |
Description |
| gene |
protein |
gene_protein |
All proteins |
| gene |
protein |
gene_protein_refseq |
RefSeq proteins only |
| gene |
nucleotide |
gene_nuccore |
Nucleotide sequences |
| gene |
nucleotide |
gene_nuccore_refseqrna |
RefSeq mRNA |
| gene |
pubmed |
gene_pubmed |
Related publications |
| gene |
homologene |
gene_homologene |
Homologs |
| gene |
snp |
gene_snp |
SNPs in gene |
| gene |
clinvar |
gene_clinvar |
Clinical variants |
Nucleotide to Other Databases
| From |
To |
Link Name |
Description |
| nucleotide |
protein |
nuccore_protein |
Encoded proteins |
| nucleotide |
gene |
nuccore_gene |
Gene records |
| nucleotide |
pubmed |
nuccore_pubmed |
Publications |
| nucleotide |
taxonomy |
nuccore_taxonomy |
Organism taxonomy |
| nucleotide |
biosample |
nuccore_biosample |
Sample info |
| nucleotide |
sra |
nuccore_sra |
Related SRA data |
Protein to Other Databases
| From |
To |
Link Name |
Description |
| protein |
nucleotide |
protein_nuccore |
Coding sequences |
| protein |
gene |
protein_gene |
Gene records |
| protein |
pubmed |
protein_pubmed |
Publications |
| protein |
structure |
protein_structure |
3D structures |
| protein |
cdd |
protein_cdd |
Conserved domains |
PubMed Links
| From |
To |
Link Name |
Description |
| pubmed |
pubmed |
pubmed_pubmed |
Related articles |
| pubmed |
gene |
pubmed_gene |
Mentioned genes |
| pubmed |
protein |
pubmed_protein |
Mentioned proteins |
| pubmed |
nucleotide |
pubmed_nuccore |
Mentioned sequences |
Code Patterns
Gene to Protein
from Bio import Entrez
Entrez.email = 'your.email@example.com'
def get_proteins_for_gene(gene_id):
handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
record = Entrez.read(handle)
handle.close()
if not record[0]['LinkSetDb']:
return []
return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]
protein_ids = get_proteins_for_gene('672') # BRCA1
print(f"RefSeq proteins: {protein_ids[:5]}")
Nucleotide to Gene
def get_gene_for_nucleotide(nuc_id):
handle = Entrez.elink(dbfrom='nucleotide', db='gene', id=nuc_id)
record = Entrez.read(handle)
handle.close()
if not record[0]['LinkSetDb']:
return None
return record[0]['LinkSetDb'][0]['Link'][0]['Id']
gene_id = get_gene_for_nucleotide('NM_007294')
print(f"Gene ID: {gene_id}")
Find Related PubMed Articles
def get_related_articles(pmid, max_results=10):
handle = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, linkname='pubmed_pubmed')
record = Entrez.read(handle)
handle.close()
if not record[0]['LinkSetDb']:
return []
links = record[0]['LinkSetDb'][0]['Link']
return [link['Id'] for link in links[:max_results]]
related = get_related_articles('35412348')
print(f"Related articles: {related}")
Get All Available Links
def discover_links(db, record_id):
handle = Entrez.elink(dbfrom=db, id=record_id, cmd='acheck')
record = Entrez.read(handle)
handle.close()
links = {}
for linkset in record[0].get('LinkSetDb', []):
links[linkset['LinkName']] = linkset['DbTo']
return links
available = discover_links('gene', '672')
for name, target in available.items():
print(f"{name} -> {target}")
Navigate Gene -> Protein -> Structure
def gene_to_structures(gene_id):
# Gene to protein
handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
record = Entrez.read(handle)
handle.close()
if not record[0]['LinkSetDb']:
return []
protein_ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link'][:5]]
# Protein to structure
handle = Entrez.elink(dbfrom='protein', db='structure', id=','.join(protein_ids))
record = Entrez.read(handle)
handle.close()
structure_ids = []
for linkset in record:
if linkset['LinkSetDb']:
structure_ids.extend([link['Id'] for link in linkset['LinkSetDb'][0]['Link']])
return structure_ids
structures = gene_to_structures('672')
print(f"Structure IDs: {structures[:5]}")
Link Multiple IDs at Once
def batch_link(dbfrom, db, ids):
if isinstance(ids, list):
ids = ','.join(ids)
handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids)
record = Entrez.read(handle)
handle.close()
# Returns one linkset per input ID
results = {}
for linkset in record:
source_id = linkset['IdList'][0]
linked_ids = []
if linkset['LinkSetDb']:
linked_ids = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']]
results[source_id] = linked_ids
return results
results = batch_link('gene', 'protein', ['672', '675', '7157'])
for gene, proteins in results.items():
print(f"Gene {gene}: {len(proteins)} proteins")
Get Publications for a Sequence
def get_sequence_publications(accession):
# First get the GI/UID
handle = Entrez.esearch(db='nucleotide', term=f'{accession}[accn]')
search = Entrez.read(handle)
handle.close()
if not search['IdList']:
return []
uid = search['IdList'][0]
# Link to PubMed
handle = Entrez.elink(dbfrom='nucleotide', db='pubmed', id=uid)
record = Entrez.read(handle)
handle.close()
if not record[0]['LinkSetDb']:
return []
return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]
pmids = get_sequence_publications('NM_007294')
print(f"PubMed IDs: {pmids[:5]}")
Link Commands
| Command |
Description |
neighbor |
Default - get linked records |
neighbor_score |
Include relevance scores |
neighbor_history |
Store results in history |
acheck |
List all available links |
ncheck |
Check if any links exist |
lcheck |
Check specific link exists |
llinks |
Get URLs to Entrez links |
prlinks |
Get provider links (external) |
Common Errors
| Error |
Cause |
Solution |
Empty LinkSetDb |
No links exist |
Check if record has linked data |
HTTPError 400 |
Invalid ID or database |
Verify ID exists in source database |
KeyError |
Missing expected field |
Check if LinkSetDb is empty first |
| Single linkset expected, got list |
Multiple input IDs |
Iterate through record list |
Decision Tree
Need to find related records?
├── Know what link you want?
│ └── Use elink with specific linkname
├── Discover what links exist?
│ └── Use elink with cmd='acheck'
├── Navigate to target database?
│ └── Use elink(dbfrom=X, db=Y, id=Z)
├── Find related records in same database?
│ └── Use elink(dbfrom=X, db=X) with neighbor
├── Chain multiple databases?
│ └── Call elink multiple times
└── Need the actual records?
└── Use elink first, then efetch with IDs
Related Skills
- entrez-search - Search databases before linking
- entrez-fetch - Retrieve records after finding linked IDs
- batch-downloads - Download many linked records efficiently