name: pdbe-api description: "Query the PDBe (Protein Data Bank in Europe) REST API and Solr search API from within Coot to access structure metadata, validation data, revision history, search capabilities, and download coordinate files"
PDBe API Access from Coot
Overview
The PDBe (Protein Data Bank in Europe) provides comprehensive REST and Solr-based APIs for programmatic access to structure data, validation reports, compound information, revision history, and search capabilities. Coot can access these APIs directly using the coot_get_url_as_string_py() function, which now supports both text and binary data.
Core Function
coot.coot_get_url_as_string_py(url) - Fetch URL content
Returns:
- Python
strfor text/JSON content (valid UTF-8) - Python
bytesfor binary content (gzipped files, images, etc.)
import json
# Example 1: Get structure summary (returns string)
result = coot.coot_get_url_as_string_py("https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/4wa9")
data = json.loads(result)
# Example 2: Download gzipped coordinates (returns bytes)
import gzip
compressed = coot.coot_get_url_as_string_py("https://files.rcsb.org/download/4wa9.cif.gz")
decompressed = gzip.decompress(compressed)
imol = coot.read_coordinates_as_string(decompressed.decode('utf-8'), "4wa9")
Main API Endpoints
Entry-based API
Base URL: https://www.ebi.ac.uk/pdbe/api/
Documentation: https://www.ebi.ac.uk/pdbe/api/doc/
Aggregated API
Base URL: https://www.ebi.ac.uk/pdbe/graph-api/
Documentation: https://pdbe.org/graph-api
Search API (Solr)
Base URL: https://www.ebi.ac.uk/pdbe/search/pdb/select?
Documentation: https://www.ebi.ac.uk/pdbe/api/doc/search.html
Common Query Patterns
1. Structure Summary and Metadata
Get basic information about a structure including deposition date, revision date, authors, and experimental method:
import json
pdb_id = "4wa9"
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
# Handle both string and bytes responses
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
# Extract key information
entry = data[pdb_id][0]
print(f"Title: {entry['title']}")
print(f"Release date: {entry['release_date']}")
print(f"Revision date: {entry['revision_date']}")
print(f"Method: {entry['experimental_method']}")
print(f"Authors: {entry['entry_authors']}")
Key fields in response:
title- Structure titlerelease_date- Original deposition date (YYYYMMDD)revision_date- Most recent revision date (YYYYMMDD)experimental_method- List of experimental methodsentry_authors- List of authorsnumber_of_entities- Count of different entity types (protein, ligand, water, etc.)
2. Molecule/Entity Information
Get organism and molecule details:
import json
pdb_id = "4wa9"
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/molecules/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
if pdb_id in data:
for entity in data[pdb_id]:
molecule_name = entity.get('molecule_name', ['N/A'])[0] if isinstance(entity.get('molecule_name', []), list) else entity.get('molecule_name', 'N/A')
source = entity.get('source', [{}])[0] if isinstance(entity.get('source', []), list) else entity.get('source', {})
organism = source.get('organism_scientific_name', 'N/A')
expression_host = source.get('expression_host_scientific_name', 'N/A')
print(f"Molecule: {molecule_name}")
print(f" Source organism: {organism}")
if expression_host != 'N/A' and expression_host != organism:
print(f" Expression host: {expression_host}")
3. Compound/Ligand Information
Get detailed information about a specific compound including formula, SMILES, InChI, and revision history:
import json
comp_id = "AXI" # 3-letter code
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/compound/summary/{comp_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
compound = data[comp_id][0]
print(f"Name: {compound['name']}")
print(f"Formula: {compound['formula']}")
print(f"Weight: {compound['weight']}")
print(f"Creation date: {compound['creation_date']}")
print(f"Revision date: {compound['revision_date']}")
print(f"InChI: {compound['inchi']}")
print(f"SMILES: {compound['smiles'][0]['name']}")
Use case: Check if a ligand definition was recently revised, which might explain geometry changes.
4. Downloading Coordinate Files
Download and load PDB/mmCIF files directly:
import gzip
# Download current version from RCSB
pdb_id = "4wa9"
url = f"https://files.rcsb.org/download/{pdb_id}.cif.gz"
print(f"Downloading {pdb_id}...")
compressed = coot.coot_get_url_as_string_py(url)
print(f"Downloaded {len(compressed)} bytes (compressed)")
# Decompress
decompressed = gzip.decompress(compressed)
print(f"Decompressed to {len(decompressed)} bytes")
# Load into Coot
imol = coot.read_coordinates_as_string(decompressed.decode('utf-8'), f"{pdb_id}")
print(f"Loaded as molecule {imol}")
Note: The wwPDB versioned archive exists but is not currently accessible via HTTPS through this API. Use the current version from RCSB or PDBe.
5. Validation Reports
Get residue-wise outliers including clashes, geometry outliers, and density fit issues:
import json
pdb_id = "4wa9"
url = f"https://www.ebi.ac.uk/pdbe/api/validation/residuewise_outlier_summary/entry/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
# Note: Response may have multiple JSON objects, parse carefully
# Access specific chain/residue validation data
Available validation endpoints:
/validation/residuewise_outlier_summary/entry/{pdb_id}- Residue-level outliers/validation/rama_sidechain_listing/entry/{pdb_id}- Ramachandran and rotamer outliers/validation/global_percentiles/entry/{pdb_id}- Overall quality metrics
6. Structure Status and Revision History
Check if a structure has been superseded or revised:
import json
pdb_id = "4wa9"
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/status/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
status = data[pdb_id][0]
print(f"Status: {status['status_code']}") # REL = released, OBS = obsolete
print(f"Since: {status['since']}")
print(f"Superseded by: {status['superceded_by']}")
print(f"Obsoletes: {status['obsoletes']}")
7. Ligand Binding Sites
Get information about ligand binding sites and interactions:
import json
pdb_id = "4wa9"
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/ligand_monomers/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
# Access ligand information per chain
for entity in data[pdb_id]:
print(f"Chain: {entity['chain_id']}")
for ligand in entity.get('ligands', []):
print(f" Ligand: {ligand['chem_comp_id']}")
print(f" Residue: {ligand['author_residue_number']}")
8. Assembly Information
Get biological assembly information:
import json
pdb_id = "2hyy"
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/assembly/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
for assembly in data[pdb_id]:
print(f"Assembly {assembly['assembly_id']}: {assembly['name']}")
print(f" Form: {assembly['form']}")
print(f" Preferred: {assembly['preferred']}")
Solr Search API
The Solr search API allows complex queries across the entire PDB. However, it has important limitations.
What Solr Search CAN Do (Well-Indexed Fields)
✅ Metadata searches:
- By release/deposition date:
release_year:2025 - By experimental method:
experimental_method:"X-ray diffraction" - By resolution:
resolution:[* TO 2.0]orresolution:[1.5 TO 2.5] - By organism:
organism_scientific_name:"Homo sapiens"
✅ Presence/absence queries:
- Has protein:
number_of_protein_chains:[1 TO *] - Has carbohydrate:
has_carb_polymer:Y - Has bound molecules:
has_bound_molecule:Y - Has modified residues:
has_modified_residues:Y
✅ Component searches:
- Specific ligand:
chem_comp_id:ATP - Ligand name:
ligand_name:imatinib - Molecule name:
molecule_name:*kinase*
✅ Author/citation:
- By author:
entry_authors:"Smith J" - By UniProt:
uniprot_accession:P12345
✅ Combined queries:
# Example: Human kinases with resolution < 2Å from 2024
query = 'release_year:2024 AND organism_scientific_name:"Homo sapiens" AND molecule_name:*kinase* AND resolution:[* TO 2.0]'
What Solr Search CANNOT Do
❌ Detailed connectivity: Cannot search for "THR covalently bonded to NAG" or other specific atom-level connections
❌ Geometry queries: Cannot search for "bonds longer than X" or "angles outside range Y"
❌ Spatial relationships: Cannot search for "atoms within 5Å of ligand"
❌ Sequence motifs: Cannot search for "structures with GXGXXG motif"
❌ Complex structural features: Cannot search for "beta-barrel with 8 strands"
❌ Validation specifics: Cannot search for "residues with Ramachandran outliers at position X"
The Pattern: Solr indexes metadata and simple categorical data, not structural details or relationships.
For analyses requiring connectivity or geometry (like finding O-glycosylated threonines), you must:
- Use Solr to find candidates (e.g., structures with NAG + resolution < 2.5Å)
- Download those structures
- Parse mmCIF connectivity tables locally
- Extract geometric parameters
Basic Search Syntax
import json
# Simple search for high-resolution X-ray structures from 2024
query = "release_year:2024 AND experimental_method:\"X-ray diffraction\" AND resolution:[* TO 1.5]"
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=10&fl=pdb_id,title,resolution"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
print(f"Found {data['response']['numFound']} structures")
for doc in data['response']['docs']:
print(f" {doc['pdb_id']}: {doc.get('resolution', 'N/A')}Å")
print(f" {doc.get('title', 'N/A')[:70]}")
Common Solr Search Fields
Identifiers & Metadata:
pdb_id- PDB entry IDmolecule_name- Molecule namemolecule_type- Entity type: use"Protein"(capital P) — NOTpolypeptide(L)orproteinmolecule_sequence- One-letter sequence string (stored but not full-text indexed — wildcard search like*C*C*returns 0 results; fetch and filter in Python instead)polymer_length- Length of the polymer entity in residues (supports range queries:[5 TO 30])number_of_polymer_residues- Total residues across all chains in the entrynumber_of_protein_chains- Number of protein chainsorganism_scientific_name- Source organismexperimental_method- Experimental method (e.g.,"X-ray diffraction")resolution- Structure resolutionligand_name- Ligand/compound namecitation_title- Publication titledeposition_date- Deposition daterevision_date- Revision date
Experimental Details:
experimental_method- Method (e.g., "X-ray diffraction", "Electron Microscopy", "Solution NMR")resolution- Structure resolution (numeric, use ranges like[1.0 TO 2.0])em_resolution- EM-specific resolutiondata_quality- Overall quality metric
Molecular Content:
molecule_name- Molecule name (supports wildcards:*kinase*)molecule_type- Type (Protein, DNA, RNA, etc.)organism_scientific_name- Source organismorganism_synonyms- Alternative organism namesgenus- Organism genusexpression_host_scientific_name- Expression system
Ligands & Modifications:
chem_comp_id- Chemical component 3-letter codeligand_name- Ligand namehas_bound_molecule- Y/Nhas_carb_polymer- Y/N (has carbohydrate)has_modified_residues- Y/Nnumber_of_bound_molecules- Count
Authors & Citations:
entry_authors- Entry authorscitation_authors- Publication authorscitation_title- Paper titlecitation_year- Publication yearpubmed_id- PubMed ID
Protein Details:
uniprot_accession- UniProt accessionuniprot_id- UniProt IDgene_name- Gene namego_id- Gene Ontology ID
Structure Properties:
number_of_protein_chains- Countnumber_of_polymer_entities- Countassembly_composition- Assembly typesymmetry_group- Symmetry
Practical Search Examples
Example 1: High-resolution X-ray structures from 2024
import json
url = "https://www.ebi.ac.uk/pdbe/search/pdb/select?q=release_year:2024%20AND%20experimental_method:\"X-ray%20diffraction\"%20AND%20resolution:[*%20TO%201.5]&wt=json&rows=5&fl=pdb_id,title,resolution"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
Sorting by `sort=molecular_weight+asc` gives a 400 error. Use `polymer_length` instead
as a proxy for size, or use `number_of_polymer_residues` for entry-level size.
**5. Use `fq` (filter query) for range constraints**
Range filtering on numeric fields works well as a filter query:
```python
# Filter to entities with 5-30 residues:
url = "...&q=molecule_type:Protein&fq=polymer_length:[5+TO+30]&sort=polymer_length+asc..."
Example 2: Human kinase structures
query = "organism_scientific_name:\"Homo sapiens\" AND molecule_name:*kinase*"
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=10&fl=pdb_id,title,resolution"
Example 3: Cryo-EM structures better than 3Å from 2025
query = "release_year:2025 AND experimental_method:\"Electron Microscopy\" AND resolution:[* TO 3.0]"
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=10&fl=pdb_id,title,em_resolution"
Example 4: Structures with carbohydrates
query = "has_carb_polymer:Y"
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=10&fl=pdb_id,title"
Example 5: Structures of a specific protein from different species
# Find ABL1 structures from different mammals
query = "molecule_name:*ABL1* OR molecule_name:*ABL*kinase*"
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=100&fl=pdb_id,organism_scientific_name,title"
6. Discover available fields by fetching a sample document with fl=*
When you don't know what fields are in the index:
url = "https://www.ebi.ac.uk/pdbe/search/pdb/select?q=*:*&wt=json&rows=1&fl=*"
data = json.loads(coot.coot_get_url_as_string_py(url))
for k in sorted(data['response']['docs'][0].keys()):
print(k)
Fields prefixed q_ and t_ are query/text variants of the base fields — ignore them
when exploring the schema.
7. There is no disulfide or bond_types field in the Solr index
To find structures with disulfide bonds, you must:
- Use Solr to find small proteins with ≥2 Cys in their sequence (fetch + filter in Python), then
- Use the PDBe REST API (
/pdb/entry/molecules/{pdb_id}) to confirm the sequence and structure.
Advanced Search Examples
Find structures with specific ligand:
query = "ligand_name:axitinib"
Find high-resolution kinase structures:
query = "molecule_name:kinase AND resolution:[0 TO 2.0]"
Find structures revised in 2024:
query = "revision_date:[20240101 TO 20241231]"
Practical Workflows
Detecting Structure Revisions
Check if a structure has been significantly revised since release:
import json
from datetime import datetime
def check_structure_revision(pdb_id):
"""Check if structure was revised and when"""
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/{pdb_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
entry = data[pdb_id][0]
release = entry['release_date']
revision = entry['revision_date']
# Convert to datetime for comparison
release_dt = datetime.strptime(release, "%Y%m%d")
revision_dt = datetime.strptime(revision, "%Y%m%d")
days_diff = (revision_dt - release_dt).days
years_diff = days_diff / 365.25
print(f"PDB {pdb_id}:")
print(f" Released: {release}")
print(f" Revised: {revision}")
print(f" Time since release: {years_diff:.1f} years")
if days_diff > 30:
print(f" WARNING: Structure revised {days_diff} days after release")
return True
return False
# Example usage
check_structure_revision("4wa9")
Checking Ligand Revisions
Determine if a ligand definition was updated, which might explain geometry changes:
import json
def check_ligand_revision(comp_id):
"""Check when a ligand was last revised"""
url = f"https://www.ebi.ac.uk/pdbe/api/pdb/compound/summary/{comp_id}"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
compound = data[comp_id][0]
print(f"Compound {comp_id} ({compound['name']}):")
print(f" Created: {compound['creation_date']}")
print(f" Revised: {compound['revision_date']}")
if compound['creation_date'] != compound['revision_date']:
print(f" WARNING: Ligand definition was revised")
return True
return False
# Example usage
check_ligand_revision("AXI")
Downloading and Comparing Structures
Download structures from different species and compare them:
import gzip
import json
def download_and_load_structure(pdb_id):
"""Download and load a structure from RCSB"""
url = f"https://files.rcsb.org/download/{pdb_id}.cif.gz"
print(f"Downloading {pdb_id}...")
compressed = coot.coot_get_url_as_string_py(url)
# Check if it's an error response (HTML)
if isinstance(compressed, str) and compressed.startswith("<!DOCTYPE"):
print(f"ERROR: Could not download {pdb_id}")
return None
decompressed = gzip.decompress(compressed)
imol = coot.read_coordinates_as_string(decompressed.decode('utf-8'), pdb_id)
print(f"Loaded as molecule {imol}")
return imol
def compare_species_structures(pdb_id1, pdb_id2):
"""Download two structures and superpose them"""
# Download both structures
imol1 = download_and_load_structure(pdb_id1)
imol2 = download_and_load_structure(pdb_id2)
if imol1 is None or imol2 is None:
print("Failed to download one or both structures")
return
# Superpose (using CA atoms from chain A, residues 240-400)
print(f"\nSuperposing {pdb_id2} onto {pdb_id1}...")
sel1 = "//A/240-400/CA"
sel2 = "//A/240-400/CA"
result = coot.superpose_with_atom_selection(imol1, imol2, sel1, sel2, 0)
if result >= 0:
print(f"Success! Structures superposed.")
else:
print("Superposition failed!")
return imol1, imol2
# Example: Compare human and mouse ABL1
# First find structures using Solr
query = "molecule_name:*ABL1*"
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=100&fl=pdb_id,organism_scientific_name"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
# Find human and mouse structures
human_pdbs = []
mouse_pdbs = []
for doc in data['response']['docs']:
org = doc.get('organism_scientific_name', ['Unknown'])[0]
if 'Homo sapiens' in org:
human_pdbs.append(doc['pdb_id'])
elif 'Mus musculus' in org:
mouse_pdbs.append(doc['pdb_id'])
print(f"Human ABL1 structures: {len(human_pdbs)}")
print(f"Mouse ABL1 structures: {len(mouse_pdbs)}")
# Compare first human and mouse structures
if human_pdbs and mouse_pdbs:
compare_species_structures(human_pdbs[0], mouse_pdbs[0])
Finding Related Structures
Search for structures with the same ligand and protein:
import json
def find_related_structures(protein_name, ligand_name=None):
"""Find structures containing specific protein-ligand combination"""
if ligand_name:
query = f'molecule_name:*{protein_name}* AND chem_comp_id:{ligand_name}'
else:
query = f'molecule_name:*{protein_name}*'
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={query}&wt=json&rows=50&fl=pdb_id,title,resolution,organism_scientific_name"
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
result = result.decode('utf-8')
data = json.loads(result)
print(f"Found {data['response']['numFound']} structures")
for doc in data['response']['docs']:
org = doc.get('organism_scientific_name', ['N/A'])
if isinstance(org, list):
org = org[0] if org else 'N/A'
print(f" {doc['pdb_id']}: {doc.get('title', 'N/A')[:60]}")
print(f" Resolution: {doc.get('resolution', 'N/A')} Å")
print(f" Organism: {org}")
# Example usage
find_related_structures("ABL1", "STI") # ABL1 with imatinib
Error Handling
Always wrap API calls in try/except blocks and handle both string and bytes responses:
import json
def safe_pdbe_query(url):
"""Safely query PDBe API with error handling"""
try:
result = coot.coot_get_url_as_string_py(url)
if not result or result == "":
print(f"Empty response from {url}")
return None
# Handle bytes response
if isinstance(result, bytes):
result = result.decode('utf-8')
# Check for HTML error pages
if result.startswith("<!DOCTYPE") or result.startswith("<html"):
print(f"Received HTML error page instead of JSON")
print(result[:200])
return None
data = json.loads(result)
return data
except json.JSONDecodeError as e:
print(f"JSON parsing error: {e}")
print(f"Response was: {result[:200]}...")
return None
except Exception as e:
print(f"Error querying PDBe API: {e}")
return None
# Example usage
data = safe_pdbe_query("https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/4wa9")
if data:
print("Success!")
Common Issues and Solutions
Issue: Binary vs Text Data
The function returns bytes for binary data (gzipped files) and str for text (JSON). Always check the type:
result = coot.coot_get_url_as_string_py(url)
if isinstance(result, bytes):
# Binary data - might be gzipped
if result.startswith(b'\x1f\x8b'): # gzip magic bytes
import gzip
decompressed = gzip.decompress(result)
content = decompressed.decode('utf-8')
else:
content = result.decode('utf-8')
else:
# Already a string
content = result
Issue: JSON parsing errors with validation endpoints
Some validation endpoints return multiple JSON objects or malformed responses. Handle carefully:
# Instead of json.loads(), parse line by line or handle errors
try:
data = json.loads(result)
except json.JSONDecodeError:
# Try alternative parsing or just display raw result
print("Could not parse JSON, raw response:")
print(result[:1000])
Issue: Unicode decode errors
If you get UnicodeDecodeError, the response might contain non-UTF-8 bytes. This should be handled automatically by the function now, but if you encounter issues:
try:
result = coot.coot_get_url_as_string_py(url)
except Exception as e:
print(f"Error fetching URL: {e}")
Issue: URL encoding for complex queries
Always encode special characters in Solr queries:
import urllib.parse
query = "molecule_name:\"Protein kinase\" AND resolution:[0 TO 2.0]"
encoded = urllib.parse.quote(query)
url = f"https://www.ebi.ac.uk/pdbe/search/pdb/select?q={encoded}&wt=json"
Issue: Rate limiting
The PDBe API may rate limit excessive requests. Add delays between batch queries:
import time
pdb_ids = ["4wa9", "2hyy", "1iep"]
for pdb_id in pdb_ids:
data = safe_pdbe_query(f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/{pdb_id}")
# Process data...
time.sleep(0.5) # Wait 500ms between requests
Quick Reference
Most Useful Endpoints
| Purpose | Endpoint |
|---|---|
| Structure summary | /pdb/entry/summary/{pdb_id} |
| Molecule/organism info | /pdb/entry/molecules/{pdb_id} |
| Compound info | /pdb/compound/summary/{comp_id} |
| Validation outliers | /validation/residuewise_outlier_summary/entry/{pdb_id} |
| Structure status | /pdb/entry/status/{pdb_id} |
| Ligand binding sites | /pdb/entry/ligand_monomers/{pdb_id} |
| Search structures | /search/pdb/select?q={query} |
| Download coordinates | https://files.rcsb.org/download/{pdb_id}.cif.gz |
Common Solr Query Patterns
| Query | Purpose |
|---|---|
pdb_id:4wa9 |
Specific PDB entry |
molecule_name:*kinase* |
By protein name (wildcards) |
chem_comp_id:ATP |
Structures with specific ligand |
resolution:[0 TO 2.0] |
High resolution structures |
release_year:2025 |
Structures from 2025 |
revision_year:2024 |
Recently revised structures |
experimental_method:"X-ray diffraction" |
By experimental method |
organism_scientific_name:"Homo sapiens" |
By organism |
has_carb_polymer:Y |
Has carbohydrate |
has_bound_molecule:Y |
Has ligands |
Combining Queries with AND/OR
# Human kinases with resolution < 2Å from 2024
query = 'release_year:2024 AND organism_scientific_name:"Homo sapiens" AND molecule_name:*kinase* AND resolution:[* TO 2.0]'
# ABL1 from human OR mouse
query = 'molecule_name:*ABL1* AND (organism_scientific_name:"Homo sapiens" OR organism_scientific_name:"Mus musculus")'
Integration with Coot Workflows
Example: Automated Structure Quality Check
import json
def structure_quality_report(imol):
"""Generate quality report using PDBe API data"""
# Get PDB ID from molecule
pdb_file = coot.molecule_name(imol)
# Extract PDB ID from filename (assumes format like "pdb4wa9.ent" or "4wa9")
import re
match = re.search(r'(\d\w{3})', pdb_file.lower())
if not match:
print("Could not extract PDB ID from filename")
return
pdb_id = match.group(1)
# Get structure info
data = safe_pdbe_query(f"https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/{pdb_id}")
if not data:
return
entry = data[pdb_id][0]
print("=" * 60)
print(f"STRUCTURE QUALITY REPORT: {pdb_id.upper()}")
print("=" * 60)
print(f"Title: {entry['title']}")
print(f"Method: {entry['experimental_method']}")
print(f"Released: {entry['release_date']}")
print(f"Revised: {entry['revision_date']}")
# Check for significant revisions
if entry['revision_date'] != entry['release_date']:
from datetime import datetime
release = datetime.strptime(entry['release_date'], "%Y%m%d")
revision = datetime.strptime(entry['revision_date'], "%Y%m%d")
days = (revision - release).days
print(f"\n⚠️ STRUCTURE REVISED {days} days after release")
print(" Check PDBe for revision details")
print("=" * 60)
# Usage: structure_quality_report(0)
Resources
- PDBe API Documentation: https://www.ebi.ac.uk/pdbe/api/doc/
- Aggregated API: https://pdbe.org/graph-api
- Search API: https://www.ebi.ac.uk/pdbe/api/doc/search.html
- Mailing List: pdbe-api-users@ebi.ac.uk
- GitHub Examples: https://github.com/PDBeurope/pdbe-api-training
- RCSB Downloads: https://files.rcsb.org/download/
Summary
The PDBe API provides rich programmatic access to structure metadata, validation data, and search capabilities. Using coot.coot_get_url_as_string_py(), you can:
- Download coordinate files - Get structures in mmCIF/PDB format (gzipped)
- Check revision history - Detect structures and ligands that have been revised
- Access validation reports - Get quality metrics and outlier information
- Search across the PDB - Find related structures, compare organisms, filter by properties
- Get compound information - Access chemical details, SMILES, InChI
- Verify structure status - Check for supersession or obsolescence
- Integrate external data - Bring PDB metadata into Coot workflows
Key Capabilities:
- Binary data support (download gzipped files)
- Comprehensive metadata access
- Powerful search with well-understood limitations
- Cross-species structure comparison
- Revision tracking and provenance checking
Key Limitations:
- Solr search cannot query detailed connectivity or geometry
- Versioned coordinates not accessible via HTTPS (use current versions)
- For analyses requiring atom-level connectivity, download and parse structures locally
This enables powerful automated quality checks, cross-species structure comparison, data-driven validation, and integration of PDB metadata into Coot-based structural biology workflows.