name: alphafold-database-fetch-and-analyze description: > Retrieve and analyze AlphaFold predicted structures for a protein. Use when the user provides a specific UniProt Accession ID and wants structural confidence metrics (pLDDT), domain boundary analysis, or disorder assessment. Do not use if the user only has a protein name, gene name, or amino acid sequence — ask for a UniProt ID first.
AlphaFold Database: Fetch and Analyze
Prerequisites
uv: Read theuvskill and follow its Setup instructions to ensureuvis installed and on PATH.- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://alphafold.ebi.ac.uk/, then (2) create the file recording the notification text and timestamp.
Overview
Downloads AlphaFold predicted structures (mmCIF) and Predicted Aligned Error (PAE) matrices from the AlphaFold Database for a given UniProt ID, then performs automated heuristic analysis on structural confidence (pLDDT), intrinsically disordered regions, rigid domain boundaries, and inter-domain flexibility.
Do NOT use when:
- The user only has a protein name, gene name, or amino acid sequence (no UniProt ID) — ask them to look up the ID on UniProt.
- The user wants to search for structural homologs (use Foldseek).
- The user wants to run AlphaFold predictions on a custom sequence.
- The user needs experimental PDB structures (use RCSB PDB).
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts automatically enforce the required rate limit gracefully.
- Do not attempt to calculate domain boundaries or assess structural disorder yourself; always rely on the output provided by the script.
- If this skill is used, ensure this is mentioned in the output.
Utility Scripts
1. Fetch Structure Files
Downloads the .cif structure file, _predicted_aligned_error.json, and API
metadata JSON (-metadata.json) for a UniProt ID. Handles fragment fallback for
very large proteins.
Examples:
uv run scripts/fetch_structure.py P00520 -o /path/to/output/
uv run scripts/fetch_structure.py P04637 -o /path/to/custom_results/
Always specify -o with an absolute path or a path relative to the user's
project root, never a path relative to the skill directory.
2. Analyze pLDDT Confidence
Reads pLDDT confidence metrics from a saved AFDB metadata JSON file (produced by
fetch_structure.py) and prints a heuristic confidence assessment (structured,
disordered, mixed).
Example:
uv run scripts/analyze_plddt.py ./data/AF-P00520-F1-metadata.json
3. Analyze PAE / Domain Boundaries
Reads a downloaded PAE JSON file and detects rigid domain boundaries using a sliding-window PAE heuristic.
Example:
uv run scripts/analyze_pae.py ./data/AF-P00520-F1-predicted_aligned_error_v6.json
Interpreting the Output
The script prints analysis to stdout. Read it carefully and synthesize the results for the user:
- Isoform / Large Protein Warning (MANDATORY): Check the script output for
any
[!] WARNINGlines. If the script reports that no canonical entry was found and an isoform was used, or if the protein is very large (>2700 AAs), you MUST prominently relay this warning to the user. Do not omit this warning. - Synthesize the Structural Analysis: Combine the "pLDDT Conclusion" and the "PAE Structural Conclusion" into a single, cohesive overall summary. Describe the protein's overall folding confidence, the presence of disordered regions, and its rigid domain layout.
- Highlight the supporting metrics:
- Overall Global pLDDT and the breakdown of fraction confidence (especially Very Low vs. Very High).
- Domain Boundary Analysis (number of distinct global domains and their specific residue ranges).
- Explicit Disorder Warning: If the analysis concludes that the protein is highly intrinsically disordered (e.g., high fraction of <50 pLDDT or lack of rigid domains), issue a separate, prominent warning. Advise the user against proceeding with whole-protein downstream structural analysis (like Foldseek or docking). If small ordered domains exist amidst the disorder, advise the user to restrict any future analysis strictly to those specific residue boundaries.
- Remind the user that per-residue pLDDT is embedded in the B-factor column of the downloaded mmCIF file.