name: proteus
license: MIT
description: >
  Use this skill when the user asks you to work with protein structures, molecular visualization, or structural biology tools. TRIGGER when: the user mentions PyMOL, ChimeraX, AlphaFold, Rosetta, PyRosetta, UniProt, RCSB PDB, PDBe, PDB files, protein structures, molecular rendering, pLDDT, RMSD, structure alignment, binding pockets, drug-target analysis, cryo-EM density maps, homology modeling, or protein design. Also trigger when the user opens/loads .pdb, .cif, .mmcif, .sdf, or .mol2 files, or .mrc density maps (see references/chimerax.md for cryo-EM workflows). DO NOT TRIGGER for: general biology questions with no structural component, bioinformatics sequence-only tasks (BLAST, MSA), or genomics/transcriptomics.
Proteus — Structural Biology Agent Skill
You are an AI agent driving structural biology tools programmatically. This skill teaches you how to control PyMOL, ChimeraX, AlphaFold DB, and Rosetta/PyRosetta from the command line — including the non-obvious gotchas that will otherwise cost hours of debugging.
Tool Detection
Before doing anything, detect what's installed:
import glob
import os
import shutil
import subprocess

PYMOL = shutil.which("pymol")
if not PYMOL:
    # macOS common locations
    for p in ["/Applications/PyMOL.app/Contents/bin/pymol",
              os.path.expanduser("~/Applications/PyMOL.app/Contents/bin/pymol")]:
        if os.path.isfile(p):
            PYMOL = p
            break

CHIMERAX = shutil.which("ChimeraX") or shutil.which("chimerax")
if not CHIMERAX:
    hits = glob.glob("/Applications/ChimeraX*.app/Contents/bin/ChimeraX")
    if hits:
        # Lexicographically last match; usually the latest installed version
        CHIMERAX = sorted(hits)[-1]
If neither is found, do not guess paths. Continue with zero-dependency workflows
(scripts/pdb_info.py, AlphaFold metadata fetches, file inspection) when they
fit the task; otherwise tell the user what to install and stop.
Tool Selection — When to Use What
| Task | Best Tool | Why |
|---|---|---|
| Headless rendering (no display) | PyMOL | Software ray tracer works fully headless |
| Interactive demo with live GUI | ChimeraX REST API | HTTP control of running GUI session |
| H-bonds, SASA, clashes, contacts | ChimeraX | Built-in analysis commands, even in --nogui |
| Structure alignment + RMSD | Either | PyMOL cealign or ChimeraX matchmaker |
| AlphaFold confidence analysis | PyMOL + AlphaFold API | Fetch prediction, color by pLDDT, render headless |
| Experimental PDB download | fetch_pdb.py | RCSB metadata + coordinates |
| Protein name -> accession | uniprot_lookup.py | Resolve names/genes before AlphaFold fetch |
| PDB/mmCIF preflight | structure_info.py | Zero-dependency file inspection |
| Cryo-EM density map visualization | ChimeraX REST API | Volume rendering requires GPU/display |
| Quick legacy PDB inspection | pdb_info.py script | Backward-compatible PDB-only inspector |
| Protein design / scoring | Rosetta/PyRosetta | Or ML alternatives (ProteinMPNN, RFdiffusion) |
Key architectural insight: ChimeraX --nogui mode has NO OpenGL context on macOS.
It can run analysis commands (H-bonds, SASA, matchmaker, info) but CANNOT render images.
For ChimeraX rendering, you must use the REST API approach with a running GUI instance.
Reading Guide
Load reference files on demand — don't read all of them upfront:
| Working with... | Read this file |
|---|---|
| PyMOL (any task) | references/pymol.md |
| ChimeraX (any task) | references/chimerax.md |
| AlphaFold DB predictions | references/alphafold.md |
| PDB/UniProt/PDBe/RCSB data lookup | references/data-sources.md |
| File format choices (.pdb, .cif, .sdf, .mrc) | references/file-formats.md |
| Prediction models beyond AlphaFold DB | references/prediction-models.md |
| Rosetta / protein design | references/rosetta.md |
Agent Helper Scripts
These scripts handle the hard parts of tool communication.
IMPORTANT: Always run python3 scripts/<script>.py --help first. Treat the
scripts as black-box utilities by default. Only read the source when you are
debugging, patching, or the help text is insufficient for the task.
| Script | Purpose | Example |
|---|---|---|
| scripts/proteus_doctor.py | Local readiness report for tools, scripts, and APIs | python3 scripts/proteus_doctor.py --network --json |
| scripts/resolve_structure.py | Resolve file/PDB/UniProt/name to structure + provenance | python3 scripts/resolve_structure.py TP53 --json |
| scripts/fetch_pdb.py | RCSB PDB metadata + coordinate fetcher | python3 scripts/fetch_pdb.py 4HHB --json |
| scripts/uniprot_lookup.py | Protein/gene name to UniProt accession | python3 scripts/uniprot_lookup.py TP53 --gene-exact --json |
| scripts/structure_info.py | Zero-dep PDB/mmCIF inspector | python3 scripts/structure_info.py structure.cif --json |
| scripts/fetch_alphafold.py | AlphaFold DB fetcher | python3 scripts/fetch_alphafold.py P04637 --pae --json |
| scripts/pae_report.py | Summarize AlphaFold PAE domain/flexibility hints | python3 scripts/pae_report.py AF-P04637-F1_pae.json --json |
| scripts/validation_report.py | Fetch wwPDB/RCSB validation quality metrics | python3 scripts/validation_report.py 4HHB --json |
| scripts/pocket_report.py | Zero-dep ligand pocket contacts from PDB/PDB ID | python3 scripts/pocket_report.py 1HSG --json |
| scripts/interface_report.py | Zero-dep protein-protein interface residues between chains | python3 scripts/interface_report.py 1BRS --chains A,D --json |
| scripts/compare_structures.py | PyMOL CE alignment + optional per-residue deviations | python3 scripts/compare_structures.py ref.pdb mobile.pdb --json |
| scripts/pymol_agent.py | Headless PyMOL driver (info, render, pocket figure, density fit, spin movie) | python3 scripts/pymol_agent.py render structure.pdb out.png --color plddt |
| scripts/chimerax_agent.py | Headless ChimeraX driver (analysis, --nogui) | python3 scripts/chimerax_agent.py run "open 1ubq; info chains #1" |
| scripts/chimerax_rest.py | Managed ChimeraX REST GUI render (GPU) + turntable | python3 scripts/chimerax_rest.py render structure.pdb out.png --color plddt |
| scripts/add_helix_records.py | Add HELIX records to CA-only backbones so cartoons render | python3 scripts/add_helix_records.py model.pdb --json |
| scripts/map_info.py | MRC/CCP4 map stats + sigma-based contour levels | python3 scripts/map_info.py map.mrc --json |
| scripts/pdb_info.py | Legacy zero-dep PDB inspector (PDB only) | python3 scripts/pdb_info.py structure.pdb |
Critical Gotchas (Read This First)
These are hard-won discoveries. Each one represents hours of debugging that you can skip by knowing them upfront.
PyMOL
1. Never use the -d flag for complex commands. The shell interprets >, <, and | in PyMOL selection syntax as redirection/pipe operators. Instead, write a .pml script file and run pymol -c -q script.pml.

       # WRONG: breaks on selections like "b > 90"
       subprocess.run(["pymol", "-c", "-q", "-d", 'color blue, b > 90'])
       # RIGHT: write a .pml file
       with open("/tmp/cmd.pml", "w") as f:
           f.write("color blue, b > 90\n")
       subprocess.run(["pymol", "-c", "-q", "/tmp/cmd.pml"])

   Why: The -d flag passes the string through the shell, where > becomes stdout redirection. This is never mentioned in PyMOL documentation.
2. PyMOL stdout is unreliable. print() output in headless mode is not always captured on stdout. Always write results to a JSON temp file and read it back from the calling process.
3. cmd.quit() is required at the end of every headless script. Without it, the PyMOL process hangs indefinitely.
4. <= doesn't exist in PyMOL selection syntax. For pLDDT coloring, use layered overrides: paint the broadest range first, then override with narrower selections:

       color orange, all     # base: everything is low confidence
       color yellow, b > 50  # override: medium
       color cyan, b > 70    # override: high
       color blue, b > 90    # override: very high

5. cealign argument order is (target, mobile). The first argument is the reference that stays fixed; the second gets moved. This is opposite to what you might expect.
6. cmd.iterate vs cmd.iterate_state: Use iterate for molecular properties (chain, resname, B-factor). Use iterate_state for 3D coordinates (x, y, z). Using the wrong one silently returns nothing.
7. GUI demos require threading. Running a long script in PyMOL's main thread freezes the GUI completely. Wrap your demo in threading.Thread(target=run, daemon=True).start().
8. In GUI mode, print() goes to PyMOL's internal console, not the terminal. Use sys.stderr.write(text + "\n") for terminal output.
9. Path(__file__) is unreliable inside PyMOL's -r runner. Use hardcoded absolute paths or resolve paths before launching PyMOL.
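The first few PyMOL gotchas above (script file instead of -d, a JSON file instead of stdout, an explicit quit, layered pLDDT overrides) fit into one small pattern. The sketch below is illustrative only: build_plddt_script and run_plddt_coloring are hypothetical helpers, not part of scripts/, and the sketch assumes a pymol executable is on PATH and that the structure path contains no spaces or commas.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_plddt_script(structure: str, result_json: str) -> str:
    """Return .pml text that colors by pLDDT and writes atom counts to JSON."""
    lines = [
        f"load {structure}, model",
        # Layered overrides: broadest range first, narrower ranges after.
        "color orange, model",
        "color yellow, model and b > 50",
        "color cyan, model and b > 70",
        "color blue, model and b > 90",
        # Results go to a file because headless stdout is unreliable.
        "python",
        "import json",
        "from pymol import cmd",
        "counts = {'atoms': cmd.count_atoms('model'), "
        "'very_high': cmd.count_atoms('model and b > 90')}",
        f"with open({result_json!r}, 'w') as fh: json.dump(counts, fh)",
        "python end",
        # The .pml equivalent of cmd.quit(); without it the process hangs.
        "quit",
    ]
    return "\n".join(lines) + "\n"


def run_plddt_coloring(structure: str, pymol: str = "pymol") -> dict:
    """Run the generated script headless and read the JSON result back."""
    with tempfile.TemporaryDirectory() as tmp:
        result_json = str(Path(tmp) / "result.json")
        script = Path(tmp) / "cmd.pml"
        script.write_text(build_plddt_script(structure, result_json))
        # List-form argv, no -d: selections never pass through a shell.
        subprocess.run([pymol, "-c", "-q", str(script)], check=True, timeout=300)
        return json.loads(Path(result_json).read_text())
```

Keeping script generation separate from execution lets you inspect or log the exact .pml text before PyMOL ever runs.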
ChimeraX
10. ChimeraX --nogui CANNOT render images on macOS. There is no OpenGL context. Use it only for analysis. For rendering, use the REST API with a GUI session (see references/chimerax.md).
11. ChimeraX is NOT thread-safe. Sending REST API calls from a Python background thread causes EXC_BAD_ACCESS crashes. All REST calls must happen from the main thread.
12. close session does NOT reset model IDs. After closing and reopening structures, model IDs continue incrementing (#1, #2, #3...). The only way to reset to #1 is a full process restart. Always use dynamic model ID discovery after each open command.
13. Cryo-EM gotchas: Use lighting simple (not full) for volume maps; full washes out colors on white. Never close a map mid-session, because it shifts all model IDs. Use hide/show instead. See references/chimerax.md for full cryo-EM patterns.
14. stdout lines have INFO:/WARNING:/ERROR: prefixes. Parse them out. Also filter lines starting with INFO: Executing: (those are echoed commands, not results).
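The prefix handling described above can be sketched as a small pure function. This is a minimal illustration, not the parser used by scripts/chimerax_agent.py; parse_chimerax_output is a hypothetical name, and lines without a known prefix are assumed to be plain informational output.

```python
PREFIXES = ("INFO:", "WARNING:", "ERROR:")


def parse_chimerax_output(stdout: str) -> dict:
    """Group ChimeraX stdout lines by severity, dropping echoed commands."""
    parsed = {"info": [], "warning": [], "error": []}
    for raw in stdout.splitlines():
        line = raw.strip()
        # Echoed commands are not results; skip them along with blank lines.
        if not line or line.startswith("INFO: Executing:"):
            continue
        for prefix in PREFIXES:
            if line.startswith(prefix):
                key = prefix[:-1].lower()
                parsed[key].append(line[len(prefix):].strip())
                break
        else:
            # Assumption: unprefixed lines are treated as informational.
            parsed["info"].append(line)
    return parsed
```

Checking the error list before trusting the info list is a cheap way to catch a failed open or a mistyped command.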
AlphaFold DB
15. The API returns a list, not a dict. json.loads(response) gives [{...}]. You must do data[0]["pdbUrl"], not data["pdbUrl"].
16. Always query latestVersion from the API. Don't hardcode v4 or v6 in URLs: the version changes and old URLs return 404.
17. P62988 (ubiquitin) is NOT in the database. Use P0CG48 (polyubiquitin-C, 685 residues) instead. Note: this is the full polyubiquitin chain, not the 76-residue monomer.
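A defensive way to read the response, given the list-versus-dict and version points above, might look like this. pick_prediction is a hypothetical helper, and the payload used with it should come from the live API; only the pdbUrl and latestVersion fields named above are assumed.

```python
import json


def pick_prediction(response_text: str) -> dict:
    """Extract the PDB URL and version from an AlphaFold DB API response."""
    data = json.loads(response_text)
    # The API returns a list of entries, so index before reading fields.
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty list of predictions")
    entry = data[0]
    return {
        "pdb_url": entry["pdbUrl"],
        # Read the version from the response instead of hardcoding it.
        "version": entry.get("latestVersion"),
    }
```

Failing loudly on an empty list also gives a clear error for accessions that are missing from the database, such as the P62988 case noted above.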
Rendering, Animation & Maps
18. PyMOL spin loops need set cache_frames, 0. Otherwise PyMOL caches every ray-traced frame in RAM and the process is OOM-killed partway through a turntable. Set it before the render loop. (pymol_agent.py spin does this for you.)
19. set auto_zoom, 0 before loading when composing a manual view. Each load/show otherwise re-zooms and fights your orient/zoom/turn framing. Set it first, frame last.
20. Whole-map isosurfaces stall in headless PyMOL. The headless build lacks the VTKm accelerator, so contouring a full density map can hang for minutes even on a small map. Carve the mesh around the model (isomesh m, map, level, sele, carve=2.5) or do the whole-map surface in ChimeraX. Use scripts/map_info.py to pick a sigma-based contour level.
21. ChimeraX REST save can return before the PNG is flushed, leaving a 0-byte file; this is worse under heavy cartoon recompute. After save, issue wait 1 and poll the file for non-zero size, retrying a few times. (scripts/chimerax_rest.py handles this.)
22. ChimeraX color-name traps. gold/yellow atom spheres often render visibly green; use orange for a true gold read. And color #1 cartoons #... silently mis-parses cartoons as a color: the command is cartoon, then color #1 <color>.
23. Deposited coordinates are the asymmetric unit, not necessarily the biological assembly. A "dimer" entry may deposit a single chain. Build the functional oligomer with ChimeraX sym #1 assembly 1 copies true, or expand crystal neighbors in PyMOL with symexp mate_, obj, sele, 5.
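The 0-byte PNG race described above comes down to polling the output file after save. A minimal sketch, assuming you already know the output path; wait_for_nonempty is a hypothetical name and the retry count and delay are arbitrary defaults, not values taken from scripts/chimerax_rest.py.

```python
import os
import time


def wait_for_nonempty(path: str, attempts: int = 5, delay: float = 1.0) -> bool:
    """Return True once path exists with non-zero size, else False after retries."""
    for _ in range(attempts):
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        # The render may still be flushing; wait and check again.
        time.sleep(delay)
    return False
```

If this returns False, re-issuing the save command is usually a better recovery than handing a truncated image to the user.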
Common Workflows
Quick Structure Inspection
python3 scripts/structure_info.py structure.cif --json # zero-dep PDB/mmCIF overview
python3 scripts/pdb_info.py structure.pdb # legacy PDB-only overview
python3 scripts/pymol_agent.py info structure.pdb # detailed with PyMOL
Experimental Structure Fetch
python3 scripts/fetch_pdb.py 4HHB --json # RCSB metadata + mmCIF
python3 scripts/fetch_pdb.py 1HSG --format pdb # legacy PDB format if needed
python3 scripts/fetch_pdb.py 4HHB --assembly 1 --json # biological assembly mmCIF
AlphaFold Confidence Analysis
python3 scripts/uniprot_lookup.py TP53 --gene-exact --json # resolve accession if needed
python3 scripts/fetch_alphafold.py P04637 --pae --json # fetch p53 prediction
# Output filename uses modelEntityId from API, typically AF-{UNIPROT}-F1.pdb
python3 scripts/pae_report.py AF-P04637-F1_pae.json --json # summarize domain uncertainty
python3 scripts/pymol_agent.py render AF-P04637-F1.pdb output.png
Then color by pLDDT bins — see references/alphafold.md for the standard color scheme.
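To summarize confidence without launching PyMOL, you can bin pLDDT values straight from the file, since AlphaFold models store pLDDT in the B-factor column. The sketch below reuses the thresholds from the layered coloring gotcha (50, 70, 90); plddt_bin and ca_plddt_bins are hypothetical helpers, and the parser assumes fixed-column legacy PDB format, not mmCIF.

```python
from collections import Counter


def plddt_bin(score: float) -> str:
    """Map a pLDDT score to the same bins as the layered color overrides."""
    if score > 90:
        return "very_high"
    if score > 70:
        return "high"
    if score > 50:
        return "medium"
    return "low"


def ca_plddt_bins(pdb_text: str) -> Counter:
    """Count residues per confidence bin using one CA atom per residue."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, columns 61-66 the B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            counts[plddt_bin(float(line[60:66]))] += 1
    return counts
```

A structure dominated by the low and medium bins is a hint to check the PAE report before interpreting domain arrangement.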
Predicted vs Experimental Comparison
- Fetch AlphaFold prediction for the protein's UniProt ID
- Load both structures in PyMOL
- Use cealign (not align): it's purely structural and works for divergent sequences
- Extract per-residue deviations with iterate_state
- Render side-by-side or overlay
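Once both structures are superposed and you have pulled CA coordinates out (for example with iterate_state), the deviation step is plain arithmetic. This sketch assumes the coordinates are already in the same frame and are keyed by a residue identifier shared between the two models; per_residue_deviation is a hypothetical helper, not the logic inside scripts/compare_structures.py.

```python
import math


def per_residue_deviation(ref: dict, mobile: dict):
    """Return ({residue: distance}, rmsd) over residues present in both models.

    ref and mobile map a residue key to an (x, y, z) tuple, already superposed.
    """
    shared = sorted(set(ref) & set(mobile))
    deviations = {key: math.dist(ref[key], mobile[key]) for key in shared}
    if not deviations:
        return deviations, float("nan")
    # RMSD is the root of the mean squared per-residue distance.
    rmsd = math.sqrt(sum(d * d for d in deviations.values()) / len(deviations))
    return deviations, rmsd
```

Residues that appear in only one model are skipped silently here, so report the size of the shared set alongside the RMSD.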
Protein-Protein Interface Analysis
# Zero-dependency: interface residues between chains, JSON for chaining
python3 scripts/interface_report.py complex.pdb --json # all chain pairs
python3 scripts/interface_report.py 1BRS --chains A,D --cutoff 4.5 --json
For a visual interface dissection (contacts/H-bonds/buried surface), use
ChimeraX — see references/chimerax.md.
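For intuition about what a distance-cutoff interface report computes, here is a brute-force sketch. It is not the implementation of scripts/interface_report.py; interface_residues and parse_atoms are hypothetical names, the parser assumes fixed-column legacy PDB ATOM records, and the all-pairs loop is only practical for small complexes.

```python
import math
from collections import defaultdict


def parse_atoms(pdb_text: str) -> dict:
    """Group ATOM records by chain as ((resname, resseq), (x, y, z)) pairs."""
    atoms = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            residue = (line[17:20].strip(), int(line[22:26]))
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms[line[21]].append((residue, xyz))
    return atoms


def interface_residues(pdb_text: str, chain_a: str, chain_b: str, cutoff: float = 4.5):
    """Return residues of each chain with any atom within cutoff of the other chain."""
    atoms = parse_atoms(pdb_text)
    side_a, side_b = set(), set()
    for res_a, xyz_a in atoms[chain_a]:
        for res_b, xyz_b in atoms[chain_b]:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                side_a.add(res_a)
                side_b.add(res_b)
    by_number = lambda residue: residue[1]
    return sorted(side_a, key=by_number), sorted(side_b, key=by_number)
```

The default of 4.5 mirrors the --cutoff value in the example command above; a larger cutoff widens the interface definition.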
Binding Pocket Analysis
# One-command annotated figure: ligand + pocket sticks + polar contacts + context
python3 scripts/pymol_agent.py pocket 1HSG.pdb pocket.png --label
# Or build it by hand. PyMOL: select residues within 5A of any ligand
cmd.select("pocket", "byres organic around 5")
# Show pocket as sticks, ligand as ball-and-stick
cmd.show("sticks", "pocket")
cmd.show("spheres", "organic")
cmd.set("sphere_scale", 0.25, "organic")
ChimeraX Interactive Demo (REST API)
# 1. Launch ChimeraX with REST
/path/to/ChimeraX --cmd "remotecontrol rest start port 50888" &
# 2. Verify connection
curl -s "http://127.0.0.1:50888/run?command=version"
# 3. Send commands
curl "http://127.0.0.1:50888/run?command=open+1ubq"
curl "http://127.0.0.1:50888/run?command=cartoon"
curl "http://127.0.0.1:50888/run?command=save+/tmp/render.png+width+1200+height+900+supersample+3"
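When you drive the same endpoint from Python, URL-encode the command rather than building the query string by hand: characters such as # and ; in ChimeraX commands would otherwise be misread as URL syntax. A minimal sketch using only the standard library; rest_url and run_command are hypothetical helpers, and the port matches the launch example above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def rest_url(command: str, port: int = 50888, host: str = "127.0.0.1") -> str:
    """Build the /run URL for a ChimeraX command with proper escaping."""
    return f"http://{host}:{port}/run?{urlencode({'command': command})}"


def run_command(command: str, port: int = 50888, timeout: float = 60.0) -> str:
    """Send one command to a running ChimeraX REST session and return its reply.

    Call this from the main thread only, per the thread-safety gotcha.
    """
    with urlopen(rest_url(command, port), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, rest_url("info chains #1") escapes the # as %23, so the model specifier reaches ChimeraX instead of being dropped as a URL fragment.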
Publication-Quality Rendering (PyMOL)
set bg_color, white
set ray_opaque_background, 1
set antialias, 2
set cartoon_fancy_helices, 1
set cartoon_smooth_loops, 1
set cartoon_flat_sheets, 1
ray 1200, 900
png output.png
Render presets are also available on the helper: pymol_agent.py render file.pdb out.png --preset publication|illustration|soft --color spectrum|chain|bfactor|plddt.
Turntable Movie (Headless)
# PyMOL ray-traces each frame (works with no display); ffmpeg encodes them.
python3 scripts/pymol_agent.py spin structure.pdb spin.mp4 --frames 60 --color plddt
# Degrades gracefully: with no ffmpeg, the frames are written and returned.
This is the macOS-correct path — ChimeraX needs a GPU/GUI context to render.
ChimeraX GPU Rendering (Managed REST)
# Launches a GUI ChimeraX on an ephemeral port, renders via GPU, tears down.
python3 scripts/chimerax_rest.py render structure.pdb out.png --style surface --color bychain
python3 scripts/chimerax_rest.py spin structure.pdb out.mp4 --frames 72
Unlike chimerax_agent.py (analysis only, --nogui), this renders images and
defeats the 0-byte-PNG save race (gotcha 21).
CA-only Backbone (de-novo designs)
# RFdiffusion / Genie backbones render as spaghetti without HELIX records.
python3 scripts/add_helix_records.py model.pdb -o model_ss.pdb --json
# Then render model_ss.pdb; in PyMOL also: set cartoon_trace_atoms, 1
Cryo-EM Density Fit
# Sigma-based contour level (absolute level differs per map).
python3 scripts/map_info.py map.mrc --json # -> suggested_level at 1/1.5/2/3 sigma
# Render a model in density — the mesh is carved around the model, which avoids
# the whole-map contour stall in headless PyMOL (gotcha 20).
python3 scripts/pymol_agent.py density model.pdb fit.png --map map.mrc --residue "chain A and resi 25"
# No map yet? Simulate gaussian density from the model (predicted/designed structures).
python3 scripts/pymol_agent.py density model.pdb fit.png --simulate
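As a rough picture of what a sigma-based contour level means, the sketch below computes levels as mean plus n standard deviations over the map values. This convention is an assumption: scripts/map_info.py may define its suggested_level differently (some tools measure sigma from zero instead of the mean), and sigma_levels is a hypothetical helper that takes voxel values you have already read.

```python
import statistics


def sigma_levels(values, sigmas=(1.0, 1.5, 2.0, 3.0)) -> dict:
    """Return {n_sigma: absolute contour level} as mean + n * population stdev."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return {n: mean + n * spread for n in sigmas}
```

Because the absolute level depends on each map's mean and spread, a level copied from one map rarely transfers to another.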
Good Demo Proteins
| UniProt / PDB | Protein | Good for |
|---|---|---|
| P04637 | p53 | Disorder (pLDDT ~75), mixed confidence regions |
| P69905 | Hemoglobin alpha | Very high confidence (pLDDT ~98) |
| P0CG48 | Polyubiquitin-C (685 res) | Well-folded repeats; use when ubiquitin is requested |
| P01308 | Insulin | Small, well-characterized |
| 1HSG | HIV-1 protease + indinavir | Drug-target binding pocket |
| 1BRS | Barnase-barstar complex | Protein-protein interface |
| 4HHB | Hemoglobin tetramer | Multi-chain, quaternary structure |
| 1EMA | GFP | Chromophore, fluorescence |
Output Pattern
When running analysis, always produce structured output. The agent scripts
return JSON with {"status": "ok", "data": {...}} or {"status": "error", "error": "..."} in
--json mode. Many scripts also mirror key fields at top level for backward
compatibility, but agents should read data first. Their temporary files are
per-invocation, so they are safe to call in parallel within the same workspace.
For multi-step workflows, write a summary JSON report at the end with:
- Input files and parameters
- Key measurements (RMSD, distances, counts, areas)
- Output file paths (renders, saved structures)
- Interpretation notes
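Reading the envelope described above can be wrapped in one function so every caller handles errors and the data-first rule the same way. A minimal sketch, assuming the script's --json output is a single JSON object on stdout; read_envelope and ScriptError are hypothetical names.

```python
import json


class ScriptError(RuntimeError):
    """Raised when a helper script reports status "error"."""


def read_envelope(stdout_text: str) -> dict:
    """Return the payload of a helper script's JSON envelope, preferring data."""
    payload = json.loads(stdout_text)
    if payload.get("status") == "error":
        raise ScriptError(payload.get("error", "unknown error"))
    data = payload.get("data")
    if isinstance(data, dict):
        return data
    # Fallback for scripts that only mirror fields at the top level.
    reserved = ("status", "data", "error")
    return {key: value for key, value in payload.items() if key not in reserved}
```

The values returned here are what you would collect into the end-of-workflow summary report.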
Platform Notes
- macOS: PyMOL and ChimeraX are typically in /Applications/. PyMOL headless rendering works. ChimeraX --nogui has no OpenGL.
- Linux: Both tools are typically on $PATH. PyMOL headless works. ChimeraX --nogui may have OpenGL via virtual framebuffer (xvfb-run).
- Rosetta: Requires a license (free for academic and non-profit use). Free alternatives: ProteinMPNN (sequence design), LocalColabFold (structure prediction), ESM2 (embeddings), RFdiffusion (backbone generation).
Quick Reference
| I want to... | Do this |
|---|---|
| Check machine readiness | python3 scripts/proteus_doctor.py --network --json |
| Resolve any common structure query | python3 scripts/resolve_structure.py TP53 --json |
| Resolve protein/gene name to UniProt | python3 scripts/uniprot_lookup.py TP53 --gene-exact --json |
| Fetch an experimental PDB structure | python3 scripts/fetch_pdb.py 4HHB --json |
| Inspect PDB/mmCIF file (no tools needed) | python3 scripts/structure_info.py file.cif --json |
| Inspect a legacy PDB file | python3 scripts/pdb_info.py file.pdb |
| Summarize AlphaFold PAE | python3 scripts/pae_report.py AF-P04637-F1_pae.json --json |
| Fetch validation metrics | python3 scripts/validation_report.py 4HHB --json |
| Report ligand pocket contacts | python3 scripts/pocket_report.py 1HSG --json |
| Report protein-protein interface residues | python3 scripts/interface_report.py complex.pdb --json |
| Compare two structures | python3 scripts/compare_structures.py ref.pdb mobile.pdb --per-residue --json |
| Get structure info via PyMOL | python3 scripts/pymol_agent.py info file.pdb |
| Render a structure headless | python3 scripts/pymol_agent.py render file.pdb out.png |
| Render an annotated binding-pocket figure | python3 scripts/pymol_agent.py pocket file.pdb out.png --label |
| Render a turntable movie | python3 scripts/pymol_agent.py spin file.pdb out.mp4 |
| Render via ChimeraX GPU (REST) | python3 scripts/chimerax_rest.py render file.pdb out.png |
| Fix a CA-only backbone for cartoons | python3 scripts/add_helix_records.py model.pdb |
| Pick a cryo-EM contour level | python3 scripts/map_info.py map.mrc --json |
| Render a model in cryo-EM density | python3 scripts/pymol_agent.py density model.pdb out.png --map map.mrc |
| Fetch an AlphaFold prediction | python3 scripts/fetch_alphafold.py UNIPROT_ID --pae --json |
| Align two structures (ChimeraX) | python3 scripts/chimerax_agent.py align ref.pdb mobile.pdb |
| Measure SASA | python3 scripts/chimerax_agent.py sasa file.pdb |
| Find H-bonds between chains | python3 scripts/chimerax_agent.py hbonds file.pdb --chain1 A --chain2 B |
| Run arbitrary PyMOL commands | python3 scripts/pymol_agent.py run "fetch 1ubq; show cartoon" |
| Run arbitrary ChimeraX commands | python3 scripts/chimerax_agent.py run "open 1ubq; info chains #1" |
| Control ChimeraX GUI via REST | Read references/chimerax.md — REST API section |
| Color by AlphaFold confidence | Read references/alphafold.md — pLDDT Coloring section |
| Choose between PDB, mmCIF, SDF, MRC | Read references/file-formats.md |
| Use RCSB/PDBe/UniProt APIs | Read references/data-sources.md |
| Consider AF3/Boltz/Chai/ColabFold | Read references/prediction-models.md |
| Do protein design without Rosetta | Read references/rosetta.md — ML Alternatives section |