name: phenix description: Structural biology workflow orchestrator using the Phenix suite. Use when the user wants to determine, refine, or validate protein structures — including AlphaFold structure retrieval, X-ray crystallography, cryo-EM, MolProbity validation, or visualization script generation. allowed-tools: Bash, Read, Write, Edit, AskUserQuestion, Agent user-invocable: true
Phenix — Structural Biology Agent
Orchestrate structural biology workflows: retrieve AlphaFold predictions, validate structures, solve X-ray and cryo-EM structures, manage refinement loops, and generate visualization scripts.
Entry Points
| Command | What It Does |
|---|---|
/phenix |
Start a new structural biology project (guided workflow) |
/phenix retrieve <accession> |
Retrieve AlphaFold structure(s) from EBI |
/phenix validate <model> |
Run MolProbity validation on a structure |
/phenix xray <data.mtz> <seq.fa> |
Start X-ray structure determination |
/phenix cryoem <map.mrc> <seq.fa> |
Start cryo-EM structure determination |
/phenix refine |
Run next refinement cycle (within active project) |
/phenix inspect |
Generate visualization scripts for current model |
/phenix status |
Show project status, current metrics, next recommended step |
/phenix history |
Show refinement convergence history |
Parse the user's command to determine which module to read.
Module Routing
| User Intent | Module |
|---|---|
| Retrieve AlphaFold structures | retrieve.md |
| Validate a structure (MolProbity) | validation.md |
| X-ray crystallography pipeline | xray-pipeline.md |
| Cryo-EM pipeline | cryoem-pipeline.md |
| Generate Coot/PyMOL/ChimeraX scripts | visualization.md |
| Review cross-project patterns | memory.md |
Read the appropriate module before proceeding. Each module contains the full workflow, commands, and output parsing for that step.
References
Read these as needed for parameter details and troubleshooting:
| Reference | When to Read |
|---|---|
| phenix-parameters.md | Setting up .eff parameter files for Phenix tools |
| resolution-strategies.md | Choosing refinement strategy based on resolution |
| troubleshooting.md | When a Phenix tool fails or produces unexpected results |
Preconditions
- Phenix installed on NERSC — install with the helper at
data/structural_biology/scripts/, file nameinstall_phenix.sh, then:module load conda conda activate phenix phenix.version - KBASE_AUTH_TOKEN set in
.env(for BERDL queries and MinIO storage) - MinIO access configured (for structure storage) — see
/berdl-minioskill - For SLURM jobs: Access to Perlmutter compute nodes
- Python 3.9+: Use
module load pythonon NERSC (system Python is too old)
Phenix Environment Setup
Before running any Phenix tool:
module load conda
conda activate phenix
phenix.version
Pipeline Scripts
The pipeline orchestrator (run_pipeline.py under data/structural_biology/scripts/) handles
project creation, SLURM job submission, output parsing, and provenance logging:
module load python
PIPELINE_DIR=data/structural_biology/scripts
PIPELINE="$PIPELINE_DIR/run_pipeline.py"
python3 "$PIPELINE" retrieve --accession P0A6Y8
python3 "$PIPELINE" validate --model model.pdb
python3 "$PIPELINE" xray --data data.mtz --seq seq.fa --project-id struct_YYYYMMDD_name
python3 "$PIPELINE" refine --project-id struct_YYYYMMDD_name
python3 "$PIPELINE" process --project-id struct_YYYYMMDD_name --cycle 3
python3 "$PIPELINE" accept --project-id struct_YYYYMMDD_name --model rebuilt.pdb
python3 "$PIPELINE" converge --project-id struct_YYYYMMDD_name
python3 "$PIPELINE" finalize --project-id struct_YYYYMMDD_name
python3 "$PIPELINE" batch-validate --accessions P0A6Y8 Q9Y6K9 --output-dir results/
python3 "$PIPELINE" advise --resolution 2.5 --method xray
python3 "$PIPELINE" figures --project-id struct_YYYYMMDD_name
python3 "$PIPELINE" dashboard
python3 "$PIPELINE" status --project-id struct_YYYYMMDD_name
Refinement Loop Workflow
- Submit refinement:
run_pipeline.py refine --project-id ... - After SLURM job finishes:
run_pipeline.py process --project-id ... --cycle N - Process generates Coot/PyMOL/ChimeraX scripts and checks convergence
- If not converged: user inspects model in Coot, rebuilds, saves
- Accept rebuilt model:
run_pipeline.py accept --project-id ... --model rebuilt.pdb - Repeat from step 1 until converged
- Finalize:
run_pipeline.py finalize --project-id ...
Project State Management
Each structural biology project is tracked in a project directory on MinIO:
s3a://cdm-lake/tenant-general-warehouse/kescience/structural-biology/
├── alphafold-structures/{accession}/ # Retrieved AF structures
│ ├── model.pdb
│ ├── model.cif
│ └── pae.json
└── projects/{project_id}/ # Structure determination projects
├── input/ # Experimental data
├── cycles/cycle_{NNN}/ # Refinement cycle outputs
├── scripts/ # Visualization scripts
├── figures/ # Generated figures
└── final/ # Final depositable model
Project ID Convention
struct_{YYYYMMDD}_{short_name} — e.g., struct_20260314_ecoli_dhfr
Tracking State
The agent tracks project state via:
- MinIO directory structure — what files exist tells us what step we're at
- Delta Lake
refinement_cycles— queryable history of all refinement iterations - Local project notes —
project_notes.jsonin the project directory
Compute Strategy
| Step | Where | Runtime |
|---|---|---|
| AlphaFold retrieval | NERSC login node | Seconds (HTTP) |
| phenix.xtriage | NERSC interactive | Seconds |
| phenix.validate | NERSC interactive | Seconds-minutes |
| phenix.process_predicted_model | NERSC interactive | Seconds |
| phenix.refine | NERSC SLURM batch | Minutes-hours |
| phenix.real_space_refine | NERSC SLURM batch | Minutes-hours |
| Phaser | NERSC SLURM batch | Minutes-hours |
| AutoBuild | NERSC SLURM batch | Hours |
| PredictAndBuild | NERSC SLURM batch | Hours-days |
| map_to_model | NERSC SLURM batch | Hours |
| Figure generation (headless) | NERSC interactive | Seconds-minutes |
SLURM Job Template
For compute-intensive Phenix steps:
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --time=04:00:00
#SBATCH --job-name=phenix_{tool}_{project_id}
#SBATCH --output=phenix_%j.out
#SBATCH --error=phenix_%j.err
module load phenix
# Tool-specific command (filled in by agent)
{phenix_command}
Delta Lake Tables
Results are stored in kescience_structural_biology — use berdl_notebook_utils.get_table_schema(database="kescience_structural_biology", table=..., detailed=True, return_json=False) for the live schema.
| Table | Purpose |
|---|---|
structure_projects |
One row per structure determination project |
refinement_cycles |
Full refinement history with metrics |
validation_reports |
Standalone validations (including AlphaFold-only) |
alphafold_structures |
Retrieved structure file metadata |
Provenance
Every agent action is logged as a JSON record in the project's provenance.jsonl file:
{
"project_id": "struct_20260314_ecoli_dhfr",
"action": "refinement",
"tool": "phenix.refine",
"tool_version": "1.21.2",
"input_model": "s3://...cycles/cycle_003/model.pdb",
"parameters": {"strategy": "individual_sites+individual_adp+tls"},
"output_model": "s3://...cycles/cycle_004/model.pdb",
"metrics": {"r_work": 0.198, "r_free": 0.234, "molprobity_score": 1.45},
"decision_rationale": "Added TLS refinement because R-gap was >0.05",
"timestamp": "2026-03-14T10:30:00Z"
}
Instructions for Claude
- Parse the user's command to determine the entry point (retrieve, validate, xray, cryoem, etc.)
- Read the appropriate module for the target workflow
- Read resolution-strategies.md before making any refinement parameter decisions
- Read docs/structural_biology_memory.md for cross-project lessons when starting refinement
- Check project state — what step are we at? What has been tried?
- Run automated steps and parse outputs carefully
- For human-in-the-loop steps: generate visualization scripts, highlight problem regions, and wait for the user
- Log provenance for every action taken
- Store results — upload to MinIO, update Delta Lake tables
- Update memory — if a new pattern or lesson is learned, update
docs/structural_biology_memory.md
Decision Thresholds
Use these defaults unless the user specifies otherwise:
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| R-free (X-ray) | < 0.25 | 0.25-0.30 | > 0.30 |
| R-gap (R-free - R-work) | < 0.05 | 0.05-0.07 | > 0.07 |
| MolProbity score | < 1.5 | 1.5-2.5 | > 2.5 |
| Ramachandran favored | > 97% | 95-97% | < 95% |
| Ramachandran outliers | < 0.2% | 0.2-0.5% | > 0.5% |
| Clashscore | < 5 | 5-10 | > 10 |
| Rotamer outliers | < 1% | 1-3% | > 3% |
Safety Rules
- Never delete experimental data — only create new files
- Always keep the previous cycle's model — refinement can make things worse
- Never skip validation — always validate after refinement
- Confirm with user before submitting SLURM jobs that will run > 1 hour
- Log every action — provenance is non-negotiable
Pitfall Detection
When you encounter errors, unexpected results, retry cycles, performance issues, or data surprises during this task, follow the pitfall-capture protocol. Read .claude/skills/pitfall-capture/SKILL.md and follow its instructions to determine whether the issue should be added to the active project's projects/<id>/memories/pitfalls.md.