protein-design-workflow - SKILL.md Agent Skill

name: protein-design-workflow
description: >
  End-to-end guidance for protein design pipelines. Use this skill when:
  (1) Starting a new protein design project, (2) Needing step-by-step
  workflow guidance, (3) Understanding the full design pipeline,
  (4) Planning compute resources and timelines, (5) Integrating multiple
  design tools.
license: MIT
category: orchestration
tags: [guidance, pipeline, workflow]
source: https://github.com/adaptyvbio/protein-design-skills

Protein Design Workflow Guide

Standard Pipeline Overview

Target Preparation → All-Atom Design → Structure Validation → Filtering
       |                   |                    |                  |
       v                   v                    v                  v
  (pdb skill)         (boltzgen)           (chai/boltz)      (protein-qc)

Phase 1: Target Preparation

# Download from PDB
curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb"

Extract the target chain and remove waters/ligands
Trim to the binding region plus a 10 Å buffer
Select 3-6 exposed hotspot residues, preferring charged or aromatic side chains (K, R, E, D, W, Y, F)

Output: target_prepared.cif, hotspot residue list
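
The extraction and trimming steps can be sketched in plain Python on PDB-format text. This is a minimal illustration, not the pdb skill's implementation: the function names are hypothetical, the "binding region" is assumed to mean the hotspot residues, insertion codes are ignored, and the result is a list of PDB lines (conversion to CIF is not shown).

```python
import math

def parse_atoms(pdb_lines, chain):
    """Return ATOM records of one chain as (resseq, (x, y, z), line) tuples.

    Only lines starting with "ATOM" are kept, so HETATM records
    (waters, ligands, and also modified residues) are dropped.
    """
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain:
            resseq = int(line[22:26])  # PDB columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((resseq, xyz, line))
    return atoms

def trim_target(pdb_lines, chain, hotspots, buffer=10.0):
    """Keep whole residues that have any atom within `buffer` Å of a hotspot atom."""
    atoms = parse_atoms(pdb_lines, chain)
    hotspot_xyz = [xyz for resseq, xyz, _ in atoms if resseq in hotspots]
    keep = set()
    for resseq, xyz, _ in atoms:
        if any(math.dist(xyz, h) <= buffer for h in hotspot_xyz):
            keep.add(resseq)
    return [line for resseq, _, line in atoms if resseq in keep]
```

A call such as `trim_target(open("target.pdb").read().splitlines(), "A", {45, 67, 89})` would return the retained ATOM lines for chain A. The all-pairs distance check is quadratic in atom count, which is usually acceptable for a single target chain.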

Phase 2: Design

Option A: BoltzGen — All-atom (recommended)

# Create YAML config
cat > binder.yaml << 'EOF'
entities:
  - protein:
      id: B
      sequence: 70..100
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      binding_types:
        - chain:
            id: A
            binding: 45,67,89
EOF

GPU=L40S modal run modal_boltzgen.py \
  --input-yaml binder.yaml \
  --protocol protein-anything \
  --num-designs 100

Output: 100 all-atom designs (backbone + sequence + side chains)
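
When several configs are needed (for example, to sweep binder length ranges), the YAML above can be generated rather than hand-edited. The sketch below reproduces only the fields shown in this guide; `make_binder_yaml` is a hypothetical helper, and the full BoltzGen schema may accept options not covered here.

```python
def make_binder_yaml(target_path, target_chain, hotspots, min_len, max_len, binder_id="B"):
    """Render a binder config with the same layout as the example in this guide."""
    binding = ",".join(str(r) for r in hotspots)
    return (
        "entities:\n"
        "  - protein:\n"
        f"      id: {binder_id}\n"
        f"      sequence: {min_len}..{max_len}\n"
        "  - file:\n"
        f"      path: {target_path}\n"
        "      include:\n"
        "        - chain:\n"
        f"            id: {target_chain}\n"
        "      binding_types:\n"
        "        - chain:\n"
        f"            id: {target_chain}\n"
        f"            binding: {binding}\n"
    )

# Example: three configs with different binder length ranges
configs = {
    f"binder_{lo}_{hi}.yaml": make_binder_yaml("target.cif", "A", [45, 67, 89], lo, hi)
    for lo, hi in [(50, 70), (70, 100), (100, 130)]
}
```

Each entry in `configs` maps a filename to its YAML text; writing them to disk and launching one job per file is left to the caller. The length ranges here are illustrative values, not recommendations.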

Option B: BindCraft — End-to-end

GPU=A100 modal run modal_bindcraft.py \
  --input-pdb target.pdb \
  --hotspots "A45,A67,A89" \
  --number-of-final-designs 50

Output: 50 pre-validated designs with AF2 scores
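
The two options write hotspots differently: the BoltzGen YAML lists residue numbers under a chain, while the BindCraft command takes chain-prefixed tokens such as "A45,A67,A89". A small converter keeps a single hotspot list usable for both. The helper names are hypothetical, and single-letter chain IDs are assumed.

```python
import re
from collections import defaultdict

def parse_hotspots(spec):
    """Split a string like "A45,A67,A89" into {"A": [45, 67, 89]}."""
    by_chain = defaultdict(list)
    for token in spec.split(","):
        match = re.fullmatch(r"([A-Za-z])(\d+)", token.strip())
        if match is None:
            raise ValueError(f"unrecognized hotspot token: {token!r}")
        by_chain[match.group(1)].append(int(match.group(2)))
    return dict(by_chain)

def format_hotspots(by_chain):
    """Inverse of parse_hotspots: {"A": [45, 67]} -> "A45,A67"."""
    return ",".join(
        f"{chain}{res}" for chain, residues in by_chain.items() for res in residues
    )
```

For instance, `",".join(map(str, parse_hotspots("A45,A67,A89")["A"]))` gives the comma-separated residue list used in the YAML `binding` field.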

Phase 3: Structure Validation

# Chai-1 (recommended, handles protein + ligand)
modal run modal_chai1.py \
  --input-faa all_sequences.fasta \
  --out-dir predictions/

# Or Boltz (open-source alternative)
modal run modal_boltz.py \
  --input-yaml complex.yaml \
  --out-dir predictions/

Output: Structure predictions with pLDDT, ipTM, PAE
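
The filtering phase relies on an interface PAE value, which has to be derived from the full PAE matrix. One common convention, assumed here and not confirmed by this guide, is the mean PAE over all binder-target residue pairs in both directions. The function name and index arguments are illustrative.

```python
def interface_pae(pae, binder_idx, target_idx):
    """Mean PAE over binder->target and target->binder residue pairs.

    `pae` is a square matrix (list of lists or array) indexed as pae[i][j];
    `binder_idx` and `target_idx` are the row/column indices of each chain.
    """
    values = [pae[i][j] for i in binder_idx for j in target_idx]
    values += [pae[j][i] for i in binder_idx for j in target_idx]
    return sum(values) / len(values)

# Toy 3-residue example: residues 0-1 are the target, residue 2 is the binder
toy_pae = [
    [0.0, 1.0, 8.0],
    [1.0, 0.0, 6.0],
    [10.0, 12.0, 0.0],
]
toy_value = interface_pae(toy_pae, binder_idx=[2], target_idx=[0, 1])  # (10 + 12 + 8 + 6) / 4 = 9.0
```

If the prediction tool already reports an interface PAE, that value should take precedence, since its definition may differ from this one.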

Phase 4: Filtering

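import pandas as pd

# Load per-design metrics (one row per design)
designs = pd.read_csv('all_metrics.csv')

# Hard thresholds (pLDDT and ipTM on a 0-1 scale)
filtered = designs[
    (designs['pLDDT'] > 0.85) &
    (designs['ipTM'] > 0.50) &
    (designs['PAE_interface'] < 10) &
    (designs['scRMSD'] < 2.0)
].copy()  # copy so adding a column does not write to a slice of `designs`

# Weighted composite score; esm2_pll is assumed to be pre-normalized to 0-1
filtered['score'] = (
    0.3 * filtered['pLDDT'] +
    0.3 * filtered['ipTM'] +
    0.2 * (1 - filtered['PAE_interface'] / 20) +
    0.2 * filtered['esm2_pll']
)
top_designs = filtered.nlargest(50, 'score')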
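
To see how the weights interact, the same thresholds and score can be evaluated by hand for a single design. The metric values below are hypothetical, and `esm2_pll` is assumed to be on a 0-1 scale so that it is comparable to the other terms.

```python
def passes_thresholds(d):
    """Same hard cutoffs as the pandas filter in this guide."""
    return (d["pLDDT"] > 0.85 and d["ipTM"] > 0.50
            and d["PAE_interface"] < 10 and d["scRMSD"] < 2.0)

def composite_score(d):
    """Same weighted sum as the pandas score column in this guide."""
    return (0.3 * d["pLDDT"] + 0.3 * d["ipTM"]
            + 0.2 * (1 - d["PAE_interface"] / 20)
            + 0.2 * d["esm2_pll"])

# Hypothetical design, for illustration only
design = {"pLDDT": 0.90, "ipTM": 0.70, "PAE_interface": 6.0,
          "scRMSD": 1.2, "esm2_pll": 0.5}

ok = passes_thresholds(design)
# 0.3*0.90 + 0.3*0.70 + 0.2*(1 - 6/20) + 0.2*0.5 = 0.27 + 0.21 + 0.14 + 0.10
score = composite_score(design)  # approximately 0.72
```

Because pLDDT and ipTM together carry 60% of the weight, a raw (unnormalized) pseudo-log-likelihood in the last term would dominate the sum, which is why a 0-1 rescaling is assumed here.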

Resource Planning

Stage	GPU	Time (100 designs)	Cost
BoltzGen	L40S	2-3h	~$20
Chai validation	A100	1h	~$4.50
Filtering	CPU	15 min	~$0

Timeline: a small campaign (100 designs) takes 4-6h in total.
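
For campaigns of other sizes, the table can be extrapolated with a rough estimator. The sketch assumes that time and cost scale linearly with the number of designs, which the table does not state; the figures are copied from the table, and the function name is hypothetical.

```python
# (hours_low, hours_high, cost_usd) per 100 designs, copied from the table
STAGES = {
    "BoltzGen": (2.0, 3.0, 20.00),
    "Chai validation": (1.0, 1.0, 4.50),
    "Filtering": (0.25, 0.25, 0.00),
}

def estimate(num_designs):
    """Return (hours_low, hours_high, cost_usd), assuming linear scaling."""
    scale = num_designs / 100
    low = sum(s[0] for s in STAGES.values()) * scale
    high = sum(s[1] for s in STAGES.values()) * scale
    cost = sum(s[2] for s in STAGES.values()) * scale
    return low, high, cost

low_h, high_h, usd = estimate(100)  # (3.25, 4.25, 24.5)
```

The stage times sum to 3.25-4.25 h for 100 designs, somewhat below the quoted 4-6h timeline; the remainder presumably covers target preparation and job overhead, which this estimator does not model.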

Quality Checkpoints

After design (BoltzGen output)

Check design diversity (vary binder length range)
Visual spot-check of a few structures
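
A quick numerical proxy for the diversity check is the mean pairwise similarity of the designed sequences together with their length spread. This sketch uses `difflib.SequenceMatcher` from the standard library as a rough stand-in for alignment-based sequence identity; the function names and any cutoff applied to the result are assumptions, not part of the pipeline.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(sequences):
    """Average SequenceMatcher ratio over all sequence pairs (1.0 = identical)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    total = sum(
        SequenceMatcher(None, a, b, autojunk=False).ratio() for a, b in pairs
    )
    return total / len(pairs)

def length_range(sequences):
    """Return (shortest, longest) binder length."""
    lengths = [len(s) for s in sequences]
    return min(lengths), max(lengths)
```

A mean similarity close to 1.0, or a narrow length range, suggests the designs are near-duplicates and that the binder length range or the number of independent runs may need adjusting.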

After validation

pLDDT > 0.85
ipTM > 0.50
PAE_interface < 10
scRMSD < 2.0 Å

Common Issues

Problem	Solution
Low ipTM	Review hotspot selection; try more designs
Low pLDDT	Use SolubleMPNN variant; check sequence
Poor diversity	Widen binder length range in YAML
All designs similar	Run multiple independent BoltzGen jobs