rfdiffusion


Generate protein backbones using RFdiffusion, a diffusion-based generative model for de novo protein structure generation. Use this skill when: (1) Designing binder scaffolds for a target protein, (2) Generating novel protein backbones from scratch, (3) Scaffolding functional motifs into new proteins, (4) Specifying hotspot residues for interface design, (5) Creating symmetric oligomers. For sequence design after backbone generation, use proteinmpnn. For structure validation, use alphafold or chai. For QC thresholds, use protein-qc.

By adaptyvbio. Updated 6/11/2026

name: rfdiffusion
license: MIT
category: design-tools
tags: [structure-design, diffusion, backbone, binder]
proteinbase_slug: rfdiffusion
proteinbase_url: https://proteinbase.com/design-methods/rfdiffusion

RFdiffusion Backbone Generation

Prerequisites

Requirement | Minimum | Recommended
Python | 3.9+ | 3.10
CUDA | 11.7+ | 12.0+
GPU VRAM | 16 GB | 24 GB (A10G)
RAM | 16 GB | 32 GB

How to run

RFdiffusion is not in biomodals, so run it from the official RosettaCommons repo or its Docker image, not through Modal.

Local installation (official repo)

git clone https://github.com/RosettaCommons/RFdiffusion.git
cd RFdiffusion

# Conda env including the required NVIDIA SE(3)-Transformer
conda env create -f env/SE3nv.yml
conda activate SE3nv
cd env/SE3Transformer && pip install . && cd ../..
pip install -e .

# Download weights (per-file hashed paths; see the repo README for the full list)
mkdir -p models
wget -P models http://files.ipd.uw.edu/pub/RFdiffusion/e29311f6f1bf1af907f9ef9f44b8328b/Complex_base_ckpt.pt

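# Binder design run; single-quote the Hydra args so the shell neither globs the brackets nor splits on the space
# output_prefix is set so files land in output/ (the default prefix is samples/design)
./scripts/run_inference.py \
  inference.input_pdb=target.pdb \
  inference.output_prefix=output/output \
  'contigmap.contigs=[A1-150/0 70-100]' \
  'ppi.hotspot_res=[A45,A67,A89]' \
  inference.num_designs=100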

A RosettaCommons-maintained Docker image is also available from the repo README. After backbone generation, design sequences with proteinmpnn.
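
When the run is launched from Python instead of a shell, passing the arguments as a list avoids the quoting problem altogether, because no shell ever splits the bracketed values. A minimal sketch, assuming the repo layout above; `build_command` is a hypothetical helper, not part of RFdiffusion.

```python
import shlex
import subprocess


def build_command(input_pdb, contigs, hotspots, num_designs,
                  script="./scripts/run_inference.py"):
    """Return an argv list for run_inference.py (hypothetical helper)."""
    return [
        script,
        f"inference.input_pdb={input_pdb}",
        f"contigmap.contigs=[{contigs}]",            # one argv item, space included
        f"ppi.hotspot_res=[{','.join(hotspots)}]",   # no spaces between entries
        f"inference.num_designs={num_designs}",
    ]


cmd = build_command("target.pdb", "A1-150/0 70-100", ["A45", "A67", "A89"], 100)
print(shlex.join(cmd))  # shell-safe rendering, handy for logs

# To launch (needs the RFdiffusion install described above):
# subprocess.run(cmd, check=True)
```

`shlex.join` adds the single quotes only where a shell would need them, so the printed line can be pasted back into a terminal.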

Config Schema (Hydra)

Contigmap Syntax

# De novo single chain (50-100 residues)
contigmap.contigs=[50-100]

# Binder + target (A1-150 = target residues kept from the input PDB; /0 marks the chain break after the target)
contigmap.contigs=[A1-150/0 70-100]

# Motif scaffolding (A10-30 is kept from the input PDB; 20-40 new residues are built on each side, all in one chain)
contigmap.contigs=[20-40/A10-30/20-40]

# Two-chain target + binder
contigmap.contigs=[A1-100/0 B1-100/0 60-80]

# Variable length ranges
contigmap.contigs=[A1-150/0 50-100]  # Binder 50-100 AA

Hotspot Specification

# Residues for interface (chain + resnum, no spaces)
ppi.hotspot_res=[A45,A67,A89]

Common mistakes

Contig Syntax

Correct:

'contigmap.contigs=[A1-150/0 70-100]'  # /0 ends the target chain; binder length sampled from 70-100

Single-quote the whole argument so the shell does not split on the space inside the brackets.

Wrong:

contigmap.contigs=[A1-150 70-100]     # Missing /0: target is not marked as a separate chain
contigmap.contigs=[A1-150/0 70-100]   # Unquoted: shell splits on the space
contigmap.contigs=[A1-150/0, 70-100]  # Extra comma changes the contig string
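
The contig rules in this section can be checked before a run is submitted. The sketch below is a rough lint built only from those rules (chunks separated by spaces, segments separated by `/`, a trailing `/0` on a fixed target chunk); `check_contig` is a hypothetical helper and not the parser RFdiffusion itself uses.

```python
import re

# A segment is either a fixed range from the input PDB (e.g. A1-150)
# or a length / length range to be generated (e.g. 70 or 70-100).
FIXED = re.compile(r"^[A-Za-z]\d+-\d+$")
FREE = re.compile(r"^\d+(-\d+)?$")


def check_contig(contig):
    """Return a list of problems found in a contig string (empty if none)."""
    problems = []
    if "," in contig:
        problems.append("comma found: chunks are separated by spaces")
    for chunk in contig.replace(",", " ").split():
        segments = chunk.split("/")
        for seg in segments:
            if seg == "0":
                continue  # chain-break marker
            if not (FIXED.match(seg) or FREE.match(seg)):
                problems.append(f"unrecognised segment: {seg!r}")
        # A chunk made only of fixed ranges is expected to end in /0
        if segments[-1] != "0" and all(FIXED.match(s) for s in segments):
            problems.append(f"chunk {chunk!r} is all fixed residues but has no trailing /0")
    return problems


print(check_contig("A1-150/0 70-100"))    # []
print(check_contig("A1-150 70-100"))      # flags the missing /0
print(check_contig("A1-150/0, 70-100"))   # flags the comma
```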

Hotspot Residues

Correct:

'ppi.hotspot_res=[A45,A67,A89]'      # Chain letter + residue number, whole arg quoted

Wrong:

ppi.hotspot_res=[45,67,89]           # Missing chain letter
'ppi.hotspot_res=[A45, A67, A89]'    # Spaces inside the list break parsing
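
Hotspots that do not match the target's numbering are easy to miss, so it can help to check them against the PDB first. A minimal sketch that reads the fixed-column ATOM records; `missing_hotspots` is a hypothetical helper and the two records are toy data.

```python
def residue_labels(pdb_lines):
    """Collect chain+resnum labels (e.g. 'A45') from ATOM records."""
    labels = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]             # column 22: chain ID
            resnum = int(line[22:26])    # columns 23-26: residue number
            labels.add(f"{chain}{resnum}")
    return labels


def missing_hotspots(hotspots, pdb_lines):
    """Return hotspots that lack a chain letter or are absent from the PDB."""
    present = residue_labels(pdb_lines)
    return [h for h in hotspots if not h[:1].isalpha() or h not in present]


pdb = [
    "ATOM      1  CA  ALA A  45      11.000  12.000  13.000",
    "ATOM      2  CA  GLY A  67      14.000  12.000  13.000",
]
print(missing_hotspots(["A45", "A67", "A89", "45"], pdb))  # ['A89', '45']
```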

Complete Parameter Reference

Core Parameters

Parameter | Default | Range | Description
inference.num_designs | 10 | 1-10000 | Number of designs to generate
inference.input_pdb | - | path | Target structure file
inference.output_prefix | samples/design | string | Output path and filename prefix
diffuser.T | 50 | 20-200 | Diffusion timesteps
denoiser.noise_scale_ca | 1.0 | 0.0-2.0 | CA atom noise (0.5-0.8 = conservative)
denoiser.noise_scale_frame | 1.0 | 0.0-2.0 | Frame noise
inference.ckpt_override_path | - | path | Model checkpoint
potentials.guide_scale | 1.0 | 0.1-10 | Guidance strength
potentials.guide_decay | constant | string | Decay type

Advanced Parameters

Parameter | Default | Description
diffuser.partial_T | None | Start diffusion from timestep T (partial diffusion)
contigmap.inpaint_str | None | Residues whose structure is masked from the model
scaffoldguided.scaffoldguided | false | Enable scaffold-guided generation
scaffoldguided.target_pdb | false | Set to true when a target structure is supplied (its path goes in scaffoldguided.target_path)
ppi.binderlen | None | Specify exact binder length

Symmetry Parameters

Parameter | Default | Description
inference.symmetry | None | Symmetry type (C2, C3, C4, D2, etc.)
inference.recenter | true | Recenter symmetric assembly
inference.radius | 10.0 | Radius constraint for symmetric assembly

Sequence Conditioning

Parameter | Default | Description
contigmap.provide_seq | None | Residue ranges whose sequence is kept, for example during partial diffusion
contigmap.inpaint_seq | None | Residues whose sequence is masked from the model

Model Checkpoints

Checkpoint | Use Case
Complex_base_ckpt.pt | Binder design (default)
Base_ckpt.pt | De novo monomers
ActiveSite_ckpt.pt | Active site scaffolding
InpaintSeq_ckpt.pt | Sequence inpainting

Common workflows

Binder Design

  1. Prepare target PDB (trim to binding region + 10 Å buffer)
  2. Identify 3-6 hotspot residues (exposed, conserved)
  3. Generate 100-500 backbones
  4. Pass to proteinmpnn for sequence design
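
Step 1 can be roughed out in a few lines. The workflow does not say how the buffer is measured, so this sketch assumes the binding region is defined by the hotspot residues and keeps every residue with any atom within 10 Å of any hotspot atom. `residues_near` and `ca_line` are hypothetical helpers, and the chain is toy data.

```python
import math


def parse_atoms(pdb_lines):
    """Yield (chain, resnum, (x, y, z)) for each ATOM record."""
    for line in pdb_lines:
        if line.startswith("ATOM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[21], int(line[22:26]), xyz


def residues_near(pdb_lines, hotspots, cutoff=10.0):
    """Residues with any atom within `cutoff` angstroms of any hotspot atom."""
    atoms = list(parse_atoms(pdb_lines))
    hot_xyz = [xyz for chain, num, xyz in atoms if f"{chain}{num}" in hotspots]
    keep = set()
    for chain, num, xyz in atoms:
        if any(math.dist(xyz, h) <= cutoff for h in hot_xyz):
            keep.add((chain, num))
    return sorted(keep)


def ca_line(chain, resnum, x):
    """Build a CA-only ATOM record for the demo (toy data)."""
    return f"ATOM  {resnum:5d}  CA  ALA {chain}{resnum:4d}    {x:8.3f}{0.0:8.3f}{0.0:8.3f}"


# Toy chain: residues 1-10 spaced 3.8 angstroms apart along x
toy = [ca_line("A", i, 3.8 * i) for i in range(1, 11)]
print(residues_near(toy, {"A5"}))  # residues 3-7 of chain A
```

Trimming usually leaves gaps in the numbering, so the kept ranges still have to be written into the contig by hand.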

Motif Scaffolding

  1. Extract motif coordinates
  2. Reference the motif by chain and residue range in the contig (e.g. A10-30); these residues are kept from the input PDB
  3. Generate surrounding scaffold
  4. Validate motif preservation (RMSD < 1.5 Å)

Symmetric Oligomers

# C3 symmetric trimer (contig length is the total across all subunits: 3 x 100)
./scripts/run_inference.py --config-name=symmetry \
  inference.symmetry=C3 \
  'contigmap.contigs=[300-300]' \
  inference.num_designs=50

# D2 symmetric tetramer (4 x 80 residues)
./scripts/run_inference.py --config-name=symmetry \
  inference.symmetry=D2 \
  'contigmap.contigs=[320-320]' \
  inference.radius=25

# Supported symmetries: C2, C3, C4, C5, C6, D2, D3, D4, tetrahedral, octahedral
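
In symmetric runs the contig length is understood here to be the total across all subunits, so it has to divide evenly by the subunit count. A small sketch of that arithmetic, under that assumption; `subunit_count` and `symmetric_contig` are hypothetical helpers covering only cyclic, dihedral, and tetrahedral groups.

```python
def subunit_count(symmetry):
    """Number of subunits for Cn, Dn, or tetrahedral symmetry."""
    sym = symmetry.lower()
    if sym == "tetrahedral":
        return 12
    if sym[0] == "c" and sym[1:].isdigit():
        return int(sym[1:])
    if sym[0] == "d" and sym[1:].isdigit():
        return 2 * int(sym[1:])
    raise ValueError(f"unhandled symmetry: {symmetry}")


def symmetric_contig(symmetry, subunit_len):
    """Build a fixed-length contig argument from a per-subunit length."""
    total = subunit_count(symmetry) * subunit_len
    return f"contigmap.contigs=[{total}-{total}]"


print(symmetric_contig("C3", 100))  # contigmap.contigs=[300-300]
print(symmetric_contig("D2", 80))   # contigmap.contigs=[320-320]
```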

Partial Diffusion (Refinement)

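# Start from an existing structure, diffuse from timestep 10
# The contig gives the chain length, which must match the input (here a 100-residue monomer)
./scripts/run_inference.py \
  inference.input_pdb=initial.pdb \
  diffuser.partial_T=10 \
  'contigmap.contigs=[100-100]'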

Output format

output/
├── output_0.pdb       # Generated backbone
├── output_1.pdb
├── ...
└── output_99.pdb

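Designed residues are written as a poly-glycine backbone, and each PDB has a matching .trb metadata file; use proteinmpnn for sequence. The output/ layout above assumes inference.output_prefix=output/output (the default prefix is samples/design).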

Sample output

Successful run

$ ./scripts/run_inference.py inference.input_pdb=target.pdb inference.output_prefix=output/output 'contigmap.contigs=[A1-150/0 70-100]' inference.num_designs=100
[INFO] Loading model from Complex_base_ckpt.pt
[INFO] Generating design 1/100...
[INFO] Generating design 50/100...
[INFO] Generating design 100/100...
[INFO] Saved 100 designs to output/

Generated:
output/output_0.pdb (85 residues)
output/output_1.pdb (92 residues)
...

What good output looks like:

  • File size: tens of KB per PDB (backbone only; roughly 80 bytes per atom record, 4 backbone atoms per residue)
  • Residue count within specified range
  • Secondary structure visible in PyMOL (helices/sheets, not random coil)
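
The residue-count check is easy to script. A minimal sketch that counts CA atoms on one chain and compares the result with the requested range; `check_length` is a hypothetical helper, the records are toy data, and the chain ID of the designed chain should be confirmed in your own outputs.

```python
def designed_length(pdb_lines, chain):
    """Count residues on `chain` by their CA atoms."""
    residues = set()
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[21] == chain:
            residues.add(int(line[22:26]))
    return len(residues)


def check_length(pdb_lines, chain, lo, hi):
    """Return (residue count, whether it falls inside lo..hi)."""
    n = designed_length(pdb_lines, chain)
    return n, lo <= n <= hi


# Toy backbone: 85 CA-only records on chain A (real outputs also carry N, C, O)
toy = [f"ATOM  {i:5d}  CA  GLY A{i:4d}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
       for i in range(1, 86)]
print(check_length(toy, "A", 70, 100))  # (85, True)
```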

Decision tree

Should I use RFdiffusion?
│
├─ Need to generate protein backbone?
│  ├─ Yes → Continue below
│  └─ No, already have backbone → Use ProteinMPNN
│
├─ What type of design?
│  ├─ Binder for protein target → RFdiffusion ✓
│  ├─ De novo monomer → RFdiffusion ✓
│  ├─ Motif scaffolding → RFdiffusion ✓
│  └─ Symmetric assembly → RFdiffusion ✓
│
└─ Priority?
   ├─ Need highest success rate → Consider BindCraft
   ├─ Need diversity/exploration → RFdiffusion ✓
   └─ Need all-atom precision → Consider BoltzGen

Typical performance

Campaign Size | Time (A10G) | Cost (Modal) | Notes
100 backbones | 20-30 min | ~$3 | Quick exploration
500 backbones | 1.5-2h | ~$12 | Standard campaign
1000 backbones | 3-4h | ~$25 | Large campaign

Expected downstream yield: ~10-15% of backbones pass full QC after sequence design + validation.

Adaptyv's own tests of these models showed an RFdiffusion + sequence-design pipeline costing about $0.25 per accepted design, averaged across 7 targets, among the cheapest of the methods tested.
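
For planning, the yield figure translates directly into an expected number of usable backbones per campaign. A back-of-the-envelope sketch using only the table costs and the 10-15% yield quoted above; it covers backbone generation cost only, and `expected_passes` is a hypothetical helper.

```python
def expected_passes(n_backbones, yield_pct=(10, 15)):
    """Low and high estimate of backbones passing full QC."""
    lo, hi = yield_pct
    return n_backbones * lo // 100, n_backbones * hi // 100


# (campaign size, approximate generation cost in USD) from the table above
for n, cost in [(100, 3), (500, 12), (1000, 25)]:
    lo, hi = expected_passes(n)
    print(f"{n} backbones: {lo}-{hi} expected passes, "
          f"~${cost / hi:.2f}-${cost / lo:.2f} generation cost per pass")
```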


Verify

ls output/*.pdb | wc -l  # Should match num_designs
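
A file count tells you that designs are missing but not which ones. The sketch below lists the missing indices, assuming outputs are named `<prefix>_<i>.pdb` as in the output format section; `missing_designs` is a hypothetical helper and the demo runs in a scratch directory.

```python
import tempfile
from pathlib import Path


def missing_designs(prefix, num_designs):
    """Return indices i for which <prefix>_<i>.pdb does not exist."""
    return [i for i in range(num_designs)
            if not Path(f"{prefix}_{i}.pdb").exists()]


# Demo: write designs 0 and 2, leave design 1 missing
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "output")
    for i in (0, 2):
        Path(f"{prefix}_{i}.pdb").write_text("END\n")
    print(missing_designs(prefix, 3))  # [1]
```

For a real run, call `missing_designs("output/output", 100)` and rerun only the indices it reports.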

Troubleshooting

  • Designs lack secondary structure: decrease denoiser.noise_scale_ca and denoiser.noise_scale_frame to 0.5-0.8
  • Binder not contacting hotspots: verify residue numbering, increase num_designs
  • OOM errors: trim the target to reduce the total residue count, or use an A100 GPU
  • Slow generation: reduce diffuser.T to 25-35

Error interpretation

Error | Cause | Fix
RuntimeError: CUDA out of memory | GPU VRAM exceeded | Use an A100 or trim the target structure
KeyError: 'A' | Chain not found in PDB | Check chain IDs with grep ^ATOM target.pdb | cut -c22 | sort -u
ValueError: invalid contig | Syntax error in contigs | Check for spaces, quotes, commas (see Common mistakes)
FileNotFoundError: ckpt | Missing model weights | Download from IPD website

Next: proteinmpnn for sequence design → structure prediction for validation → protein-qc for filtering.

Install via CLI
npx skills add https://github.com/adaptyvbio/protein-design-skills --skill rfdiffusion