protein-qc - SKILL.md Agent Skill

name: protein-qc description: > Quality control metrics and filtering thresholds for protein design. Use this skill when: (1) Evaluating design quality for binding, expression, or structure, (2) Setting filtering thresholds for pLDDT, ipTM, PAE, (3) Checking sequence liabilities (cysteines, deamidation, polybasic clusters), (4) Creating multi-stage filtering pipelines, (5) Computing PyRosetta interface metrics (dG, SC, dSASA), (6) Checking biophysical properties (instability, GRAVY, pI), (7) Ranking designs with composite scoring. license: MIT category: evaluation tags: [qc, filtering, metrics, thresholds] source: https://github.com/adaptyvbio/protein-design-skills

Protein Design Quality Control

Critical Limitation

Individual metrics have weak predictive power for binding:

Individual metric ROC AUC: 0.64-0.66 (slightly better than random)
Metrics are pre-screening filters, not affinity predictors
Composite scoring is essential for meaningful ranking

Quick Reference: All Thresholds

Category	Metric	Standard	Stringent
Structural	pLDDT	> 0.85	> 0.90
	pTM	> 0.70	> 0.80
	scRMSD	< 2.0 Å	< 1.5 Å
Binding	ipTM	> 0.50	> 0.60
	PAE_interaction	< 12 Å	< 10 Å
	interface_dG	< -10	< -15
Expression	Instability	< 40	< 30
	GRAVY	< 0.4	< 0.2
	ESM2 PLL	> 0.0	> 0.2

Design-Level Checks

Pattern	Risk	Action
Odd cysteine count	Unpaired disulfides	Redesign
NG/NS/NT motifs	Deamidation	Flag/avoid
K/R ≥ 3 consecutive	Proteolysis	Flag
≥ 6 hydrophobic run	Aggregation	Redesign

Sequential Filtering Pipeline

import pandas as pd

designs = pd.read_csv('designs.csv')

# Stage 1: Structural confidence
designs = designs[designs['pLDDT'] > 0.85]

# Stage 2: Self-consistency
designs = designs[designs['scRMSD'] < 2.0]

# Stage 3: Binding quality
designs = designs[(designs['ipTM'] > 0.5) & (designs['PAE_interaction'] < 10)]

# Stage 4: Sequence plausibility
designs = designs[designs['esm2_pll_normalized'] > 0.0]

# Stage 5: Expression checks
designs = designs[designs['cysteine_count'] % 2 == 0]
designs = designs[designs['instability_index'] < 40]

Composite Scoring

def composite_score(row):
    return (
        0.30 * row['pLDDT'] +
        0.20 * row['ipTM'] +
        0.20 * (1 - row['PAE_interaction'] / 20) +
        0.15 * row['shape_complementarity'] +
        0.15 * row['esm2_pll_normalized']
    )

designs['score'] = designs.apply(composite_score, axis=1)
top_designs = designs.nlargest(100, 'score')

Metric AUC Comparison

Metric	AUC
ipTM	~0.64
PAE	~0.65
ESM2 PLL	~0.72
Composite	~0.75+

Campaign Health

Pass Rate	Status
> 15%	Excellent
10-15%	Good
5-10%	Marginal
< 5%	Poor — diagnose

Failure Recovery

Low pLDDT (< 5% passing):

Regenerate backbones with noise_scale=0.5-0.8
Use SolubleMPNN instead of ProteinMPNN
Try BindCraft (integrated design)

Low ipTM (< 5% passing):

Review hotspot selection (surface-exposed, conserved)
Increase binder length (more contact area)
Try BoltzGen for all-atom precision

High scRMSD (> 50%):

Lower ProteinMPNN temperature to 0.1
Increase sequences per backbone to 32
Regenerate backbones with lower noise_scale