name: boltz-predict description: "AI-driven protein structure prediction using Boltz-2 for single proteins, multimers, and protein-ligand complexes."
Boltz Predict
You are a computational biophysics expert helping users predict protein structures using Boltz-2.
Read skills/common/preamble.md, skills/common/tool-output.md, and
skills/common/node-cli-patterns.md before acting.
Prediction Modes
Boltz-2 supports three prediction scenarios:
| Mode | Input | Example |
|---|---|---|
| Single protein | 1 protein sequence | MVLSPADKTNV... |
| Protein-protein complex | 2+ protein sequences | SEQ1, SEQ2 (for dimer, trimer, etc.) |
| Protein-ligand complex | 1 protein sequence + SMILES | Protein: MVLSPAD..., Ligand: CCO (ethanol) |
Use this skill when prepare_complex or clean_protein returns
code="pdbfixer_missing_residues_out_of_scope" and no reliable MODELLER
template/alignment is available. Regenerate a new source candidate from the
sequence instead of retrying PDBFixer repair on the same incomplete structure.
Step 0: Parse and Confirm
Before executing, extract parameters from the user's request and identify the mode.
Present a confirmation table:
| Parameter | Value |
|---|---|
| Mode | (Single / Protein-Protein / Protein-Ligand) |
| Protein sequence(s) | (amino acids in single-letter code) |
| Ligand (if protein-ligand) | (SMILES or chemical name) |
| MSA | (Server / File path) |
| Affinity prediction | (yes / no — protein-ligand mode only) |
Ask for clarification if any parameter is missing or ambiguous.
Step 1: Ligand SMILES Preparation (Protein-Ligand Mode Only)
If user provides a chemical name (e.g., "aspirin", "ibuprofen"):
mdclaw pubchem_get_smiles_from_name --chemical-name "aspirin"
If this returns success: True, use the returned SMILES. If it fails, ask the user to provide the SMILES directly or check the compound name spelling.
If user provides SMILES directly:
mdclaw rdkit_validate_smiles --smiles "CCO"
Always validate SMILES before prediction. If validation fails, show the error to the user and ask for correction.
Step 2: MSA, Affinity, and Model Generation Options
MSA (Multiple Sequence Alignment)
Ask the user:
- Option A (Default): Use Boltz-2 MSA server (recommended for most cases)
- Option B: Provide MSA file path (for custom alignments)
Affinity Prediction (Protein-Ligand Mode Only)
Ask: "Do you want to predict binding affinity for the ligand?"
- Yes: Pass
--affinity - No: Omit the flag, or pass
--no-affinityexplicitly (faster, structure-only)
Number of Models
Ask: "How many structure models do you want to generate?"
- Default (1): Single best-effort model — fastest
- Multiple (e.g., 3-5): Ensemble of diverse candidates — useful for ranking or selecting different conformations
Pass
--num-models Nto request N models
If the user wants a custom MSA file, note that the current mdclaw tool
accepts a single --msa-path value and is best suited to single-protein inputs.
For multi-protein custom MSA workflows, ask the user to fall back to the MSA
server unless they explicitly want to prepare Boltz YAML by hand.
Step 3: Run Boltz-2 Prediction
Example: Single Protein
mdclaw boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNVKAAW..."
Example: Protein-Protein Complex (Dimer)
mdclaw boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNV..." "MKVLPAD..."
Example: Protein-Ligand Complex with MSA Server
mdclaw boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNVKAAW..." \
--smiles-list "CCO" \
--affinity
Example: Protein-Ligand with Custom MSA File
mdclaw boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNVKAAW..." \
--smiles-list "CCO" \
--msa-path "/path/to/alignment.a3m" \
--affinity
Example: Multiple Models Ensemble
mdclaw boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNVKAAW..." \
--smiles-list "CCO" \
--affinity \
--num-models 3
Key parameters:
--amino-acid-sequence-list: One or more sequences in single-letter format- Multiple sequences = complex (dimer, trimer, etc.)
- Use exactly as provided by user
--smiles-list: SMILES strings for ligands- Omit for protein-only predictions
- Should be pre-validated with
rdkit_validate_smiles
--msa-path: Optional custom MSA file path- Omit it to use the Boltz MSA server
- Current
mdclawwrapper supports one shared custom MSA path, so this is best for single-protein inputs
--affinity: Boolean flag (only for protein-ligand mode)- Use
--affinityto enable - Omit it, or use
--no-affinity, to disable
- Use
--num-models: Number of structure candidates to generate (default: 1)- Single value (1) = fastest, recommended for initial screening
- Multiple values (3-5) = ensemble for structural diversity
Step 4: Result Interpretation and Handoff
The tool returns:
success: bool — True if prediction completedjob_id: str — Unique identifieroutput_dir: str — Path to results directorypredicted_pdb_files: list — Paths to predicted PDB/mmCIF structures- Collected from the Boltz output directory
- Sorted by Boltz model index when filenames contain
_model_N - Multiple files returned if
--num-models > 1
confidence_scores: dict — Confidence JSON content when Boltz writes itaffinity_scores: dict (if--affinity) — Contains:affinity_probability_binary: Higher = more confident bindingaffinity_pred_value: Lower = stronger predicted binding; reported aslog10(IC50)with IC50 inuM
warnings: list — Non-critical warnings
Source-node metadata
When job_dir and node_id point to a source node, the Boltz output is
normalized into the standard source bundle:
nodes/<source_node_id>/artifacts/source_bundle.json
nodes/<source_node_id>/artifacts/candidates/<candidate_id>.pdb
Per-candidate Boltz information belongs in source_bundle.json, not only in
the source node's run-level metadata:
origin.boltz_rank: one-based candidate rank in the returned Boltz orderorigin.boltz_model_index: zero-based_model_Nvalue when presentorigin.boltz_output_file: original Boltz prediction fileorigin.confidence_file: matching Boltz confidence JSON when presentmetrics.confidence_score: copied from the confidence JSON for quick rankingmetrics.confidence: full confidence JSON content for provenance
Run-level details such as num_models_requested, boltz_output_dir,
input_yaml, sequences, smiles_list, and optional affinity scores remain
in the source node metadata.
List candidates through the tool instead of asking the user to open JSON:
mdclaw list_source_candidates --job-dir <job_dir> --node-id <source_node_id>
For normal MDClaw DAG work, run Boltz-2 in node mode so the prediction becomes the job's source bundle:
mdclaw create_node --job-dir <job_dir> --node-type source
mdclaw --job-dir <job_dir> --node-id <source_node_id> boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNVKAAW..." \
--num-models 3
For a protein-ligand prediction:
mdclaw --job-dir <job_dir> --node-id <source_node_id> boltz2_protein_from_seq \
--amino-acid-sequence-list "MVLSPADKTNVKAAW..." \
--smiles-list "CCO" \
--affinity
Next Steps
Present the candidate IDs and confidence scores to the user. Use the default candidate for a simple first MD setup, or prepare multiple jobs/candidates when the scientific question needs ensemble comparison.
If they want to continue to MD simulation:
"To set up MD simulation with the predicted structure, create the prep node first (its parent auto-resolves to the source), then run
prepare_complexwith the node id it returns:mdclaw create_node --job-dir <job_dir> --node-type prep mdclaw --job-dir <job_dir> --node-id <prep_node_id> prepare_complex \ --source-structure-id <candidate_id>If your harness provides slash commands,
/md-prepareis the interactive shortcut for the same preparation skill."
Error Handling
| Issue | Action |
|---|---|
| SMILES validation fails | Ask user to check chemical name or provide corrected SMILES |
| PubChem lookup fails | Ask user to provide SMILES directly |
boltz_sequence_required |
Ask for at least one amino-acid sequence |
boltz_num_models_invalid |
Use --num-models 1 or a larger positive integer |
boltz_affinity_requires_ligand |
Provide at least one valid ligand SMILES or omit --affinity |
boltz_msa_file_missing |
Verify the MSA path or omit --msa-path to use the MSA server |
boltz_custom_msa_multimer_unsupported |
Use the MSA server for multimers or prepare Boltz YAML manually |
boltz_chain_count_exceeded |
Split the prediction or reduce the number of protein/ligand chains |
boltz_executable_not_found |
Stop local execution and report that Boltz-2 is unavailable in the runtime |
boltz_execution_failed |
Report the structured error and check sequence/SMILES/MSA inputs |
boltz_no_structure_output |
Treat as a failed prediction; do not continue to prep without a source candidate |
boltz_source_attach_failed |
Preserve the Boltz output directory and repair source-bundle registration before continuing |