---
name: mlmm-structure-io
description: PDB / XYZ / GJF / Amber parm7+rst7 input-file reference for mlmm-toolkit, plus the charge / multiplicity decision workflow and the B-factor layer encoding (ML=0 / movable-MM=10 / frozen-MM=20) that defines a three-layer ONIOM system from a single prepared PDB. TRIGGER on editing or inspecting a structure file, deciding -q / -l / -m, interpreting residue / charge / spin in an input, or assigning B-factor layers. SKIP for subcommand syntax, output parsing, install, or HPC questions.
---

# mlmm-toolkit Structure I/O

## Purpose
mlmm-toolkit reads four formats; each carries different information
and is preferred for different stages:
| Format | Carries | Preferred for |
|---|---|---|
| PDB | atom name, residue, chain, occupancy, B-factor (layer label), element | Initial input; the B-factor encodes the ML / movable-MM / frozen-MM layer |
| XYZ | element + Cartesian coordinates only | Trajectories, post-IRC outputs, single-stage exchange between subcommands |
| GJF | element + coords + charge / multiplicity + route line | Round-tripping with Gaussian via `mlmm oniom-export` / `mlmm oniom-import` |
| parm7 / rst7 | Amber topology + coordinate pair | MM-region force-field parameters; output of `mlmm mm-parm` |
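PDB, XYZ, and GJF give coordinates in Å with conventional element symbols.
parm7 is the Amber topology format: plain text, but every `%FLAG` section is written in fixed-width fields declared by its `%FORMAT` line, so column positions carry meaning.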
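Because parm7 fields are cut by column rather than by whitespace, four-character atom names in a `%FORMAT(20a4)` section can run together (for example `HD21HD22`). The read-only sketch below uses a hypothetical helper, `read_parm7_section` (not part of mlmm-toolkit), that slices one section by the width its `%FORMAT` line declares. It assumes the `%FORMAT` line immediately follows `%FLAG`; files with interleaved `%COMMENT` lines are rejected. For real parsing, a library such as ParmEd is the safer choice.

```python
import re

# Fortran-style edit descriptors such as 20a4, 10I8 or 5E16.8.
FORMAT_RE = re.compile(r"%FORMAT\((\d+)([aAiIeE])(\d+)(?:\.\d+)?\)")


def read_parm7_section(text: str, flag: str) -> list[str]:
    """Return the raw fixed-width fields of one %FLAG section (hypothetical helper).

    Fields are sliced by the column width from %FORMAT, not split on
    whitespace, because adjacent a4 atom names (e.g. HD21HD22) touch.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("%FLAG") and line.split()[1] == flag:
            match = FORMAT_RE.match(lines[i + 1])
            if match is None:
                raise ValueError(f"no %FORMAT line directly after %FLAG {flag}")
            per_line, width = int(match.group(1)), int(match.group(3))
            fields = []
            for data in lines[i + 2:]:
                if data.startswith("%"):
                    break  # the next %FLAG starts a new section
                for k in range(per_line):
                    chunk = data[k * width:(k + 1) * width]
                    if chunk.strip():  # skip the unused tail of the last line
                        fields.append(chunk)
            return fields
    raise KeyError(flag)
```

Converting the raw strings (`int`, `float`, or `.strip()` for names) is left to the caller, since the right conversion depends on the section.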
Per-format details:

| File | Topic |
|---|---|
| `pdb.md` | PDB column-by-column layout, residue selectors, B-factor layer encoding (0.0=ML / 10.0=movable-MM / 20.0=frozen), link-H placement |
| `xyz.md` | XYZ format, ASE extension comment line |
| `gjf.md` | Gaussian gjf header (%link0 → route → charge spin → coords) |
| `parm7.md` | Amber parm7 topology + rst7 coordinates (mlmm-specific) |
| `charge-multiplicity.md` | Deciding `-q` and `-m` for an unfamiliar substrate (literature lookup workflow) |
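The GJF header order relies on blank lines as section separators. As a minimal sketch, assuming a single-layer (non-ONIOM) job, a hypothetical `write_gjf` helper could assemble the pieces as below; the `nproc` / `mem` defaults are arbitrary placeholders. Real ONIOM inputs carry one charge/multiplicity pair per layer plus per-atom layer flags, and are better generated with `mlmm oniom-export`.

```python
def write_gjf(route, title, charge, multiplicity, atoms, nproc=8, mem="16GB"):
    """Assemble a minimal single-layer Gaussian input (hypothetical helper).

    atoms: iterable of (element, x, y, z) tuples with coordinates in Å.
    """
    if not route.lstrip().startswith("#"):
        raise ValueError("the route section must start with '#'")
    lines = [
        f"%nproc={nproc}",          # Link 0 commands
        f"%mem={mem}",
        route,                      # route section
        "",                         # blank line ends the route section
        title,
        "",                         # blank line ends the title section
        f"{charge} {multiplicity}",
    ]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2s} {x:14.8f} {y:14.8f} {z:14.8f}")
    lines.append("")                # blank line ends the molecule specification
    return "\n".join(lines) + "\n"
```

Forgetting one of these blank lines is a common way to get a GJF that Gaussian misreads, which is why the sketch writes them explicitly.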
## Decision tree: which format to feed mlmm-toolkit

```text
Is the input the full enzyme + parm7 you'll run ML/MM on?
└── PDB (full enzyme, with B-factor layer assignment) + parm7
    → all of opt / tsopt / scan / path-search / freq / irc / dft

Is the input a single TS candidate to validate?
└── XYZ + --ref-pdb (full enzyme PDB) + --parm
    → tsopt / freq / irc / all (TS-only mode)

Is the input a Gaussian g16 ONIOM input you want to import?
└── GJF → mlmm oniom-import → reconstructs PDB + extracts layer info

Do you need a parm7 / rst7 from a raw enzyme PDB?
└── mlmm mm-parm → PDB + AmberTools tleap → parm7 + rst7
```
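The TS-only branch can be written down as an argument list. This sketch uses a hypothetical helper, `build_ts_validation_argv`, and only the flags named in this document (`-i`, `--ref-pdb`, `--parm`, `-q`, `-m`); whether every listed subcommand accepts exactly this set is an assumption to confirm in the mlmm-cli docs before relying on it.

```python
from pathlib import Path

# Subcommands the decision tree lists for TS-only validation.
TS_ONLY_SUBCOMMANDS = {"tsopt", "freq", "irc", "all"}


def build_ts_validation_argv(ts_xyz, ref_pdb, parm7, charge, mult, subcommand="tsopt"):
    """Return an argv list for validating one XYZ TS candidate (hypothetical helper).

    XYZ carries only elements and coordinates, so the full-enzyme PDB
    (--ref-pdb), the topology (--parm) and explicit -q / -m are all added.
    """
    if Path(ts_xyz).suffix.lower() != ".xyz":
        raise ValueError(f"expected an .xyz TS candidate, got {ts_xyz}")
    if subcommand not in TS_ONLY_SUBCOMMANDS:
        raise ValueError(f"{subcommand!r} is not a TS-only subcommand")
    return [
        "mlmm", subcommand,
        "-i", str(ts_xyz),
        "--ref-pdb", str(ref_pdb),
        "--parm", str(parm7),
        "-q", str(charge),
        "-m", str(mult),
    ]
```

The list can be handed to `subprocess.run(argv, check=True)`; building it in one place makes it harder to forget `--ref-pdb` or the explicit charge and multiplicity.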
## ML/MM-aware CLI conventions

Most subcommands take `--parm FILE` (the parm7) plus one of:

- `--detect-layer` (default on): read the layer assignment from the input PDB's B-factor field
- `--model-pdb FILE`: explicit ML-region PDB
- `--model-indices '1-50,75,100-110'`: explicit atom-index list

When `-i` is XYZ, also pass `--ref-pdb` so that atom ordering and residue context are recoverable.
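Before passing `--model-indices`, it can help to expand the spec locally and check the ML-region size. A minimal sketch with a hypothetical `parse_index_spec`: it assumes ranges are inclusive on both ends and does not settle whether mlmm counts atoms from 0 or 1, which should be confirmed in the mlmm-cli docs.

```python
def parse_index_spec(spec: str) -> list[int]:
    """Expand a spec like '1-50,75,100-110' into sorted unique indices.

    Hypothetical helper; 'a-b' is treated as inclusive of both a and b.
    """
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(v) for v in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part}")
            indices.update(range(start, end + 1))
        else:
            indices.add(int(part))
    return sorted(indices)
```

For the example spec above this yields 62 atoms (50 + 1 + 11), a quick sanity check against the intended ML-region size.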
## Editing approach (agent-side)

When an agent must edit a structure file:

- Read the file first to understand the current layout (residues, atom counts, B-factor layer assignment, charge/multiplicity if present).
- Identify the change and confirm it does not violate format conventions (PDB column widths, the XYZ first-line atom count, parm7 fixed-width `%FORMAT` fields).
- For unknown charge / multiplicity values, confirm with the user or do a literature lookup before guessing; see `charge-multiplicity.md` for the workflow.
- For layer-assignment changes (B-factor edits), use `mlmm define-layer` rather than hand-editing if possible.
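The XYZ atom-count rule from the checklist above is easy to verify after an edit. A minimal sketch with a hypothetical `check_xyz` that handles a single frame only (multi-frame trajectories would need to be split first); extra ASE-style columns after x, y, z are ignored.

```python
def check_xyz(text: str) -> int:
    """Validate one XYZ frame and return its atom count (hypothetical helper).

    Line 1 must hold the atom count, line 2 is a free-form comment, and
    every following non-blank line needs an element and three coordinates.
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("XYZ needs an atom-count line and a comment line")
    natoms = int(lines[0].split()[0])
    coords = [ln for ln in lines[2:] if ln.strip()]
    if len(coords) != natoms:
        raise ValueError(f"header says {natoms} atoms, found {len(coords)} coordinate lines")
    for ln in coords:
        fields = ln.split()
        try:
            # Unpacking fails on missing columns; float() fails on non-numeric text.
            x, y, z = (float(v) for v in fields[1:4])
        except ValueError:
            raise ValueError(f"malformed coordinate line: {ln!r}") from None
    return natoms
```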
## Subcommand × format compatibility

| Subcommand | PDB | XYZ | GJF | parm7 |
|---|---|---|---|---|
| `extract` | ✓ (in/out) | — | — | — |
| `mm-parm` | ✓ (in) | — | — | ✓ (out) |
| `define-layer` | ✓ (in/out) | — | — | — |
| `path-search` / `path-opt` | ✓ | ✓ | ✓ | ✓ |
| `opt` / `tsopt` / `freq` / `irc` | ✓ | ✓ | ✓ | ✓ |
| `dft` | ✓ | ✓ | ✓ | ✓ |
| `scan` / `scan2d` / `scan3d` | ✓ | ✓ (with `--ref-pdb`) | — | ✓ |
| `oniom-export` | ✓ (in) | ✓ (in) | ✓ (out) | ✓ (in) |
| `oniom-import` | ✓ (out) | — | ✓ (in) | — |
| `bond-summary` | ✓ | ✓ | ✓ | — |
## Quick reference

```text
PDB    ATOM/HETATM record (columns 1-based, inclusive)
       name(13-16)  altloc(17)  resName(18-20)  chainID(22)
       resSeq(23-26)  X(31-38)  Y(39-46)  Z(47-54)
       occupancy(55-60)  bfactor(61-66, used as layer: 0.0/10.0/20.0)
       element(77-78)

XYZ    line 1:  <natoms>
       line 2:  <comment, optional ASE Properties=…>
       line 3+: <element> <x> <y> <z>

GJF    %nproc=...
       %mem=...
       # <route line: functional/basis options>
       <blank line>
       <title>
       <blank line>
       <charge> <multiplicity>
       <element> <x> <y> <z>
       ...
       <blank line ending the molecule section>

parm7  Amber topology; generate with mlmm mm-parm, do not hand-edit.
       Pair with rst7 (coordinate snapshot).
```

Full column-by-column and per-keyword details are in the per-format `.md` files.
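The PDB column map translates directly into 0-based string slices (column c becomes index c-1). A minimal inspection sketch with hypothetical helpers `parse_atom_record`, `layer_of`, and `set_bfactor`: it reads the layer from columns 61-66 using this document's 0.0 / 10.0 / 20.0 encoding and shows a B-factor rewrite that leaves every other column in place. For real layer changes, `mlmm define-layer` remains the preferred tool.

```python
# B-factor values used as layer labels in this document.
LAYER_BY_BFACTOR = {0.0: "ML", 10.0: "movable-MM", 20.0: "frozen-MM"}


def parse_atom_record(line: str) -> dict:
    """Slice an ATOM/HETATM line by fixed columns (hypothetical helper)."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "name": line[12:16].strip(),     # columns 13-16
        "resName": line[17:20].strip(),  # columns 18-20
        "chainID": line[21],             # column 22
        "resSeq": int(line[22:26]),      # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "bfactor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),  # columns 77-78
    }


def layer_of(line: str) -> str:
    """Map the B-factor field to a layer name; reject values outside the encoding."""
    bfactor = parse_atom_record(line)["bfactor"]
    for value, layer in LAYER_BY_BFACTOR.items():
        if abs(bfactor - value) < 1e-3:
            return layer
    raise ValueError(f"B-factor {bfactor} is not a layer label (0/10/20)")


def set_bfactor(line: str, value: float) -> str:
    """Rewrite columns 61-66 as F6.2, leaving every other column untouched."""
    if len(line) < 66:
        raise ValueError("record too short to hold a B-factor field")
    return line[:60] + f"{value:6.2f}" + line[66:]
```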
## Charge / multiplicity defaults

- `-m 1` (singlet, closed shell) is the default and fits almost every organic / biological / metal-coordination cluster.
- Use `-m 2` for radicals and `-m 3` or higher for unusual high-spin metals.
- `-q` is the ML-region charge.
- `-l 'RES:Q'` derives `-q` from per-residue charges plus mlmm's internal amino-acid table.
- For XYZ inputs, which carry no charge or spin header, `-q` and `-m` must be given on the CLI.
If unsure about charge or spin, do not guess silently; follow the workflow in `charge-multiplicity.md`.
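A quick parity check catches impossible `-q` / `-m` pairs before a run: with N electrons in the ML region, the multiplicity 2S+1 must be odd when N is even and even when N is odd. The sketch below (hypothetical helpers, deliberately small element table) only tests parity; it cannot choose between, say, a singlet and a triplet, which still needs the `charge-multiplicity.md` workflow. If the ML calculation caps cut bonds with link hydrogens (see the link-H placement notes in `pdb.md`), those H atoms presumably belong in the element list too; that is an assumption about how mlmm builds the ML region.

```python
# Atomic numbers for elements common in enzyme ML regions (extend as needed).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "Mg": 12, "P": 15,
                 "S": 16, "Cl": 17, "Mn": 25, "Fe": 26, "Co": 27, "Ni": 28,
                 "Cu": 29, "Zn": 30}


def electron_count(elements, charge):
    """Total electrons = sum of atomic numbers minus the net charge (KeyError if unknown)."""
    return sum(ATOMIC_NUMBER[e] for e in elements) - charge


def spin_parity_ok(elements, charge, multiplicity):
    """True if 2S+1 is compatible with the electron count (hypothetical helper)."""
    n = electron_count(elements, charge)
    if multiplicity < 1 or multiplicity - 1 > n:
        return False  # more unpaired electrons than electrons in total
    # Unpaired electrons (multiplicity - 1) must share the parity of n.
    return (n % 2) != (multiplicity % 2)
```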
## See also

- `mlmm-cli/extract.md`, `mm-parm.md`, `define-layer.md`: pre-pipeline steps.
- `mlmm-cli/SKILL.md`: common flag conventions across subcommands.
- `mlmm-workflows-output/SKILL.md`: what comes out of the pipeline (XYZ / PDB).