name: pdb2reaction-architecture
description: Where the source code lives in pdb2reaction. 6 physical layer directories (cli / workflows / domain / backends / io / core) + 2 repo-internal forks (pysisyphus / thermoanalysis). Tells an agent which directory to grep for a given concern (Click option, stage runner, MLIP backend, output writer, chemistry default, link-atom math) before touching code. TRIGGER on questions like "where is X implemented", "which file defines flag Y", "how is the repo organised", "what's safe to refactor". SKIP for usage questions — those belong to pdb2reaction-cli / -overview.
pdb2reaction architecture (one-screen map)
6 layers + bundled forks
pdb2reaction/ ← the package body, one folder per layer
├── cli/ # L1 — Click root group, --help-advanced, bool normalisation,
│ # shared option-decorator factories, subcommand resolver.
├── workflows/ # L2 — one file per CLI subcommand (`all.py`, `tsopt.py`,
│ # `freq.py`, `irc.py`, `dft.py`, `extract.py`, ...).
├── domain/ # L3 — chemistry-aware helpers (bond changes, bond summary,
│ # element-info repair). May use torch/numpy; no MLIP SDK
│ # (fairchem/orb_models/mace/aimnet) dependency.
├── backends/ # L4a — MLIP backend dispatcher + per-backend adapters
│ # (`uma.py`, `orb.py`, `mace.py`, `aimnet2.py`) +
│ # xTB ALPB solvent correction.
├── io/ # L4b — summary writer, energy diagram, trajectory plot,
│ # Hessian cache, PDB altloc fix.
└── core/ # L5 — `defaults.py` (primary source of truth for most CLI
# defaults), `utils.py` (PDB / XYZ / plot helpers),
# `logging.py`.
pysisyphus/ ← bundled fork of the optimiser / TS / IRC engine.
Slimmed to the subset pdb2reaction actually uses; see its
own README for the 5 divergent files (chemistry-rule
load-bearing). Annotation-only edits in normal workflow.
thermoanalysis/ ← bundled fork for ΔG / ZPE / partition functions.
QCData.py is the only consumer; same touch restriction
as pysisyphus.
Dependency direction is one-way: L1 → L2 → {L3, L4} → L5. The bundled forks sit outside the layer graph and may be imported from any layer.
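The one-way rule lends itself to a mechanical check. The sketch below is illustrative only and is not part of the repository: `LAYER_TIER`, `layer_of`, and `upward_imports` are hypothetical names. It assumes that only imports pointing back up the chain count as violations. The rule as stated does not say whether skipping a layer or importing a sibling within {L3, L4} is allowed, so neither is flagged. Relative imports and the bundled forks are ignored.

```python
import ast

# Tier per layer directory, following "L1 -> L2 -> {L3, L4} -> L5".
# domain, backends, and io share a tier because the rule groups L3 and L4.
LAYER_TIER = {"cli": 1, "workflows": 2, "domain": 3, "backends": 3, "io": 3, "core": 4}


def layer_of(module):
    """Return the layer directory for a dotted module path, or None if it is not in a layer."""
    parts = module.split(".")
    if len(parts) >= 2 and parts[0] == "pdb2reaction" and parts[1] in LAYER_TIER:
        return parts[1]
    return None


def upward_imports(source, own_layer):
    """List absolute imports in `source` that target a layer above `own_layer`."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import (skipped in this sketch)
        for name in names:
            target = layer_of(name)
            if target is not None and LAYER_TIER[target] < LAYER_TIER[own_layer]:
                violations.append(name)
    return violations
```

Called with the text of a file under `pdb2reaction/domain/` and `own_layer="domain"`, it returns any `pdb2reaction.cli` or `pdb2reaction.workflows` modules that file imports.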
Where to look first
| concern | open this |
|---|---|
| Default for any CLI flag | pdb2reaction/core/defaults.py (primary source for most defaults — grep here first; a few workflow-local defaults live inline, e.g. path-opt --mep-mode) |
| Subcommand body / orchestration | pdb2reaction/workflows/<subcmd>.py |
| New MLIP backend | pdb2reaction/backends/<backend>.py + register in backends/__init__.py:BACKEND_REGISTRY |
| --help / option decorator | pdb2reaction/cli/common_options.py (shared) or the subcommand file (inline) |
| Output schema (summary.json, trajectory, energy diagram) | pdb2reaction/io/ |
| Chemistry rule (link-atom, ECP injection, scatter) | search # CHEMISTRY-RULE: markers (lab-sign-off required to edit) |
| TS / IRC / optimiser internals | pysisyphus/ (annotation-only — chemistry-rule risk) |
| MCP server / agent integration | pdb2reaction/mcp/ — see pdb2reaction-mcp |
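To list the chemistry-rule markers before editing, a small standard-library scan is enough. This is a sketch, not a repository tool: `find_markers` is a hypothetical helper, and it assumes the markers appear literally as `# CHEMISTRY-RULE:` inside `.py` files.

```python
from pathlib import Path

MARKER = "# CHEMISTRY-RULE:"


def find_markers(root, marker=MARKER):
    """Return (path, line number, stripped line) for every line under `root` containing `marker`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running `find_markers("pdb2reaction")` and `find_markers("pysisyphus")` from the repository root would give the list of lines that need lab sign-off before they are changed.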
Hidden constraints to remember
- `pdb2reaction/cli/app.py:_LAZY_SUBCOMMANDS` entries MUST use absolute module paths (`"pdb2reaction.workflows.all"`, never `".all"`). Relative dotted paths silently break the resolver if `default_group.py` moves.
- VRAM hygiene: `# DO NOT INLINE` markers around `del calc; gc.collect(); torch.cuda.empty_cache()` between stages are load-bearing: removing them OOMs the next stage on full-protein systems.
- `pyproject.toml` `[tool.setuptools.packages.find].include` and `dependencies` arrays are treated as 0-diff for this release line. Adding a vendor / internal dir or pinning a new runtime dep breaks behaviour-level guarantees and is out of scope.
- Bundled-fork edits to `pysisyphus/` / `thermoanalysis/` outside the 5 divergent files require a `[CHEMISTRY-RULE:N]` commit prefix and a HEAVY benchmark.
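The VRAM-hygiene constraint describes a release pattern that can be sketched as follows. `run_stages` and the `(make_calc, run)` stage pairs are hypothetical and do not reflect the actual workflow code; only the `del calc; gc.collect(); torch.cuda.empty_cache()` sequence comes from the text above. The `torch` import is guarded so the sketch also runs on machines without PyTorch.

```python
import gc

try:
    import torch
except ImportError:  # allow the sketch to run without PyTorch installed
    torch = None


def run_stages(stages):
    """Run (make_calc, run) pairs in order, releasing each calculator before the next stage."""
    results = []
    for make_calc, run in stages:
        calc = make_calc()
        results.append(run(calc))
        # DO NOT INLINE: the three steps below must stay together and in this order.
        del calc                      # drop the last Python reference to the calculator
        gc.collect()                  # collect reference cycles that may still pin tensors
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached GPU blocks to the driver
    return results
```

The pattern only helps if `run` does not return an object that keeps the calculator alive; a result that holds a reference to `calc` would defeat the `del`.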
See also
- Full architecture (~320 lines): `docs/architecture.md`
- Contributor recipe + per-step gate cycle: `CONTRIBUTING.md`
- Engineering-marker coverage check: `.github/scripts/check_engineering_markers.py`