name: mace-finetuning-and-benchmark
description: Use this skill for remote MACE fine-tuning/training plus held-out evaluation using the validated reference-script conventions, especially the mace-mh-1 + omat_pbe replay-style path with explicit E0 and replay controls.
mace-finetuning-and-benchmark
Overview
Use this skill to train or fine-tune a MACE model on a prepared dataset while matching the validated reference_scripts/mace_training_example behavior instead of inventing ad hoc MACE CLI settings.
Quick Start
- Start from a dataset directory that already contains explicit split files.
- For the validated baseline, use foundation_model=mh-1 or an explicit staged/local foundation-model path, and foundation_head=omat_pbe.
- Launch training by preparing the mace_train_dir stage layout and calling remote_submission, choosing e0s="estimated" or a fixed E0 JSON path explicitly in the staged params.
- Use mace_eval_dir through remote_submission only when you need an extra post-training benchmark pass; the training run itself may already include test.extxyz.
- Do not replace the managed remote training task with a local training wrapper when the catalog task already fits the job.
Allowed tools
- get_avail_remote_task
- remote_submission
Workflow
1. Keep dataset provenance fixed
- Do not retrain on a moving dataset root while comparing hyperparameters or base models.
- Keep the split file names and the exact foundation-model/head choice explicit.
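One way to check that the dataset has not moved between runs is to fingerprint the split files before each submission. The sketch below is illustrative only: the helper names are hypothetical, and apart from test.extxyz the split file names used in the usage note are assumptions, not names confirmed by this skill.

```python
import hashlib
from pathlib import Path


def fingerprint_splits(dataset_dir, split_names):
    """Return {split file name: SHA-256 hex digest} for each named split file."""
    root = Path(dataset_dir)
    digests = {}
    for name in split_names:
        path = root / name
        if not path.is_file():
            # A missing split means the dataset root is not the prepared one.
            raise FileNotFoundError(f"missing split file: {path}")
        digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def same_dataset(fingerprint_a, fingerprint_b):
    """True only when both runs saw identical split names and contents."""
    return fingerprint_a == fingerprint_b
```

For example, calling fingerprint_splits(dataset_dir, ["train.extxyz", "valid.extxyz", "test.extxyz"]) before each run and storing the result next to the run report makes it easy to refuse a hyperparameter comparison when same_dataset returns False.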
2. Train as one remote job
- The training stage must contain the dataset, optional foundation model, optional E0 JSON, optional replay/statistics/local CLI assets, and params/train_params.json.
- Report stage-local outputs plus receipt/context fields, not just that the submission launched.
- The reference-validated route is replay-style finetuning with explicit foundation_head, multiheads_finetuning, pt_train_file, replay sampling knobs, and explicit loss weights.
- If the user did not ask for a custom ablation, keep the validated baseline explicit: mh-1, omat_pbe, and estimated/fixed estimated E0s. Do not silently swap to another foundation model or another head.
- When the user needs additional official MACE CLI knobs beyond the common first-class fields, pass them through cli_args rather than writing a local wrapper script.
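The staged parameter file can be sketched as follows. This is a minimal illustration, not the confirmed mace_train_dir schema: the field names and baseline values come from this skill, but the flat key layout, the max_num_epochs key, the value given to multiheads_finetuning, and the write_train_params helper are assumptions. Replay sampling knobs are left out because their names are not given here.

```python
import json
from pathlib import Path

# Baseline values as stated in this skill. The flat layout is an assumption;
# check the actual task schema (via get_avail_remote_task) before relying on it.
BASELINE_PARAMS = {
    "foundation_model": "mh-1",
    "foundation_head": "omat_pbe",
    "e0s": "estimated",              # or a path to a fixed E0 JSON in the stage
    "multiheads_finetuning": True,   # assumed value for the replay-style route
    "compute_stress": True,
    "energy_weight": 1.0,
    "forces_weight": 10.0,
    "stress_weight": 1.0,
    "default_dtype": "float32",
    "batch_size": 4,
    "seed": 42,
    "max_num_epochs": 25,            # hypothetical key name for the epoch cap
    "cli_args": [],                  # extra official MACE CLI knobs go here
}


def write_train_params(stage_dir, pt_train_file, overrides=None):
    """Write params/train_params.json under stage_dir and return its path."""
    params = dict(BASELINE_PARAMS)
    params["pt_train_file"] = str(pt_train_file)
    params.update(overrides or {})   # explicit, visible deviations only
    out = Path(stage_dir) / "params" / "train_params.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(params, indent=2, sort_keys=True))
    return out
```

Routing every deviation through the overrides argument means a run that departs from the baseline does so in one visible place instead of through a silently edited default.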
3. Benchmark separately
- The training run can already carry test.extxyz; use mace_eval_dir when you need an additional benchmark pass on a retained checkpoint or an alternate split.
- Keep the evaluation output root separate from the training root.
- Choose the evaluation device explicitly when the remote resource is not guaranteed to expose CUDA.
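The two layout rules above can be enforced with a small pre-flight check before submitting the evaluation. This is a sketch under stated assumptions: both function names are hypothetical, and treating nested directories as "not separate" is one reading of the rule, not something this skill spells out.

```python
from pathlib import Path


def check_eval_root(train_root, eval_root):
    """Return the resolved eval root, or raise if it overlaps the training root."""
    train = Path(train_root).resolve()
    evaluation = Path(eval_root).resolve()
    overlaps = (
        evaluation == train
        or train in evaluation.parents      # eval root sits inside training root
        or evaluation in train.parents      # training root sits inside eval root
    )
    if overlaps:
        raise ValueError(f"evaluation root {evaluation} overlaps training root {train}")
    return evaluation


def pick_eval_device(requested, cuda_guaranteed):
    """Require an explicit device unless the remote resource guarantees CUDA."""
    if requested is not None:
        return requested
    if cuda_guaranteed:
        return "cuda"
    raise ValueError("CUDA is not guaranteed on this resource; choose a device explicitly")
```

Failing early here is cheaper than discovering after the remote job that benchmark reports were written into the checkpoint directory or that the evaluator fell back to an unintended device.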
4. Use this skill once the workflow artifact is a dataset or model
- Start from a prepared dataset directory, a checkpoint to benchmark, or an explicit model-comparison plan.
- If the training run identifies new structures that need relabeling or new reference calculations, hand those artifacts back into the materials-side workflow before the next dataset rebuild.
Method-critical defaults
- The validated baseline in this repo is mace-mh-1 with foundation_head=omat_pbe, explicit replay controls, compute_stress=True, energy_weight=1.0, forces_weight=10.0, stress_weight=1.0, default_dtype=float32, batch_size=4 as the conservative starting point, and seed=42.
- For typical fine-tuning runs in this workflow, use an epoch cap in the 15-25 range as the default starting band. Keep 25 as the normal upper cap unless the user explicitly asks for a longer ablation, and prefer the best validation checkpoint over blindly extending epochs.
- Prefer the managed remote task path for that baseline. Only fall back to custom local scripts when the requested workflow is genuinely outside what mace_train_dir/mace_eval_dir can express.
- Surface the foundation-model choice, head, E0 strategy, replay controls, batch size, learning rate, and epoch cap when they differ across runs.
- Treat benchmark coverage honestly: the evaluator reports energy/force metrics, and reports stress metrics only when reference stress is present in the dataset and the model/calculator exposes stress.
- Do not compare metrics across different train/valid/test splits as if they came from the same benchmark.
- Keep the training artifact chain explicit: dataset inputs, checkpoint outputs, and benchmark reports should remain separately identifiable.
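Surfacing the settings that differ across runs can be done mechanically by diffing the staged parameter dictionaries. The sketch below assumes flat parameter dictionaries; the lr and max_num_epochs key names and the differing_settings helper are hypothetical, and pt_train_file stands in for the replay controls whose full key list is not given here.

```python
# Keys worth reporting when they differ between runs (names partly assumed).
SURFACED_KEYS = (
    "foundation_model",
    "foundation_head",
    "e0s",
    "pt_train_file",
    "batch_size",
    "lr",
    "max_num_epochs",
)


def differing_settings(run_a, run_b, keys=SURFACED_KEYS):
    """Return {key: (value_in_a, value_in_b)} for every surfaced key that differs."""
    diffs = {}
    for key in keys:
        a, b = run_a.get(key), run_b.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs
```

An empty result means the two runs share every surfaced setting; anything else belongs in the report next to the metrics so a reader does not attribute a metric change to the wrong knob.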
Output Contract
Return:
- training output root
- evaluation output root
- remote_context_id, submission_hash, receipt_rel, and task_state_counts when present
- model artifact path(s)
- metrics JSON / per-config CSV path(s)
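The contract above could be assembled as a plain dictionary. This is only a sketch: the receipt field names are the ones listed above, while the other key names and the build_report helper are hypothetical.

```python
# Receipt/context fields that are included only when the submission returned them.
OPTIONAL_RECEIPT_FIELDS = (
    "remote_context_id",
    "submission_hash",
    "receipt_rel",
    "task_state_counts",
)


def build_report(train_root, eval_root, model_paths, metrics_paths, receipt=None):
    """Collect the output-contract items, skipping absent receipt fields."""
    report = {
        "training_output_root": str(train_root),
        "evaluation_output_root": str(eval_root),
        "model_artifact_paths": [str(p) for p in model_paths],
        "metrics_paths": [str(p) for p in metrics_paths],
    }
    for field in OPTIONAL_RECEIPT_FIELDS:
        if receipt and receipt.get(field) is not None:
            report[field] = receipt[field]
    return report
```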
References
- Use mace-dataset-curation first when the dataset root has not yet been built from VASP outputs.
- Reference flow: vasp_to_mace_finetune.md
- Validated training command: run_train.sh