launch-with-slurm

star 185

Reference for launching jobs inside a SLURM allocation via srun (single-node or multi-node). Use whenever $SLURM_JOB_ID is set and work needs to run on the allocated compute — from direct user requests ("run on the cluster", "launch on slurm", "train across N nodes", "dispatch the job") OR from within another skill's workflow (e.g., validate-correctness running validation on the allocation, add-new-model reaching pp=2/ep=2). Covers how to read the allocation context from $SLURM_JOB_ID via scontrol, the srun flags that matter (-W 0, -N, -o, --open-mode, --nodelist), and gotchas like the executable bit requirement and distributed-aware output redirection.

mlc-ai By mlc-ai schedule Updated 4/23/2026

Skill instructions (SKILL.md) could not be loaded from local cache or raw GitHub repository.

Install via CLI
npx skills add https://github.com/mlc-ai/pith-train --skill launch-with-slurm
Repository Details
star Stars 185
call_split Forks 13
navigation Branch main
article Path SKILL.md
More from Creator