name: start-run
description: How to launch prime-rl training runs — the rl, sft, and inference entrypoints, their config classes, and single-node/SLURM/dry-run modes. Use when starting a run or picking the right entrypoint.
Start a run
All entrypoints run via uv run <command> and accept TOML configs via @ path/to.toml plus CLI overrides.
Config system at a glance
pydantic-config — Pydantic-based TOML + CLI loader. Highlights (see the configs skill for full mechanics):
- Config files via @ path (TOML / YAML / JSON); CLI args layer on top, deep-merged with class defaults.
- Nested groups via dotted CLI paths: kebab-case on the CLI, snake_case in TOML.
- Bool toggles: bare --flag enables, --no-flag disables (nested too).
- Lists: space-separated or JSON literal. Dicts: JSON literal, deep-merged with file values.
- Optional sub-configs (WandbConfig | None): bare --wandb enables defaults; --wandb @ wandb.toml enables from a file; --no-wandb disables.
- Discriminated unions are switched by the type tag (e.g. --optimizer.type muon).
- Validation aliases let renamed fields keep working; legacy keys can be remapped in a model_validator(mode="before").
- Auto-generated --help panels from Field(description=...) or PEP 224 docstrings.
- Friendly errors: required-field boxes, validator errors point at the offending flag, unknown flags get a "did you mean" hint.
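The layering described in the list above (class defaults, then file values, then CLI overrides) can be pictured with a small stand-alone sketch. This is not the pydantic-config implementation; deep_merge, cli_override, and the sample values are hypothetical names chosen for illustration, and the real loader also handles validation, aliases, and unions.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged into `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def cli_override(dotted_flag: str, value) -> dict:
    """Turn a flag such as '--model.enforce-eager' into {'model': {'enforce_eager': value}}."""
    # Kebab-case on the CLI maps to snake_case keys; dots mark nesting.
    parts = dotted_flag.lstrip("-").replace("-", "_").split(".")
    nested = value
    for part in reversed(parts):
        nested = {part: nested}
    return nested


# Hypothetical values, for illustration only.
defaults = {"model": {"name": "default-model", "enforce_eager": False}, "seed": 0}
from_file = {"model": {"name": "Qwen/Qwen3-0.6B"}}
from_cli = cli_override("--model.enforce-eager", True)

# Later layers win, but untouched nested keys survive from earlier layers.
config = deep_merge(deep_merge(defaults, from_file), from_cli)
print(config)
```

The point of the sketch is that a file or flag only needs to mention the keys it changes; everything else falls through from the layer beneath it.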
rl — RL training
Launches inference server, orchestrator, and trainer as subprocesses.
uv run rl @ examples/reverse_text/rl.toml
uv run rl @ examples/reverse_text/rl.toml @ examples/reverse_text/slurm_rl.toml # SLURM
uv run rl @ examples/reverse_text/rl.toml --dry-run # write scripts, don't run
- Config: RLConfig (packages/prime-rl-configs/src/prime_rl/configs/rl.py)
- Entrypoint: src/prime_rl/entrypoints/rl.py
- SLURM: single- and multi-node
- Environment packages: before launching a config with a non-core verifier env id, verify the package imports under uv run (for example uv run python -c "import importlib.util; print(importlib.util.find_spec('rlm_swe'))"). If a local env exists under deps/research-environments/environments/ but does not import, add it to the root pyproject.toml env extra, workspace members, and [tool.uv.sources], then run uv sync --all-extras.
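When a config references several environment packages, the one-line find_spec check above can be wrapped in a small helper. This is a minimal sketch, not part of prime-rl; missing_packages is a hypothetical name, and the script has to be run through uv run so that it sees the same environment as the launcher.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the package names that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for a dotted name whose parent is absent, or for a broken module spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# 'rlm_swe' is the example package from the text; the result depends on your environment.
print(missing_packages(["json", "rlm_swe"]))
```

An empty list means every package resolved. Any name that is returned is a candidate for the pyproject.toml changes described above.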
sft — SFT training
Launches torchrun internally — never call torchrun directly.
uv run sft @ examples/reverse_text/sft.toml
uv run sft @ examples/reverse_text/sft.toml --slurm
uv run sft @ examples/reverse_text/sft.toml --dry-run
- Config: SFTConfig (packages/prime-rl-configs/src/prime_rl/configs/sft.py)
- Entrypoint: src/prime_rl/entrypoints/sft.py
- SLURM: single- and multi-node
inference — vLLM server
OpenAI-compatible API plus prime-rl custom endpoints (/update_weights, /load_lora_adapter, /init_broadcaster). Always use this entrypoint — never vllm serve directly.
uv run inference @ configs/debug/infer.toml
uv run inference --model.name Qwen/Qwen3-0.6B --model.enforce-eager
Smoke checks:
curl http://<host>:<port>/health
curl http://<host>:<port>/v1/models
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 50}'
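The same smoke checks can be scripted with the Python standard library, which is handy when waiting for a server to come up inside a job. This is a sketch under assumptions: chat_request and is_healthy are hypothetical helper names, the base URL and model mirror the curl example above, and only the /health and /v1/chat/completions routes shown there are used.

```python
import json
import urllib.request


def chat_request(base_url: str, model: str, prompt: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


# Building the request needs no running server; sending it does.
request = chat_request("http://localhost:8000", "Qwen/Qwen3-0.6B", "Hi")
print(request.full_url)
```

To send the request once is_healthy returns True, pass it to urllib.request.urlopen(request) and decode the JSON body of the response.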
- Config: InferenceConfig (packages/prime-rl-configs/src/prime_rl/configs/inference.py)
- Entrypoint: src/prime_rl/entrypoints/inference.py
- SLURM: single-node, multi-node, and disaggregated deployments
Summary
| Command | Purpose | Typical use |
|---|---|---|
| rl | Full RL pipeline | Production RL training |
| sft | Supervised fine-tuning | SFT and hard-distill |
| inference | vLLM server | Standalone serving / debugging |
Key paths
- src/prime_rl/entrypoints/: rl, sft, inference (+ trainer, orchestrator for direct launches)
- packages/prime-rl-configs/src/prime_rl/configs/: all config classes
- configs/debug/: minimal debug configs
- examples/: full example configs (e.g. reverse_text/)