start-run

star 1.5k

How to launch prime-rl training runs — the `rl`, `sft`, and `inference` entrypoints, their config classes, and single-node/SLURM/dry-run modes. Use when starting a run or picking the right entrypoint.

PrimeIntellect-ai By PrimeIntellect-ai schedule Updated 6/3/2026

name: start-run description: How to launch prime-rl training runs — the rl, sft, and inference entrypoints, their config classes, and single-node/SLURM/dry-run modes. Use when starting a run or picking the right entrypoint.

Start a run

All entrypoints run via uv run <command> and accept TOML configs via @ path/to.toml plus CLI overrides.

Config system at a glance

pydantic-config — Pydantic-based TOML + CLI loader. Highlights (see the configs skill for full mechanics):

  • Config files via @ path (TOML / YAML / JSON); CLI args layer on top, deep-merged with class defaults.
  • Nested groups via dotted CLI paths — kebab-case on the CLI, snake_case in TOML.
  • Bool toggles: bare --flag enables, --no-flag disables (nested too).
  • Lists: space-separated or JSON literal. Dicts: JSON literal, deep-merged with file values.
  • Optional sub-configs (WandbConfig | None): bare --wandb enables defaults; --wandb @ wandb.toml enables from a file; --no-wandb disables.
  • Discriminated unions are switched by the type tag (e.g. --optimizer.type muon).
  • Validation aliases let renamed fields keep working; legacy keys can be remapped in a model_validator(mode="before").
  • Auto-generated --help panels from Field(description=...) or PEP 224 docstrings.
  • Friendly errors: required-field boxes, validator errors point at the offending flag, unknown flags get a "did you mean" hint.

rl — RL training

Launches inference server, orchestrator, and trainer as subprocesses.

uv run rl @ examples/reverse_text/rl.toml
uv run rl @ examples/reverse_text/rl.toml @ examples/reverse_text/slurm_rl.toml   # SLURM
uv run rl @ examples/reverse_text/rl.toml --dry-run                                # write scripts, don't run
  • Config: RLConfig (packages/prime-rl-configs/src/prime_rl/configs/rl.py)
  • Entrypoint: src/prime_rl/entrypoints/rl.py
  • SLURM: single- and multi-node
  • Environment packages: before launching a config with a non-core verifier env id, verify the package imports under uv run (for example uv run python -c "import importlib.util; print(importlib.util.find_spec('rlm_swe'))"). If a local env exists under deps/research-environments/environments/ but does not import, add it to the root pyproject.toml env extra, workspace members, and [tool.uv.sources], then run uv sync --all-extras.

sft — SFT training

Launches torchrun internally — never call torchrun directly.

uv run sft @ examples/reverse_text/sft.toml
uv run sft @ examples/reverse_text/sft.toml --slurm
uv run sft @ examples/reverse_text/sft.toml --dry-run
  • Config: SFTConfig (packages/prime-rl-configs/src/prime_rl/configs/sft.py)
  • Entrypoint: src/prime_rl/entrypoints/sft.py
  • SLURM: single- and multi-node

inference — vLLM server

OpenAI-compatible API plus prime-rl custom endpoints (/update_weights, /load_lora_adapter, /init_broadcaster). Always use this entrypoint — never vllm serve directly.

uv run inference @ configs/debug/infer.toml
uv run inference --model.name Qwen/Qwen3-0.6B --model.enforce-eager

Smoke checks:

curl http://<host>:<port>/health
curl http://<host>:<port>/v1/models
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 50}'
  • Config: InferenceConfig (packages/prime-rl-configs/src/prime_rl/configs/inference.py)
  • Entrypoint: src/prime_rl/entrypoints/inference.py
  • SLURM: single-node, multi-node, and disaggregated deployments

Summary

Command Purpose Typical use
rl Full RL pipeline Production RL training
sft Supervised fine-tuning SFT and hard-distill
inference vLLM server Standalone serving / debugging

Key paths

  • src/prime_rl/entrypoints/rl, sft, inference (+ trainer, orchestrator for direct launches)
  • packages/prime-rl-configs/src/prime_rl/configs/ — all config classes
  • configs/debug/ — minimal debug configs
  • examples/ — full example configs (e.g. reverse_text/)
Install via CLI
npx skills add https://github.com/PrimeIntellect-ai/prime-rl --skill start-run
Repository Details
star Stars 1,484
call_split Forks 313
navigation Branch main
article Path SKILL.md
More from Creator
PrimeIntellect-ai
PrimeIntellect-ai Explore all skills →