configs

name: configs description: How the prime-rl config system works — TOML files, CLI overrides, composition, and special patterns. Use when creating configs, debugging config errors, or overriding values via CLI.

prime-rl uses pydantic-config — a Pydantic-based TOML + CLI config system (no tyro). Every entrypoint accepts TOML files via @ and CLI overrides.

Loading and composition

uv run rl @ examples/reverse_text/rl.toml                                  # single TOML
uv run rl @ examples/reverse_text/rl.toml --max-steps 50                   # CLI override
uv run rl @ base.toml @ overlay.toml                                       # left-to-right merge
uv run rl --model @ model.toml --data @ data.toml                          # nested section files
uv run rl @ base.toml --trainer @ trainer.toml --trainer.lr 1e-3           # mixed

Resolution order: CLI > config files (left-to-right) > class defaults. Merging is deep — unset fields in an overlay are preserved from the base.

Naming: CLI uses kebab-case (--model.max-model-len); TOML uses snake_case (max_model_len).

Inspect & validate

uv run rl --help                                  # all fields and defaults
uv run rl @ rl.toml --dry-run --output-dir /tmp/x # write resolved TOML to /tmp/x/configs

Validators

Incompatible combinations (e.g. CP requires flash attention) must raise in a model_validator at resolve time, not at runtime. When renaming a field, emit a deprecation warning with a migration hint — never silently drop.

Special syntax

Booleans — CLI --flag / --no-flag; TOML must be explicit (enforce_eager = true).

None — TOML has no null, use the string "None" (max_model_len = "None"); CLI: --model.max-model-len None.

Lists — TOML uses array of tables; later config files replace lists wholesale, so overlays must include the full desired list:

[[orchestrator.env]]
id = "reverse-text"

CLI: --env.0.id reverse-text --env.1.id math-env.

Dicts — TOML uses a section; CLI takes a JSON string: --vllm-extra '{"key1": "value1"}'.

Discriminated unions — set the type field to pick the variant ([trainer.loss] type = "sft"). Omit type to keep the default variant.

BaseModel | None fields — bare flag enables defaults; nested override enables and sets:

--model.compile             # enables compile with defaults
--model.compile.fullgraph   # enables and sets fullgraph=true

In TOML, an empty section header ([ckpt]) does the same.

RL trainer token exports

For rollout debugging, enable trainer-side token export with trainer.enable_token_export = true (or --enable-token-export when running the trainer entrypoint directly). It writes one JSONL record per exported sequence. Single-run/fallback exports go under output_dir/token_exports/step_<step>/rank_<rank>.jsonl; multi-run trainer exports with packer metadata go under the owning run directory, output_dir/<run_id>/token_exports/step_<run_step>/rank_<rank>.jsonl. Each record stores aligned per-token arrays for token ids, loss mask, advantage, reward, entropy, mismatch KL, inference/trainer logprobs, importance ratios, probability deltas, and masking diagnostics. It does not decode token text in the trainer.

enable_token_export = true

Leave it unset for normal training. When enabled, it exports every sequence from each exporting rank.

Key files

packages/prime-rl-configs/src/prime_rl/ — config classes under configs/; utils/config.py re-exports BaseConfig and cli
configs/debug/ — minimal debug configs
examples/ — full example configs