rnow-cli - SKILL.md Agent Skill

name: rnow-cli description: Use the ReinforceNow CLI for RLHF training. Use when running rnow commands, initializing projects, submitting training runs, testing rollouts, or downloading models.

ReinforceNow CLI Reference

The rnow CLI manages RLHF training projects on the ReinforceNow platform.

Installation

pip install rnow

Command Overview

Command	Description
`rnow login`	Authenticate with the platform
`rnow logout`	Remove credentials
`rnow status`	Check auth and running jobs
`rnow orgs`	Manage organizations
`rnow init`	Create new project from template
`rnow run`	Submit training run
`rnow stop`	Cancel active run
`rnow test`	Test rollouts locally
`rnow download`	Download trained model

rnow login

Authenticate using OAuth device flow.

rnow login [OPTIONS]

Option	Description
`--force`	Force new login even if already authenticated
`--api-url URL`	Custom API base URL

Example:

rnow login
# Opens browser for authentication
# Stores credentials in ~/.reinforcenow/credentials.json

rnow logout

Remove stored credentials.

rnow logout

rnow status

Check authentication status and running jobs.

rnow status

Output:

Logged in as: user@example.com
Organization: My Team (org_abc123)
Active runs: 2
  - run_xyz789 (running) - Math Training
  - run_def456 (queued) - Code Agent

rnow orgs

List or select organizations.

# List all organizations
rnow orgs

# Select an organization
rnow orgs ORG_ID

Example:

rnow orgs
# Output:
# * org_abc123 - My Team (owner)
#   org_def456 - Other Team (member)

rnow orgs org_def456
# Switched to: Other Team

rnow init

Initialize a new project from a template.

rnow init [OPTIONS]

Option	Description
`--template NAME`	Template to use (see below)
`--name NAME`	Project name (prompts if not provided)

Available Templates

Template	Type	Description
`first-rl`	RL	Starter template for RL
`rl-single`	RL	Single-turn RL with rewards
`rl-tools`	RL	Multi-turn RL with tool calling
`sft`	SFT	Supervised finetuning
`tutorial-reward`	RL	Learn reward functions
`tutorial-tool`	RL	Learn tool functions
`mcp-tavily`	RL	MCP integration (web search)
`kernel`	RL	VLM browser agent with Kernel
`rl-browser`	RL	Browser agent with Playwright
`off-distill-agent`	SFT	Off-policy distillation
`on-distill-agent`	Distill	On-policy KL distillation
`posttrain`	Midtrain	Continued pretraining

Examples:

# Create SFT project
rnow init --template sft --name "my-sft-project"

# Create RL project with tools
rnow init --template rl-tools

# Create from tutorial
rnow init --template tutorial-reward

Generated Files

Template	Files
`sft`	config.yml, train.jsonl
`rl-single`	config.yml, train.jsonl, rewards.py, requirements.txt
`rl-tools`	config.yml, train.jsonl, rewards.py, tools.py, requirements.txt
`blank`	config.yml

rnow run

Submit project for training.

rnow run [OPTIONS]

Option	Description
`--dir PATH`	Project directory (default: current)
`--name NAME`	Custom run name

Required files:

config.yml - Configuration
train.jsonl - Training data
rewards.py - Reward functions (RL only)

Optional files:

tools.py - Tool definitions
requirements.txt - Python dependencies

Example:

cd my-project
rnow run

# Output:
# Validating project...
# Uploading files...
# Starting run: run_abc123xyz
# View at: https://www.reinforcenow.ai/runs/run_abc123xyz

rnow stop

Cancel an active training run.

rnow stop RUN_ID

Example:

rnow stop run_abc123xyz
# Are you sure you want to stop run_abc123xyz? [y/N]: y
# Run stopped.
# Duration: 2h 15m
# Cost: $12.50

rnow test

Test RL rollouts locally before submitting.

rnow test [OPTIONS]

Option	Default	Description
`-d, --dir PATH`	.	Project directory
`-n, --num-rollouts N`	1	Number of rollouts
`--entry INDICES`	random	Test specific entries (e.g., "0,2,5")
`--model MODEL`	gpt-5-nano	Override model for testing

Available Models

OpenAI API models (default, fast):

gpt-5-nano - Fastest, recommended for quick testing
gpt-5-mini - Faster
gpt-5.2 - Balanced
gpt-5-pro - Highest quality

GPU models (slower, uses actual training infrastructure):

Text models:

Qwen/Qwen3-8B
Qwen/Qwen3-32B
Qwen/Qwen3-30B-A3B
Qwen/Qwen3-235B-A22B-Instruct-2507
meta-llama/Llama-3.1-8B-Instruct
meta-llama/Llama-3.3-70B-Instruct
deepseek-ai/DeepSeek-V3.1

VLM models (for vision tasks with screenshots):

Qwen/Qwen3-VL-30B-A3B-Instruct - Vision-language model
Qwen/Qwen3-VL-235B-A22B-Instruct - Larger VLM

Important: Default gpt-5-nano is text-only and cannot process images. For VLM projects that return screenshots (e.g., browser agents), use --model Qwen/Qwen3-VL-30B-A3B-Instruct to test with actual vision capabilities.

Examples

Basic test:

rnow test
# Runs 1 rollout with gpt-5-nano (text-only)

Multiple rollouts:

rnow test -n 5

Test specific entries:

rnow test --entry 0,3,7
# Tests entries at indices 0, 3, and 7 from train.jsonl

Test with VLM model (for vision/screenshot projects):

rnow test --model Qwen/Qwen3-VL-30B-A3B-Instruct -n 1
# Uses actual VLM that can see images

Test with GPU model:

rnow test --model Qwen/Qwen3-8B -n 3
# Uses GPU infrastructure instead of OpenAI API

Test Output

Rollout 1/3
Entry: 0
Prompt: What is 2+2?

Turn 1:
  Assistant: The answer is 4.

Rewards:
  accuracy: 1.0
  format_check: 1.0
Total: 1.0

---
Rollout 2/3
...

rnow download

Download a trained model checkpoint.

rnow download RUN_ID [OPTIONS]

Option	Default	Description
`-o, --output DIR`	./model	Output directory

Example:

rnow download run_abc123xyz -o ./my-model
# Downloading checkpoint...
# Progress: 100%
# Saved to: ./my-model/