name: add-reward-function
description: Guide for adding a custom reward function in slime and wiring it through --custom-rm-path (and optional reward post-processing). Use when user wants new reward logic, remote/service reward integration, or task-specific reward shaping.
Add Reward Function
Implement custom reward logic and connect it to slime rollout/training safely.
When to Use
Use this skill when:
- User asks to add new reward computation logic
- User asks to integrate an external reward service
- User asks to customize reward normalization/post-processing
Step-by-Step Guide
Step 1: Choose Reward Mode
Pick one of these:
- Single-sample mode (--group-rm disabled): custom function gets one Sample
- Group/batch mode (--group-rm enabled): custom function gets list[Sample]
slime/rollout/rm_hub/__init__.py calls your function, which it locates via --custom-rm-path.
Step 2: Create Reward Module
Create slime/rollout/rm_hub/<your_rm>.py.
Supported signatures:
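# Single-sample mode: called with one Sample
async def custom_rm(args, sample):
    return float_reward_or_reward_dict  # placeholder: a float or a reward dict

# Group mode (--group-rm enabled): called with list[Sample]
async def custom_rm(args, samples):
    return list_of_rewards  # placeholder: one reward per sample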
If using group mode, return one reward per sample in input order.
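As an illustration of the two signatures, here is a minimal sketch of an exact-match reward in both modes. `FakeSample` and its `response` and `label` fields are stand-ins assumed for this example, not slime's actual `Sample` class, and the function names are hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeSample:
    """Stand-in for slime's Sample; the field names here are assumptions."""
    response: str
    label: str


async def exact_match_rm(args, sample):
    """Single-sample mode: return one float for one sample."""
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


async def exact_match_group_rm(args, samples):
    """Group mode: return one reward per sample, in input order."""
    return [await exact_match_rm(args, s) for s in samples]


group = [FakeSample("42", "42"), FakeSample("41", "42")]
single_reward = asyncio.run(exact_match_rm(None, group[0]))
group_rewards = asyncio.run(exact_match_group_rm(None, group))
```

The group variant builds its result by iterating over the input list, so the output length and order match the input by construction.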
Step 3: Keep Reward Type Consistent
- Return scalar numeric rewards unless your pipeline explicitly uses keyed rewards.
- If using reward dicts, ensure downstream reward_key/eval_reward_key is configured.
- Raise explicit exceptions for invalid metadata instead of silently returning zeros.
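The points above might look like the following sketch, which returns a keyed reward dict and raises on missing metadata. The dict keys (`accuracy`, `length_penalty`), the `answer` metadata field, and the `SimpleNamespace` stand-ins for samples are all assumptions made for illustration.

```python
import asyncio
from types import SimpleNamespace


async def keyed_rm(args, sample):
    """Return a reward dict; fail loudly when required metadata is missing."""
    metadata = getattr(sample, "metadata", None) or {}
    if "answer" not in metadata:
        # Explicit failure instead of a silent 0.0 that would skew training.
        raise ValueError("sample.metadata is missing the 'answer' field")
    correct = float(sample.response.strip() == str(metadata["answer"]))
    # Hypothetical keys; the one used for training must match reward_key.
    return {"accuracy": correct, "length_penalty": -0.001 * len(sample.response)}


good = SimpleNamespace(response="7", metadata={"answer": 7})
bad = SimpleNamespace(response="7", metadata={})
result = asyncio.run(keyed_rm(None, good))
try:
    asyncio.run(keyed_rm(None, bad))
    raised = False
except ValueError:
    raised = True
```

Every sample returns the same set of keys here, which keeps the reward type consistent across a batch.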
Step 4: Optional Reward Post-Processing
To customize normalization/shaping before advantage computation, add:
def post_process_rewards(args, samples):
    # return (raw_rewards, processed_rewards)
    ...
Wire with:
--custom-reward-post-process-path <module>.post_process_rewards
This hook is consumed in slime/ray/rollout.py.
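One possible shape for such a hook is sketched below: it mean-centers rewards within each prompt group. It assumes that `sample.reward` holds a scalar, that samples arrive ordered group by group, and that `args.n_samples_per_prompt` gives the group size. These are assumptions to check against slime/ray/rollout.py, not confirmed behavior.

```python
from types import SimpleNamespace


def post_process_rewards(args, samples):
    """Mean-center rewards within each prompt group.

    Assumes sample.reward is a scalar and samples are ordered group by group,
    with args.n_samples_per_prompt samples per group.
    """
    raw_rewards = [float(s.reward) for s in samples]
    group_size = args.n_samples_per_prompt
    processed_rewards = []
    for start in range(0, len(raw_rewards), group_size):
        group = raw_rewards[start:start + group_size]
        mean = sum(group) / len(group)
        processed_rewards.extend(r - mean for r in group)
    return raw_rewards, processed_rewards


# Stand-in inputs for a quick local check.
args = SimpleNamespace(n_samples_per_prompt=2)
samples = [SimpleNamespace(reward=r) for r in (1.0, 0.0, 3.0, 1.0)]
raw, processed = post_process_rewards(args, samples)
```

Both returned lists keep the input order and length, so each processed reward still lines up with its sample.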
Step 5: Wire and Validate
Use:
--custom-rm-path slime.rollout.rm_hub.<your_rm>.custom_rm
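Before launching a run, it can help to call the reward function directly and check its output contract. The helper below is a hypothetical sketch, not part of slime: `smoke_test_rm`, the toy `length_rm`, and the `SimpleNamespace` samples are all illustrative names.

```python
import asyncio
import math
from numbers import Real
from types import SimpleNamespace


def smoke_test_rm(rm_fn, samples, group_mode=False, args=None):
    """Run rm_fn outside of training and check the basic output contract."""
    if group_mode:
        rewards = asyncio.run(rm_fn(args, samples))
        assert len(rewards) == len(samples), "need one reward per sample"
    else:
        rewards = [asyncio.run(rm_fn(args, s)) for s in samples]
    for r in rewards:
        # Accept either a scalar or a dict of scalars.
        values = r.values() if isinstance(r, dict) else [r]
        for v in values:
            assert isinstance(v, Real) and math.isfinite(v), f"bad reward: {v!r}"
    return rewards


async def length_rm(args, sample):  # toy reward used only for this check
    return min(len(sample.response), 10) / 10


fake = [SimpleNamespace(response="hello"), SimpleNamespace(response="")]
checked = smoke_test_rm(length_rm, fake)
```

Including an empty response in the fake inputs exercises the degenerate case that truncated or failed samples tend to hit.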
Common Mistakes
- Returning wrong output shape in group mode
- Mixing scalar rewards and reward dicts without reward_key config
- Doing blocking network calls without async handling
- Forgetting to validate reward behavior on truncated/failed samples
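For the blocking-call mistake in particular, one common approach is to move the blocking client into a worker thread with `asyncio.to_thread` (Python 3.9+). In this sketch `blocking_score` is a hypothetical stand-in for a real client call, simulated with a short sleep.

```python
import asyncio
import time
from types import SimpleNamespace


def blocking_score(text):
    """Hypothetical blocking client call, simulated here with a sleep."""
    time.sleep(0.05)
    return float(len(text) % 2)


async def remote_rm(args, sample):
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_score, sample.response)


async def remote_group_rm(args, samples):
    # gather preserves input order, so rewards line up with samples.
    return await asyncio.gather(*(remote_rm(args, s) for s in samples))


samples = [SimpleNamespace(response=t) for t in ("a", "bb", "ccc")]
rewards = asyncio.run(remote_group_rm(None, samples))
```

If the reward service offers a native async client, awaiting it directly is usually simpler than threading.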
Reference Locations
- Reward dispatch: slime/rollout/rm_hub/__init__.py
- Reward post-process hook: slime/ray/rollout.py
- Customization docs: docs/en/get_started/customization.md