add-reward-function - SKILL.md Agent Skill

name: add-reward-function
description: Guide for adding a custom reward function in slime and wiring it through --custom-rm-path (with optional reward post-processing). Use when the user wants new reward logic, remote/service reward integration, or task-specific reward shaping.

Implement custom reward logic and connect it to slime rollout/training safely.

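Use this skill when the user wants new reward logic, remote or service-based reward integration, or task-specific reward shaping.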

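Pick one of these reward modes: single-sample mode, where the function scores one sample per call, or group mode, where it scores a list of samples in one call.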

slime/rollout/rm_hub/__init__.py calls the function you register via --custom-rm-path.

Create slime/rollout/rm_hub/<your_rm>.py.

Supported signatures:

# Single-sample mode: called once per sample.
async def custom_rm(args, sample):
    return float_reward_or_reward_dict

# Group mode: called once with a list of samples.
async def custom_rm(args, samples):
    return list_of_rewards

If using group mode, return one reward per sample in input order.
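The two signatures above can be sketched as follows. This is a minimal illustration, not slime's own code: the `FakeSample` stand-in, its `response` and `label` fields, the exact-match scoring, and the name `custom_group_rm` are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeSample:
    # Hypothetical stand-in for slime's sample object; field names are assumptions.
    response: str
    label: str


async def custom_rm(args, sample):
    # Single-sample mode: one sample in, one scalar reward out.
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


async def custom_group_rm(args, samples):
    # Group mode: score all samples concurrently. asyncio.gather returns
    # results in the order the awaitables were passed, so rewards[i]
    # belongs to samples[i].
    rewards = await asyncio.gather(*(custom_rm(args, s) for s in samples))
    return list(rewards)
```

Reusing the per-sample scorer inside the group function is one way to keep both modes consistent; a real group reward might instead compare samples against each other.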

Return scalar numeric rewards unless your pipeline explicitly uses keyed rewards.
If using reward dicts, ensure downstream reward_key / eval_reward_key is configured.
Raise explicit exceptions for invalid metadata instead of silently returning zeros.
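As a rough sketch of these rules, the function below returns a keyed reward and fails loudly on bad metadata. The `FakeSample` class, the `answer` metadata field, and the `accuracy` and `length_penalty` key names are hypothetical and chosen only for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSample:
    # Hypothetical stand-in; slime's real sample type may differ.
    response: str
    metadata: dict = field(default_factory=dict)


async def custom_rm(args, sample):
    # Fail loudly when required metadata is missing. Returning 0.0 here would
    # be indistinguishable from a genuinely wrong answer.
    if "answer" not in sample.metadata:
        raise ValueError("sample.metadata is missing the 'answer' field")
    expected = str(sample.metadata["answer"]).strip()
    correct = float(sample.response.strip() == expected)
    # Keyed reward: both key names are made up for this example.
    return {"accuracy": correct, "length_penalty": -0.001 * len(sample.response)}
```

With a dict like this, the configured reward key would presumably need to name one of the returned keys (here, `accuracy`) so that downstream code knows which value to train on.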

To customize normalization/shaping before advantage computation, add:

def post_process_rewards(args, samples):
    # return (raw_rewards, processed_rewards)
    ...

Wire with:

--custom-reward-post-process-path <module>.post_process_rewards

This hook is consumed in slime/ray/rollout.py.
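One possible shape for such a hook is per-prompt mean centering, sketched below. It assumes that each sample stores its scalar reward on a `reward` attribute, that samples from the same prompt are contiguous, and that `args.n_samples_per_prompt` holds the group size. None of these assumptions is confirmed by the text above, so check them against your slime version.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FakeSample:
    # Hypothetical stand-in; assumes the scalar reward lives on `reward`.
    reward: float


def post_process_rewards(args, samples):
    # Assumption: samples for one prompt are contiguous, and
    # args.n_samples_per_prompt is the group size.
    raw_rewards = [float(s.reward) for s in samples]
    n = args.n_samples_per_prompt
    processed_rewards = []
    for start in range(0, len(raw_rewards), n):
        group = raw_rewards[start:start + n]
        mean = sum(group) / len(group)
        # Subtract the group mean so rewards are centered per prompt.
        processed_rewards.extend(r - mean for r in group)
    return raw_rewards, processed_rewards


# Example call with a stand-in args object and one group of two samples.
args = SimpleNamespace(n_samples_per_prompt=2)
raw, processed = post_process_rewards(args, [FakeSample(1.0), FakeSample(0.0)])
print(processed)  # [0.5, -0.5]
```

Returning the raw rewards alongside the processed ones matches the `(raw_rewards, processed_rewards)` contract noted in the stub above.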

Select your reward function at launch with:

--custom-rm-path slime.rollout.rm_hub.<your_rm>.custom_rm
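The flag value is a dotted path whose last component is the function name. How slime resolves it internally is not shown here. As a generic illustration of how such a path can be resolved, and as a quick pre-flight check that the path imports before launching a run, a sketch like the following may help (`load_by_path` is a hypothetical helper, not slime's loader):

```python
import importlib


def load_by_path(path):
    # Split "package.module.func" into the module path and the attribute name.
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected '<module>.<function>', got {path!r}")
    module = importlib.import_module(module_path)
    # Raises AttributeError if the function name is misspelled.
    return getattr(module, attr)
```

Calling `load_by_path` with the same string passed to --custom-rm-path should return the `custom_rm` coroutine function if the module is importable from the launch environment.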