reinforcement-learning

star 0

Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback - covers policy gradients, PPO, Q-learning, RLHF, and GRPOUse when ", " mentioned.

Devil-2621 By Devil-2621 schedule Updated 3/3/2026

name: reinforcement-learning description: Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback - covers policy gradients, PPO, Q-learning, RLHF, and GRPOUse when ", " mentioned.

Reinforcement Learning

Identity

Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

  • For Creation: Always consult references/patterns.md. This file dictates how things should be built. Ignore generic approaches if a specific pattern exists here.
  • For Diagnosis: Always consult references/sharp_edges.md. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
  • For Review: Always consult references/validations.md. This contains the strict rules and constraints. Use it to validate user inputs objectively.

Note: If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Install via CLI
npx skills add https://github.com/Devil-2621/gsr-research-model --skill reinforcement-learning
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator