reinforcement


Reinforcement learning fundamentals

By ffsshhttiikk · Updated 2/28/2026

name: reinforcement
description: Reinforcement learning fundamentals
license: MIT
compatibility: opencode
metadata:
  audience: machine-learning-engineers
  category: artificial-intelligence

What I do

  • Build reinforcement learning systems
  • Design reward functions and environments
  • Implement policy optimization algorithms
  • Train agents for sequential decision-making
  • Apply RL to games and robotics
  • Handle exploration-exploitation tradeoffs

When to use me

Use me when:

  • Building AI for games
  • Training robotics control systems
  • Optimizing resource allocation
  • Creating adaptive systems
  • Solving sequential decision problems

Key Concepts

RL Framework

Agent ────── Action ──────▶ Environment
  ▲                              │
  └────── State + Reward ◀───────┘

Goal: Maximize cumulative (usually discounted) reward over an episode
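To make "cumulative reward" concrete: the quantity an agent usually maximizes is the discounted return G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ..., where γ (the `gamma` parameter of the Q-learning agent) trades off immediate against future reward. A minimal sketch that computes G_t for every step of one finished episode; `returns_to_go` is a hypothetical helper name, not part of any library:

```python
def returns_to_go(rewards, gamma=0.99):
    """Discounted return G_t for every time step t of one episode.

    Uses the recursion G_t = r_t + gamma * G_{t+1}, with G = 0 after the
    final step, iterating backwards so each step reuses the tail sum.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


# Three rewards of 1 with gamma = 0.9: G_0 = 1 + 0.9 + 0.81
print(returns_to_go([1.0, 1.0, 1.0], gamma=0.9))  # approximately [2.71, 1.9, 1.0]
```

Accumulating backwards keeps this linear in episode length instead of re-summing the tail for every step.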

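Gymnasium (OpenAI Gym) Example

# Gymnasium is the maintained successor to OpenAI Gym: pip install gymnasium
import gymnasium as gym
import numpy as np
from collections import defaultdict

# Create environment. Gymnasium API: reset() -> (obs, info),
# step() -> (obs, reward, terminated, truncated, info)
env = gym.make("CartPole-v1")

# CartPole observations are continuous 4-float arrays, which a tabular
# Q-table cannot index. Bin each dimension into a hashable tuple.
# The bin edges below are illustrative choices, not tuned values.
BINS = [
    np.linspace(-2.4, 2.4, 9),    # cart position
    np.linspace(-3.0, 3.0, 9),    # cart velocity
    np.linspace(-0.21, 0.21, 9),  # pole angle (radians)
    np.linspace(-3.5, 3.5, 9),    # pole angular velocity
]

def discretize(obs):
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, BINS))

# Q-Learning Implementation
class QLearningAgent:
    def __init__(self, n_actions, learning_rate=0.1,
                 epsilon=0.1, gamma=0.99):
        self.q_table = defaultdict(lambda: np.zeros(n_actions))
        self.lr = learning_rate
        self.epsilon = epsilon
        self.gamma = gamma
        self.n_actions = n_actions

    def choose_action(self, state):
        # Epsilon-greedy: random action with probability epsilon, else greedy
        if np.random.random() < self.epsilon:
            return int(np.random.randint(self.n_actions))
        return int(np.argmax(self.q_table[state]))

    def learn(self, state, action, reward, next_state, terminated):
        current_q = self.q_table[state][action]
        # No bootstrapping from a terminal state: its future value is 0
        max_next_q = 0.0 if terminated else np.max(self.q_table[next_state])
        td_target = reward + self.gamma * max_next_q
        self.q_table[state][action] = current_q + self.lr * (td_target - current_q)

# Training loop
agent = QLearningAgent(env.action_space.n)
episodes = 1000

for episode in range(episodes):
    obs, _ = env.reset()
    state = discretize(obs)
    done = False

    while not done:
        action = agent.choose_action(state)
        obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(obs)
        agent.learn(state, action, reward, next_state, terminated)
        state = next_state
        # Truncation (time limit) ends the episode but is not a true terminal state
        done = terminated or truncated

env.close()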
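The `learn` method implements the standard Q-learning update, with step size α (`lr`) and discount γ (`gamma`); in the standard formulation the max term is taken as zero when s' is terminal. The numbers in the worked line are arbitrary, chosen only to show the arithmetic:

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

% Worked example: \alpha = 0.1,\ \gamma = 0.99,\ Q(s,a) = 0.5,\ r = 1,\ \max_{a'} Q(s',a') = 2
Q(s,a) \leftarrow 0.5 + 0.1\,(1 + 0.99 \cdot 2 - 0.5) = 0.5 + 0.1 \cdot 2.48 = 0.748
```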

Key Algorithms

  • Q-Learning: Value-based, off-policy
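  • SARSA: Value-based, on-policy
  • DQN: Deep Q-Networks (Q-learning with a neural-network Q-function)
  • A2C/A3C: Advantage Actor-Critic (synchronous / asynchronous)
  • PPO: Proximal Policy Optimization (policy gradient with a clipped surrogate objective)
  • DDPG: Deep Deterministic Policy Gradient, for continuous actions
  • AlphaZero: Monte Carlo tree search combined with self-play RL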
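The off-policy/on-policy labels on Q-Learning and SARSA come down to which next-step value each one bootstraps from. A minimal sketch of the two targets for a single transition, assuming `q_next` holds Q(s', ·) as a NumPy array; the function names are hypothetical:

```python
import numpy as np


def q_learning_target(reward, q_next, gamma, terminated):
    """Off-policy target: bootstrap from the greedy (max) action in s'."""
    if terminated:
        return reward
    return reward + gamma * np.max(q_next)


def sarsa_target(reward, q_next, next_action, gamma, terminated):
    """On-policy target: bootstrap from the action the policy actually takes in s'."""
    if terminated:
        return reward
    return reward + gamma * q_next[next_action]


q_next = np.array([1.0, 3.0])  # Q(s', a') for two actions
# Suppose the epsilon-greedy policy explored and picked action 0 in s'.
print(q_learning_target(1.0, q_next, 0.5, False))  # 2.5 = 1 + 0.5 * 3
print(sarsa_target(1.0, q_next, 0, 0.5, False))    # 1.5 = 1 + 0.5 * 1
```

Because SARSA's target includes the exploratory action, it learns the value of the policy it is actually following; Q-learning learns the greedy policy's value regardless of how it explores.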

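Further Concepts

  • Reward shaping: Designing reward signals, e.g. intermediate rewards that guide the agent toward a sparse goal
  • Exploration: Epsilon-greedy action selection, entropy bonuses on the policy
  • Credit assignment: Attributing delayed rewards to the earlier actions that caused them
  • Function approximation: Replacing lookup tables with neural networks (deep RL) to handle large or continuous state spaces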
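Reward shaping is easiest to get right in its potential-based form, r' = r + γ·Φ(s') - Φ(s), which is known to leave the optimal policy unchanged. A minimal sketch, assuming a hand-picked potential Φ: the `potential` function below is a hypothetical example for CartPole (it favors keeping the pole near upright), and `shaped_reward` is an illustrative name, not a library API:

```python
def potential(obs):
    """Hypothetical potential for CartPole: higher when the pole is nearer upright.

    obs[2] is the pole angle in radians in CartPole's observation vector.
    """
    return -abs(obs[2])


def shaped_reward(reward, obs, next_obs, gamma, terminated):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    # Treat the potential of a terminal state as 0 (standard for episodic tasks)
    phi_next = 0.0 if terminated else potential(next_obs)
    return reward + gamma * phi_next - potential(obs)


# Pole angle moves from 0.10 to 0.05 rad, so the shaping term adds a bonus
print(shaped_reward(1.0, [0.0, 0.0, 0.10, 0.0], [0.0, 0.0, 0.05, 0.0], 0.99, False))
# approximately 1.0505 = 1 + 0.99 * (-0.05) - (-0.10)
```

An arbitrary bonus (for example, a flat reward for being near the goal) can instead create loops the agent exploits; the difference-of-potentials form avoids that.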

Install via CLI

npx skills add https://github.com/ffsshhttiikk/opencode-agents-skills --skill reinforcement

Repository Details

Stars: 2
Forks: 2
Branch: main
Path: SKILL.md