dqn-trainer

Produce a DQN training config (buffer, target sync, ε schedule, reward clipping) for a discrete-action RL task.

By Watcher-Hermes. Updated 6/14/2026

name: dqn-trainer
description: Produce a DQN training config (buffer, target sync, ε schedule, reward clipping) for a discrete-action RL task.
title: "Dqn Trainer"
version: 1.0.0
phase: 9
lesson: 5
tags: [rl, dqn, deep-rl]
category: dqn-trainer
audience: user

Given a discrete-action environment (observation shape, action count, horizon, reward scale), output:

  1. Network. Architecture (MLP / CNN / Transformer), feature dim, depth.
  2. Replay buffer. Capacity, minibatch size, warmup size.
  3. Target network. Sync strategy (hard every C steps or soft τ).
  4. Exploration. ε start / end / schedule length.
  5. Loss. Huber vs MSE, gradient clip value, reward clipping rule.
  6. Double DQN. On by default unless there is an explicit reason to disable it.
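
The six outputs above could be collected in a single config object. The following is a minimal sketch, not the skill's actual output format: the class name `DQNConfig`, every field name, and all default values are illustrative assumptions, and the linear ε decay is only one possible schedule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DQNConfig:
    """Hypothetical container for the six config sections (all defaults are placeholders)."""

    # 1. Network
    architecture: str = "mlp"          # "mlp", "cnn", or "transformer"
    feature_dim: int = 128
    depth: int = 2
    # 2. Replay buffer
    buffer_capacity: int = 100_000
    batch_size: int = 64
    warmup_steps: int = 1_000          # transitions collected before the first update
    # 3. Target network
    target_sync: str = "hard"          # "hard" (copy every C steps) or "soft" (Polyak, tau)
    hard_sync_every: int = 1_000       # C, used when target_sync == "hard"
    soft_tau: float = 0.005            # tau, used when target_sync == "soft"
    # 4. Exploration
    eps_start: float = 1.0
    eps_end: float = 0.05
    eps_decay_steps: int = 50_000      # schedule length
    # 5. Loss
    loss: str = "huber"                # "huber" or "mse"
    grad_clip_norm: float = 10.0
    reward_clip: Optional[Tuple[float, float]] = (-1.0, 1.0)  # None disables clipping
    # 6. Double DQN
    double_dqn: bool = True

    def epsilon_at(self, step: int) -> float:
        """Linearly anneal epsilon from eps_start to eps_end, then hold at eps_end."""
        frac = min(max(step, 0) / self.eps_decay_steps, 1.0)
        return self.eps_start + frac * (self.eps_end - self.eps_start)
```

For example, `DQNConfig(target_sync="soft").epsilon_at(25_000)` evaluates the schedule halfway through the decay window under these assumed defaults.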

Refuse to ship a DQN that has no target network, has no replay buffer, or holds ε at 1 for the whole run. Refuse continuous-action tasks and route them to SAC / TD3 instead. Flag any task whose reward range exceeds 10× the per-step mean reward as needing reward clipping or scale normalization.
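
The refusal and flagging rules above can be sketched as a small check. This is an illustration under stated assumptions: the function name `review_dqn_config` and its parameters are hypothetical, "reward range" is read as `reward_max - reward_min`, and "per-step mean" is read as the absolute mean per-step reward.

```python
from typing import List


def review_dqn_config(
    *,
    continuous_actions: bool,
    has_target_network: bool,
    buffer_capacity: int,
    eps_start: float,
    eps_end: float,
    reward_min: float,
    reward_max: float,
    mean_step_reward: float,
    reward_clip_enabled: bool = False,
) -> List[str]:
    """Raise ValueError for configs that must be refused; return a list of warnings otherwise."""
    if continuous_actions:
        raise ValueError("continuous-action task: route to SAC / TD3 instead of DQN")
    if not has_target_network:
        raise ValueError("refusing DQN config with no target network")
    if buffer_capacity <= 0:
        raise ValueError("refusing DQN config with no replay buffer")
    if eps_start >= 1.0 and eps_end >= 1.0:
        raise ValueError("refusing DQN config with epsilon held at 1")

    warnings: List[str] = []
    # Assumed reading of the rule: range = max - min, compared with 10x |mean per-step reward|.
    reward_range = reward_max - reward_min
    if reward_range > 10 * abs(mean_step_reward) and not reward_clip_enabled:
        warnings.append(
            "reward range exceeds 10x the per-step mean: "
            "add reward clipping or scale normalization"
        )
    return warnings
```

Raising on the hard refusals while returning the reward-scale issue as a warning mirrors the distinction the text draws between "refuse" and "flag".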

Install via CLI
npx skills add https://github.com/Watcher-Hermes/hermes-skills --skill dqn-trainer
Repository Details
Stars: 0
Forks: 0
Branch: main
Path: SKILL.md