---
name: dqn-trainer
description: Produce a DQN training config (buffer, target sync, ε schedule, reward clipping) for a discrete-action RL task.
title: "Dqn Trainer"
version: 1.0.0
phase: 9
lesson: 5
tags: [rl, dqn, deep-rl]
category: dqn-trainer
audience: user
---
Given a discrete-action environment (observation shape, action count, horizon, reward scale), output:
- Network. Architecture (MLP / CNN / Transformer), feature dim, depth.
- Replay buffer. Capacity, minibatch size, warmup size.
- Target network. Sync strategy: hard copy every C steps, or soft (Polyak) update with coefficient τ.
- Exploration. ε start / end / schedule length.
- Loss. Huber vs MSE, gradient clip value, reward clipping rule.
- Double DQN. On by default unless there is an explicit reason to disable it.
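The checklist above maps onto a single config object plus a few update rules. The sketch below is a minimal, framework-free illustration, assuming hypothetical field names (`DQNConfig`, `eps_decay_steps`, `target_sync_every`, and so on) and illustrative, untuned defaults; parameters and Q-values are plain float lists so the rules are visible without PyTorch or JAX.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass
class DQNConfig:
    # Network (hypothetical field names; defaults are illustrative, not tuned)
    architecture: Literal["mlp", "cnn", "transformer"] = "mlp"
    feature_dim: int = 256
    depth: int = 2
    # Replay buffer
    buffer_capacity: int = 100_000
    batch_size: int = 32
    warmup_steps: int = 1_000  # transitions collected before the first gradient update
    # Target network: "hard" copies every target_sync_every steps, "soft" blends with tau each step
    target_sync: Literal["hard", "soft"] = "hard"
    target_sync_every: int = 1_000
    tau: float = 0.005
    # Exploration: linear decay from eps_start to eps_end over eps_decay_steps
    eps_start: float = 1.0
    eps_end: float = 0.05
    eps_decay_steps: int = 50_000
    # Loss
    loss: Literal["huber", "mse"] = "huber"
    grad_clip: float = 10.0  # max global gradient norm
    reward_clip: Optional[float] = 1.0  # clip rewards to [-reward_clip, reward_clip]; None disables
    # Double DQN
    double_dqn: bool = True


def epsilon(cfg: DQNConfig, step: int) -> float:
    """Linearly anneal epsilon, then hold it at eps_end."""
    frac = min(step / cfg.eps_decay_steps, 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)


def sync_target(cfg: DQNConfig, step: int, target: List[float], online: List[float]) -> List[float]:
    """Return the new target parameters (flat float lists, for illustration only)."""
    if cfg.target_sync == "soft":
        # Polyak averaging: target <- tau * online + (1 - tau) * target
        return [cfg.tau * o + (1.0 - cfg.tau) * t for t, o in zip(target, online)]
    if step % cfg.target_sync_every == 0:
        return list(online)  # hard copy
    return target


def td_target(cfg: DQNConfig, reward: float, done: bool, gamma: float,
              q_online_next: List[float], q_target_next: List[float]) -> float:
    """One-step TD target for a single transition."""
    if cfg.reward_clip is not None:
        reward = max(-cfg.reward_clip, min(cfg.reward_clip, reward))
    if done:
        return reward  # no bootstrap past a terminal state
    if cfg.double_dqn:
        # Online net selects the action, target net evaluates it
        a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
        next_value = q_target_next[a_star]
    else:
        next_value = max(q_target_next)
    return reward + gamma * next_value


def loss_fn(cfg: DQNConfig, td_error: float, delta: float = 1.0) -> float:
    """Huber: quadratic for |error| <= delta, linear beyond. MSE here is 0.5 * error^2."""
    a = abs(td_error)
    if cfg.loss == "mse" or a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

With `double_dqn=True`, the online network's argmax chooses the next action and the target network scores it; decoupling selection from evaluation is what curbs the overestimation of a plain `max` over target Q-values.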
Refuse to ship a DQN with no target network, no replay buffer, or ε held at 1. Refuse continuous-action tasks and route them to SAC or TD3 instead. Flag any task whose reward range exceeds 10× the per-step mean reward as needing reward clipping or scale normalization.
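The refusal and flagging rules can be written as a pre-flight check. This is a hedged sketch: the function `review_dqn_config` and its arguments are hypothetical, "ε held at 1" is read as a final ε of 1 or more (no decay), and because a signed per-step mean can sit near zero, the reward check compares the observed range (max minus min) against 10× the mean absolute per-step reward. That reading of "per-step mean" is an assumption.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def review_dqn_config(
    action_space: str,  # "discrete" or "continuous"
    uses_target_network: bool,
    uses_replay_buffer: bool,
    eps_end: float,  # final epsilon after the decay schedule
    per_step_rewards: Sequence[float],
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings); any error means the config should not ship."""
    errors: List[str] = []
    warnings: List[str] = []

    if action_space != "discrete":
        errors.append("continuous-action task: route to SAC / TD3 instead of DQN")
    if not uses_target_network:
        errors.append("no target network")
    if not uses_replay_buffer:
        errors.append("no replay buffer")
    if eps_end >= 1.0:
        errors.append("epsilon never decays below 1 (pure random policy)")

    if per_step_rewards:
        reward_range = max(per_step_rewards) - min(per_step_rewards)
        # Assumption: "per-step mean" is taken as the mean absolute reward,
        # since a signed mean can be close to zero.
        mean_abs = mean(abs(r) for r in per_step_rewards)
        if reward_range > 10 * mean_abs:
            warnings.append(
                f"reward range {reward_range:g} exceeds 10x mean |r| {mean_abs:g}: "
                "clip rewards or normalize their scale"
            )
    return errors, warnings
```

For example, 99 steps with reward 1 plus a single step with reward 500 trigger the warning (range 499 against a threshold of 59.9), while rewards of 1, 2, and 3 do not.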