quantum-memory-rl


Reinforcement learning for quantum processes with hidden memory. Agent interacts with environment maintaining hidden quantum states evolving via unknown quantum channels, using quantum instruments for sequential intervention. Proves O~(sqrt(K)) regret bound via optimistic maximum-likelihood estimation. Use when: designing RL agents for quantum control with memory, analyzing exploration-exploitation trade-offs in quantum systems, or studying thermodynamic cost of learning in quantum processes.

By hiyenwong. Updated 6/3/2026


Quantum Memory Reinforcement Learning

Description

Formalizes RL for quantum systems with hidden memory: the environment maintains quantum states that evolve via unknown channels, and the agent intervenes with quantum instruments. Connects learning regret to thermodynamic dissipation.

Core Framework

Problem Setup

  • Hidden Memory: Environment maintains quantum state ρ evolving via unknown quantum channel ℰ
  • Agent Actions: Quantum instruments (generalized measurements that return a classical outcome and a post-measurement quantum state)
  • Feedback: Partial, probabilistic measurement outcomes
  • Goal: Maximize cumulative reward over K episodes
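
As a concrete illustration of this setup, the sketch below evolves a single-qubit memory through a channel and then applies an instrument. It is a minimal example under stated assumptions, not the source's construction: the amplitude-damping channel stands in for the unknown channel ℰ, the computational-basis measurement stands in for the agent's instrument, and all helper names (`apply_channel`, `apply_instrument`, `amplitude_damping`, `Z_INSTRUMENT`) are hypothetical. Plain nested lists are used to avoid dependencies.

```python
import math

# Small matrices are stored as nested lists; helper names are illustrative.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def apply_channel(rho, kraus_ops):
    """Hidden-memory update: rho -> sum_k K_k rho K_k^dagger."""
    dim = len(rho)
    out = [[0j] * dim for _ in range(dim)]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(dim)] for i in range(dim)]
    return out

def apply_instrument(rho, instrument):
    """Return (probability, post-measurement state) for each classical outcome."""
    results = []
    for m in instrument:
        unnormalized = matmul(matmul(m, rho), dagger(m))
        p = trace(unnormalized).real
        # Outcomes with (numerically) zero probability have no post-state.
        post = [[x / p for x in row] for row in unnormalized] if p > 1e-12 else None
        results.append((p, post))
    return results

def amplitude_damping(gamma):
    """Kraus operators of an amplitude-damping channel (assumed stand-in for the unknown channel)."""
    return [[[1, 0], [0, math.sqrt(1 - gamma)]],
            [[0, math.sqrt(gamma)], [0, 0]]]

# Projective measurement in the computational basis, written as an instrument.
Z_INSTRUMENT = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

rho = [[0, 0], [0, 1]]                           # memory starts in |1><1|
rho = apply_channel(rho, amplitude_damping(0.25))
outcomes = apply_instrument(rho, Z_INSTRUMENT)
probabilities = [p for p, _ in outcomes]
```

The agent only sees the classical outcome drawn from `probabilities`; the post-measurement state stays inside the environment as the memory for the next step.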

Algorithm: Optimistic Maximum-Likelihood Estimation

1. Maintain set of plausible quantum channels (confidence set)
2. Select optimistic channel (maximizes reward potential)
3. Execute optimal quantum instrument for optimistic model
4. Update confidence set with observed outcomes
5. Repeat
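
The loop above can be sketched on a toy problem. The example below makes strong simplifying assumptions that are not in the source: the candidate channels are a finite grid of amplitude-damping strengths (`GAMMAS`), each episode is a single measurement with no memory carried between episodes, the actions are three measurement angles (`ANGLES`), the reward is 1 when outcome 0 is observed, and the confidence-set threshold `beta` is an arbitrary illustrative value.

```python
import math
import random

GAMMAS = [i / 10 for i in range(1, 10)]   # hypothetical candidate channels
ANGLES = [0.0, math.pi / 2, math.pi]      # hypothetical measurement settings

def p_outcome0(gamma, theta):
    """Born-rule probability of outcome 0 when measuring diag(gamma, 1 - gamma) at angle theta."""
    return gamma * math.cos(theta / 2) ** 2 + (1 - gamma) * math.sin(theta / 2) ** 2

def confidence_set(loglik, beta):
    """Candidates whose log-likelihood is within beta of the maximum."""
    top = max(loglik.values())
    return [g for g in GAMMAS if loglik[g] >= top - beta]

def optimistic_mle(true_gamma, episodes, beta=3.0, seed=0):
    rng = random.Random(seed)
    loglik = {g: 0.0 for g in GAMMAS}
    best_value = max(p_outcome0(true_gamma, t) for t in ANGLES)
    regret = 0.0
    for _ in range(episodes):
        # Step 1: maintain the set of plausible channels.
        plausible = confidence_set(loglik, beta)
        # Steps 2-3: pick the most optimistic (channel, action) pair and play its action.
        _, theta = max((p_outcome0(g, t), t) for g in plausible for t in ANGLES)
        outcome0 = rng.random() < p_outcome0(true_gamma, theta)
        regret += best_value - p_outcome0(true_gamma, theta)
        # Step 4: update every candidate's log-likelihood with the observed outcome.
        for g in GAMMAS:
            p = p_outcome0(g, theta)
            loglik[g] += math.log(p if outcome0 else 1 - p)
    return confidence_set(loglik, beta), regret

conf_set, regret = optimistic_mle(true_gamma=0.3, episodes=200)
```

Here `regret` is the expected (pseudo-)regret against the best fixed angle for the true channel. The maximum-likelihood candidate is always inside the returned set, so the set is never empty.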

Regret Analysis

  • Upper bound: Õ(√K) cumulative regret over K episodes
  • Lower bound: A matching lower bound follows by reduction to a multi-armed quantum bandit
  • Optimality: The √K scaling is information-theoretically optimal up to logarithmic factors
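
For reference, the quantities in these bounds can be written out as follows. The notation is an assumption introduced here, not taken from the source: V^⋆ denotes the optimal expected per-episode reward under the true channel, V^{π_k} the expected reward of the policy played in episode k, and C a constant that does not depend on K.

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \bigl( V^{\star} - V^{\pi_k} \bigr),
\qquad
\mathrm{Regret}(K) \;\le\; C \,\sqrt{K}\;\mathrm{polylog}(K)
\;=\; \tilde{O}\bigl(\sqrt{K}\bigr).
```

The tilde in Õ hides the polylogarithmic factor, which is why the matching lower bound establishes optimality only up to logarithmic terms.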

Continuous Action Space Extension

  • General POVMs (Positive Operator-Valued Measures) as actions
  • Discretizing the POVM space and applying optimism over the discretized set preserves the Õ(√K) regret bound
  • Handles continuous measurement protocols
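
One common way to see why discretization can preserve a √K rate is a covering argument: if the expected reward is Lipschitz in the measurement parameter, a grid with spacing of order 1/√K loses at most order 1/√K per episode, so at most order √K in total. The sketch below illustrates this on a one-parameter family of qubit measurements. It is an assumed construction for illustration; the source does not specify its discretization scheme, and `angle_grid`, `reward`, and `discretization_gap` are hypothetical names.

```python
import math

def angle_grid(num_episodes):
    """Uniform grid on [0, pi] with spacing at most pi / sqrt(K)."""
    n = max(2, math.ceil(math.sqrt(num_episodes)) + 1)
    return [math.pi * i / (n - 1) for i in range(n)]

def reward(gamma, theta):
    """Expected reward of measuring diag(gamma, 1 - gamma) at angle theta.

    Equals 1/2 + (gamma - 1/2) * cos(theta), so it is Lipschitz in theta
    with constant at most 1/2.
    """
    return 0.5 + (gamma - 0.5) * math.cos(theta)

def discretization_gap(gamma, grid, samples=1000):
    """Largest reward change caused by rounding an angle to its nearest grid point."""
    worst = 0.0
    for s in range(samples + 1):
        theta = math.pi * s / samples
        nearest = min(grid, key=lambda g: abs(g - theta))
        worst = max(worst, abs(reward(gamma, theta) - reward(gamma, nearest)))
    return worst

K = 100
grid = angle_grid(K)
gap = discretization_gap(0.9, grid)
```

With `K = 100` the grid spacing is π/10, every angle is within π/20 of a grid point, and the Lipschitz constant 1/2 bounds `gap` by π/40. Summed over K episodes, the rounding cost is at most (π/4)·√K in this toy model.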

Physical Application: State-Agnostic Work Extraction

Thermodynamic Interpretation

  • Learning regret = thermodynamic dissipation
  • Unknown quantum memory → energy extraction loss
  • Sublinear regret → asymptotically zero dissipation rate
  • Agent improves extraction protocol adaptively using past outcomes
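
The step from sublinear regret to a vanishing dissipation rate is a one-line calculation. The symbol W_diss(K) for the cumulative dissipated work over K episodes is notation assumed here; the identification with regret is the one stated in the list above.

```latex
\frac{W_{\mathrm{diss}}(K)}{K}
\;=\; \frac{\mathrm{Regret}(K)}{K}
\;=\; \tilde{O}\!\left(\frac{1}{\sqrt{K}}\right)
\;\xrightarrow{\;K \to \infty\;}\; 0 .
```

The total dissipation still grows with K, but the average dissipation per extraction round goes to zero.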

Key Insight

The mathematical regret in RL exactly quantifies cumulative thermodynamic dissipation when extracting free energy from non-i.i.d. quantum states with hidden correlations.

Usage Patterns

Pattern 1: Quantum Control with Memory

For systems where past states influence current dynamics:

  1. Model environment as hidden quantum channel + memory
  2. Use optimistic MLE for channel estimation
  3. Design instruments that balance exploration/exploitation
  4. Prove regret bounds via error propagation analysis

Pattern 2: Adaptive Quantum Thermodynamics

For work extraction from correlated quantum sources:

  1. Map thermodynamic task to RL framework
  2. Design extraction protocol as quantum instrument
  3. Use learning algorithm to minimize dissipation
  4. Verify asymptotic zero-dissipation guarantee

Pattern 3: Quantum System Identification

For learning unknown quantum dynamics:

  1. Sequential intervention with information-gathering instruments
  2. MLE-based channel estimation with confidence sets
  3. Active learning: choose interventions that maximize information gain
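
Step 3 can be made concrete with a small sketch that scores each intervention by its expected information gain, taken here to be the mutual information between the measurement outcome and the channel hypothesis under the current weights. The setup is an illustrative assumption: a finite grid of amplitude-damping strengths as hypotheses, three measurement angles as interventions, and hypothetical function names throughout.

```python
import math

GAMMAS = [i / 10 for i in range(1, 10)]   # hypothetical channel hypotheses
ANGLES = [0.0, math.pi / 2, math.pi]      # hypothetical interventions

def p_outcome0(gamma, theta):
    """Probability of outcome 0 when measuring diag(gamma, 1 - gamma) at angle theta."""
    return 0.5 + (gamma - 0.5) * math.cos(theta)

def binary_entropy(p):
    """Shannon entropy (bits) of a binary outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_information_gain(weights, theta):
    """Mutual information (bits) between the outcome and the channel hypothesis."""
    marginal = sum(w * p_outcome0(g, theta) for g, w in weights.items())
    conditional = sum(w * binary_entropy(p_outcome0(g, theta))
                      for g, w in weights.items())
    return binary_entropy(marginal) - conditional

# Uniform weights over hypotheses before any data is observed.
weights = {g: 1 / len(GAMMAS) for g in GAMMAS}
gains = {theta: expected_information_gain(weights, theta) for theta in ANGLES}
best_angle = max(ANGLES, key=lambda theta: gains[theta])
```

In this toy model the angle π/2 yields outcome probability 1/2 under every hypothesis, so its gain is zero and it is never chosen for identification, while the angles 0 and π are equally informative.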

Key Mathematical Tools

  • Quantum instruments: Generalized measurements with both classical and quantum outputs
  • Quantum channels: Completely positive trace-preserving (CPTP) maps
  • POVMs: General measurement operators for continuous actions
  • Optimistic MLE: Optimism under uncertainty over likelihood-based confidence sets, analogous to upper confidence bounds, for quantum models
  • Regret decomposition: Error propagation through quantum channel composition
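
A map given by Kraus operators is completely positive by construction, so checking that it is a valid channel reduces to checking trace preservation, i.e. the completeness relation Σ_k K_k†K_k = I. The same relation must hold for the Kraus operators of an instrument, whose POVM elements are E_a = M_a†M_a. The sketch below is a dependency-free illustration with hypothetical helper names; the amplitude-damping operators are only an example.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def is_trace_preserving(kraus_ops, tol=1e-9):
    """Check the completeness relation sum_k K_k^dagger K_k = I."""
    dim = len(kraus_ops[0][0])          # input dimension = number of columns
    total = [[0j] * dim for _ in range(dim)]
    for k in kraus_ops:
        kdk = matmul(dagger(k), k)
        total = [[total[i][j] + kdk[i][j] for j in range(dim)]
                 for i in range(dim)]
    return all(abs(total[i][j] - (1 if i == j else 0)) <= tol
               for i in range(dim) for j in range(dim))

gamma = 0.3
k0 = [[1, 0], [0, math.sqrt(1 - gamma)]]
k1 = [[0, math.sqrt(gamma)], [0, 0]]

valid = is_trace_preserving([k0, k1])   # full amplitude-damping channel
incomplete = is_trace_preserving([k0])  # one operator dropped
```

Dropping an operator leaves a trace-decreasing map, which corresponds to a single outcome branch of an instrument rather than a full channel.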

Activation Keywords

  • quantum reinforcement learning memory
  • quantum bandit with memory
  • quantum system identification RL
  • optimistic MLE quantum channels
  • quantum thermodynamic regret
  • work extraction quantum learning
  • 量子强化学习记忆
  • quantum instrument RL
  • POVM reinforcement learning
  • quantum channel learning regret
Install via CLI
npx skills add https://github.com/hiyenwong/ai_collection --skill quantum-memory-rl