scalable-on-hardware-qnn-training - SKILL.md Agent Skill

name: scalable-on-hardware-qnn-training
description: Scalable on-hardware QNN training methodology for clinical data. Butterfly circuit architecture with O(n log n) parameters, layer-wise training strategy, and parallelised parameter-shift rule reducing gradient estimation from O(n^2) to O(log n). Validated on MIMIC-III clinical data on IonQ hardware at 16-32 qubits. arXiv: 2606.03517
category: quantum-medical
authors: Natansh Mathur, Panagiotis Kl. Barkoutsos, Masako Yamada, Martin Roetteler, Iordanis Kerenidis
source: "arxiv:2606.03517"
created: "2026-06-10"
version: "1.0.0"
tags: ["quantum-neural-network", "clinical-data", "hardware-training", "butterfly-circuit", "gradient-estimation"]

Scalable On-Hardware QNN Training

Overview

Methodology for training Quantum Neural Networks (QNNs) directly on quantum hardware at scale, validated on clinical data imputation using the MIMIC-III electronic health record dataset.

Paper: Scalable On-Hardware Training of Quantum Neural Networks and Application to Clinical Data Imputation
arXiv: 2606.03517
Authors: Mathur, Barkoutsos, Yamada, Roetteler, Kerenidis

Core Problem

Training QNNs on quantum hardware is bottlenecked by the cost of gradient estimation. Standard parameter-shift estimation evaluates shifted circuits for each trainable parameter separately, costing O(n^2) circuit evaluations per optimization step, which makes hardware-based optimization impractical beyond small systems.
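
To make the cost concrete, here is a minimal sketch of the standard two-term parameter-shift rule for a single rotation angle. It assumes a one-qubit toy model (RY(theta) applied to |0>, then a Z measurement, so the expectation is cos(theta)); the function names are illustrative and do not come from the paper. Each parameter needs its own pair of shifted evaluations, so the number of circuit evaluations grows with the parameter count.

```python
import math

def expectation_z(theta: float) -> float:
    """<Z> after RY(theta) acts on |0>, computed from the statevector."""
    amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
    return amp0**2 - amp1**2  # equals cos(theta)

def parameter_shift_gradient(f, theta: float) -> float:
    """Two-term parameter-shift rule for a Pauli-generated rotation."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2

def sequential_shift_evaluations(n_params: int) -> int:
    """Circuit evaluations when every parameter is shifted separately."""
    return 2 * n_params

theta = 0.7
grad = parameter_shift_gradient(expectation_z, theta)
print(grad, -math.sin(theta))            # the two values agree
print(sequential_shift_evaluations(32))  # 64 evaluations for 32 parameters
```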

Solution Architecture

Three Co-Designed Components

Butterfly Circuit Architecture
- Structured, subspace-preserving circuit with O(n log n) parameters
- Logarithmic circuit depth
- Exploits commuting structure within layers
Layer-Wise Training Strategy
- Confines on-hardware optimization to one small, well-structured layer at a time
- Avoids global parameter optimization instability
- Enables scaling to larger qubit counts
Parallelised Parameter-Shift Rule
- Exploits commuting structure within each Butterfly layer
- Extracts all gradients of a layer in a constant number of circuit executions
- Reduces distinct circuit evaluations per step from O(n^2) to O(log n), a constant number for each of the logarithmically many layers

Clinical Application: MIMIC-III Data Imputation

Dataset

MIMIC-III: Electronic health record dataset
Task: Clinical data imputation for downstream patient survival prediction
The task is sensitive to optimization instability and model variance

Results

Trained directly on IonQ Forte Enterprise trapped-ion hardware at 16 qubits
No performance degradation relative to ideal or noisy simulation
Tensor-network simulation validated at 32 qubits
32-qubit inference executed on hardware
The trained models match or exceed strong classical neural baselines in patient survival prediction
Reduced variance across training runs

Implementation Guide

Step 1: Design Butterfly Circuit

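# Sketch of Butterfly layer construction (gate placement only, no SDK calls).
# Assumption: standard FFT-style butterfly pairing; the paper's exact gate
# set and ordering may differ.
def butterfly_layer(n_qubits, layer_idx=0):
    """Return the qubit pairs of one subspace-preserving Butterfly layer.

    Layer `layer_idx` pairs qubit i with qubit i + 2**layer_idx, giving
    n_qubits / 2 two-qubit gates on disjoint pairs, so they all commute.
    Stacking log2(n_qubits) layers gives O(n log n) parameters, log depth."""
    if n_qubits < 2 or n_qubits & (n_qubits - 1):
        raise ValueError("n_qubits must be a power of two")
    stride = 2 ** layer_idx
    if stride >= n_qubits:
        raise ValueError("layer_idx must be below log2(n_qubits)")
    # Each returned pair hosts one gate parameterized by a rotation angle
    return [(i, i + stride) for i in range(n_qubits) if (i // stride) % 2 == 0]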
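
The layer structure described in this step can be made concrete with a short sketch of the gate placement across all layers. It assumes the standard FFT-style butterfly pairing (qubit i paired with qubit i + 2^l in layer l) and one rotation angle per two-qubit gate; the paper's exact gate set and ordering may differ, and the name `butterfly_pairs` is hypothetical.

```python
def butterfly_pairs(n_qubits: int) -> list[list[tuple[int, int]]]:
    """Qubit pairs for every layer of an FFT-style butterfly pattern.

    Assumption: layer l pairs qubit i with qubit i + 2**l.
    """
    if n_qubits < 2 or n_qubits & (n_qubits - 1):
        raise ValueError("n_qubits must be a power of two")
    depth = n_qubits.bit_length() - 1  # log2(n_qubits) for a power of two
    layers = []
    for level in range(depth):
        stride = 2 ** level
        layers.append([(i, i + stride) for i in range(n_qubits)
                       if (i // stride) % 2 == 0])
    return layers

for n in (16, 32):
    layers = butterfly_pairs(n)
    # Within a layer every qubit is touched exactly once, so the gates act
    # on disjoint pairs and therefore commute with each other.
    assert all(sorted(q for pair in layer for q in pair) == list(range(n))
               for layer in layers)
    n_params = sum(len(layer) for layer in layers)  # one angle per gate
    print(n, len(layers), n_params)
```

Under these assumptions the parameter count is (n / 2) * log2(n), which is 32 angles in 4 layers for 16 qubits and 80 angles in 5 layers for 32 qubits.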

Step 2: Layer-Wise Training Loop

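# Pseudocode for layer-wise training (qnn methods and evaluate() are placeholders)
def train_layerwise(qnn, data, val_data, n_layers):
    """Train QNN one layer at a time on hardware.

    `data` is the training set; how it enters the loss is left to the
    placeholder parallel_parameter_shift(). Returns per-layer validation metrics."""
    history = []
    for layer_idx in range(n_layers):
        # Freeze all layers except current
        qnn.freeze_except(layer_idx)

        # Optimize current layer on hardware
        gradients = parallel_parameter_shift(qnn, layer_idx)
        qnn.update_layer(layer_idx, gradients)

        # Evaluate on validation set
        history.append(evaluate(qnn, val_data))
    return history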
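
As a self-contained illustration of the loop above, the sketch below trains a classical toy model one layer at a time. Everything in it is an assumption made for illustration: the toy output f(theta) = sum_j prod_l cos(theta[l][j]), the squared-error loss, the learning rate, and all function names. It is not the paper's QNN and it runs on a CPU rather than on hardware. Gradients of f come from the parameter-shift rule, and only the active layer is updated while the others stay frozen.

```python
import math

def model(theta):
    """Toy stand-in for a QNN output: sum over columns of cosine products."""
    width = len(theta[0])
    return sum(math.prod(math.cos(layer[j]) for layer in theta)
               for j in range(width))

def loss(theta, target):
    return (model(theta) - target) ** 2

def layer_gradient(theta, layer_idx, target):
    """Loss gradient w.r.t. one layer, via parameter shift on the model."""
    residual = model(theta) - target
    grads = []
    for j in range(len(theta[layer_idx])):
        original = theta[layer_idx][j]
        theta[layer_idx][j] = original + math.pi / 2
        plus = model(theta)
        theta[layer_idx][j] = original - math.pi / 2
        minus = model(theta)
        theta[layer_idx][j] = original  # restore the parameter
        grads.append(2 * residual * (plus - minus) / 2)  # chain rule
    return grads

def train_layerwise_toy(theta, target, steps_per_layer=50, lr=0.05):
    """Optimize one layer at a time; all other layers stay frozen."""
    history = [loss(theta, target)]
    for layer_idx in range(len(theta)):
        for _ in range(steps_per_layer):
            grads = layer_gradient(theta, layer_idx, target)
            theta[layer_idx] = [p - lr * g
                                for p, g in zip(theta[layer_idx], grads)]
        history.append(loss(theta, target))
    return history

theta = [[0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # 3 layers, 2 parameters each
history = train_layerwise_toy(theta, target=0.5)
print(history)  # loss before training and after each layer's phase
```

Each optimization phase touches only two parameters here, which mirrors the idea of confining optimization to one small layer at a time.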

Step 3: Parallel Parameter-Shift

# Pseudocode for parallel gradient extraction (helper functions are placeholders)
def parallel_parameter_shift(qnn, layer_idx):
    """Extract all gradients for a layer in a constant number of circuit executions.
    Exploits commuting structure of Butterfly layer gates."""
    # Group commuting gates
    commutation_groups = find_commuting_groups(qnn, layer_idx)
    
    # Each group can be measured in parallel
    all_gradients = []
    for group in commutation_groups:
        results = measure_in_parallel(qnn, group)
        gradients = compute_shift_gradients(results)
        all_gradients.extend(gradients)
    
    return all_gradients
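
A toy example shows why gates on disjoint qubits allow gradients to be read out in parallel. The assumptions are significant: the layer is a row of single-qubit RY rotations rather than a Butterfly layer, the loss is a weighted sum of single-qubit Z expectations, and the `run_circuit` helper is a classical stand-in for one hardware execution. Under these assumptions, shifting every angle at once changes each local expectation exactly as an individual shift would, so two executions yield the whole layer's gradient. This is an illustration of the idea only and does not reproduce the paper's construction.

```python
import math

executions = 0  # counts simulated circuit executions

def run_circuit(thetas):
    """Simulate RY(theta_i) on each qubit and return every <Z_i>."""
    global executions
    executions += 1
    return [math.cos(t / 2) ** 2 - math.sin(t / 2) ** 2 for t in thetas]

def parallel_shift_gradients(thetas, weights):
    """Gradient of sum_i w_i <Z_i> from two executions, all angles shifted."""
    shift = math.pi / 2
    plus = run_circuit([t + shift for t in thetas])
    minus = run_circuit([t - shift for t in thetas])
    return [w * (p - m) / 2 for w, p, m in zip(weights, plus, minus)]

def sequential_shift_gradients(thetas, weights):
    """Same gradient, shifting one angle at a time (2 executions per angle)."""
    shift = math.pi / 2
    grads = []
    for i, w in enumerate(weights):
        up = list(thetas)
        up[i] += shift
        down = list(thetas)
        down[i] -= shift
        grads.append(w * (run_circuit(up)[i] - run_circuit(down)[i]) / 2)
    return grads

thetas = [0.1 * (i + 1) for i in range(16)]
weights = [1.0] * 16

executions = 0
parallel = parallel_shift_gradients(thetas, weights)
parallel_cost = executions

executions = 0
sequential = sequential_shift_gradients(thetas, weights)
sequential_cost = executions

print(parallel_cost, sequential_cost)  # 2 and 32
```

Both routines return the same gradient; only the number of simulated executions differs.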

Scalability

Qubit Count	Training Method	Hardware
16	Direct hardware training	IonQ Forte Enterprise
32	Tensor-network simulation + hardware inference	IonQ Forte Enterprise

When to Use

Designing quantum neural networks for clinical/healthcare data
Training QNNs directly on NISQ hardware
Facing gradient estimation bottlenecks in parameter-shift methods
Building hybrid classical-quantum models for EHR/imputation tasks
Scaling QNN training beyond 10-15 qubits

Key Insights

Structured circuits matter: the Butterfly architecture provides both expressivity and trainability
Layer-wise > global: training layers sequentially avoids the instability of global parameter optimization
Commuting = parallel: commuting gate structure enables gradient extraction at constant cost per layer
Clinical data is demanding: MIMIC-III imputation is sensitive to optimization instability, which makes it a good benchmark

Activation Keywords

scalable qnn training, butterfly circuit, layer-wise training, clinical data imputation, MIMIC-III, parallel parameter-shift, hardware optimization, IonQ, trapped-ion, gradient estimation, hybrid quantum-classical, electronic health records, patient survival prediction, O(log n) gradient, subspace-preserving circuit