---
name: lsformer-local-structure-aware-spiking-transformer
description: >
  LSFormer: Local Structure-Aware Spiking Transformer. Replaces global self-attention with dilated local windows and spiking response pooling for energy-efficient SNNs.
  Keywords: spiking transformer, local attention, SNN, energy-efficient, LSFormer, spiking neural network, self-attention bottleneck, SPooling, LS-SSA, Tiny-ImageNet
---
LSFormer — Local Structure-Aware Spiking Transformer
Paper: Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention
Authors: Lingdong Li, Hangming Zhang, Qiang Yu
arXiv: 2605.13887 (cs.NE / cs.AI, 2026-05-12)
Problem
Transformer-based SNNs suffer from two limitations:
- Max pooling bottleneck: max pooling keeps only the strongest spike response in each region, so representative regional features across time steps are lost.
- Global self-attention bottleneck: global self-attention scales quadratically with the number of tokens, which conflicts with the sparse, energy-efficient nature of SNNs.
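A toy NumPy illustration of the max pooling bottleneck. The spike pattern and the 2x2 window are invented for the example, not taken from the paper. Per-step max pooling reports the same value for a region with one spike and a region with four, while a cumulative count over the region and all time steps keeps that difference.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2, applied independently at each time step.

    x: binary spike array of shape (T, H, W) with even H and W.
    """
    t, h, w = x.shape
    return x.reshape(t, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def spike_count_2x2(x):
    """Total spikes in each 2x2 region, accumulated over all T time steps."""
    t, h, w = x.shape
    return x.reshape(t, h // 2, 2, w // 2, 2).sum(axis=(0, 2, 4))

# T=2 steps over a 2x4 map, i.e. two 2x2 regions:
# the left region fires sparsely, the right region fires on every pixel.
spikes = np.array([
    [[1, 0, 1, 1],
     [0, 0, 1, 1]],
    [[0, 0, 1, 1],
     [0, 0, 1, 1]],
])

print(max_pool_2x2(spikes)[:, 0, :])  # per step: [[1 1] [0 1]]
print(spike_count_2x2(spikes)[0])     # accumulated: [1 8]
```

At t=0 both regions pool to 1 even though the right region holds four times as many spikes; only the accumulated count separates them.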
Methodology
SPooling (Spiking Response Pooling)
- Replaces max pooling with a spiking-aware aggregation mechanism.
- Captures cumulative regional spike responses across time, not just the peak.
- Preserves richer temporal-spatial feature representations for downstream attention.
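The exact SPooling operator is not specified here, so the following is only a hypothetical stand-in that follows the description above: the firing rate of each k x k region is integrated across time steps by an integrate-and-fire unit, and an output spike is emitted when the accumulated regional response crosses a threshold. The function name, `k`, `threshold`, and the soft-reset rule are all assumptions.

```python
import numpy as np

def spooling_sketch(spikes, k=2, threshold=0.5):
    """Hypothetical spiking response pooling (not the paper's exact operator).

    spikes: binary array of shape (T, C, H, W), with H and W divisible by k.
    The firing rate of each k x k region is integrated over time into a
    membrane potential; an output spike is emitted when the potential reaches
    `threshold`, and the threshold is then subtracted (soft reset).
    """
    t_steps, c, h, w = spikes.shape
    # Regional firing rate per step, shape (T, C, H/k, W/k)
    rate = spikes.reshape(t_steps, c, h // k, k, w // k, k).mean(axis=(3, 5))
    v = np.zeros(rate.shape[1:])
    out = np.zeros_like(rate)
    for t in range(t_steps):
        v = v + rate[t]            # accumulate the regional response over time
        fired = v >= threshold
        out[t] = fired
        v = v - threshold * fired  # soft reset keeps the residual charge
    return out

T = 4
spikes = np.zeros((T, 1, 2, 4))
spikes[:, 0, 0, 0] = 1   # left 2x2 region: 1 of 4 pixels fires each step (rate 0.25)
spikes[:, 0, :, 2:] = 1  # right 2x2 region: all 4 pixels fire each step (rate 1.0)

pooled = spooling_sketch(spikes)
print(pooled[:, 0, 0, :])  # rows: time steps; columns: [left, right]
```

Max pooling would output 1 for both regions at every step. This sketch instead fires the sparse region every other step and the dense region every step, so the output spike rate reflects the cumulative regional response rather than just the peak.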
LS-SSA (Local Structure-Aware Spiking Self-Attention)
- Introduces a dilated local window mechanism to limit the attention scope.
- Balances local detail capture with long-range dependency modeling.
- Reduces attention complexity from O(N²) in the number of tokens N to near-linear (on the order of N·w for windows of w tokens) while maintaining accuracy.
- Maintains spike-compatible operations throughout.
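A minimal sketch of attention restricted to a dilated local window, assuming a 1-D token sequence and borrowing softmax-free spike attention (binary Q, K, V produced by a threshold, scores = Q·Kᵀ scaled, output re-thresholded into spikes) as a stand-in. The radius, dilation, scale, and thresholds are illustrative assumptions, and the paper's LS-SSA formulation may differ.

```python
import numpy as np

def heaviside(x, threshold=1.0):
    """Threshold-and-fire: 1 where the input reaches the threshold, else 0."""
    return (x >= threshold).astype(np.float64)

def dilated_window_mask(n, radius, dilation):
    """(n, n) boolean mask: query i may attend to key i + dilation * j, |j| <= radius."""
    idx = np.arange(n)
    offset = idx[None, :] - idx[:, None]  # key index minus query index
    return (np.abs(offset) <= radius * dilation) & (offset % dilation == 0)

def local_spiking_attention(x, wq, wk, wv, radius=1, dilation=2, scale=0.25):
    """Softmax-free spiking attention restricted to a dilated local window.

    x: binary spike tokens of shape (N, D); wq, wk, wv: (D, D) weights.
    Returns output spikes (N, D) and the number of query-key pairs evaluated.
    """
    q, k, v = heaviside(x @ wq), heaviside(x @ wk), heaviside(x @ wv)
    mask = dilated_window_mask(x.shape[0], radius, dilation)
    # Dense matmul for readability; an efficient kernel would compute only the
    # masked pairs, which is where the O(N * window) cost comes from.
    scores = (q @ k.T) * mask * scale
    return heaviside(scores @ v), int(mask.sum())

rng = np.random.default_rng(0)
n, d = 16, 8
x = (rng.random((n, d)) < 0.3).astype(np.float64)
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

out, local_pairs = local_spiking_attention(x, wq, wk, wv)
print(out.shape, local_pairs, n * n)  # (16, 8) 44 256
```

With radius 1 and dilation 2, each token sees itself and the tokens two positions away on either side, so only 44 of the 256 global query-key pairs are evaluated for 16 tokens. Every stage stays binary except the integer-valued scores fed to the final threshold.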
Architecture
Input Spike Tensor → SPooling → LS-SSA Blocks → Classification Head
- LS-SSA blocks replace standard global self-attention in Transformer layers.
- Dilated windows expand receptive field without full quadratic attention.
- Compatible with standard SNN training pipelines (direct training or ANN-SNN conversion).
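To make the receptive-field claim concrete, here is a small 1-D calculation of how far stacked dilated windows reach and how many query-key pairs a block needs. The per-block dilation schedules, the radius of 1, and the 196-token example (a 14 x 14 patch grid) are illustrative assumptions, not settings from the paper.

```python
from itertools import product

def stacked_span(radius, dilations):
    """Extent (in tokens) of the 1-D receptive field after stacking dilated windows."""
    return 1 + sum(2 * radius * d for d in dilations)

def reachable_offsets(radius, dilations):
    """Relative token positions actually reachable through the stacked windows."""
    steps = [range(-radius * d, radius * d + 1, d) for d in dilations]
    return {sum(combo) for combo in product(*steps)}

def attention_pairs(n, radius):
    """Query-key pairs per block: global vs. 1-D local window (borders ignored)."""
    return n * n, n * (2 * radius + 1)

for dilations in ([1, 1, 1], [1, 2, 4], [2, 2, 2]):
    span = stacked_span(1, dilations)
    covered = len(reachable_offsets(1, dilations))
    print(dilations, "span:", span, "covered:", covered)

print(attention_pairs(196, 1))  # (38416, 588)
```

Three undilated blocks reach 7 positions, while dilations 1, 2, 4 reach 15 with every position in between covered, at the same per-block cost. A uniform dilation of 2 spans 13 positions but reaches only the 7 even offsets, which is one reason the dilation factor has to be chosen with care.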
Results
| Dataset | Improvement |
|---|---|
| Tiny-ImageNet | +4.3% top-1 accuracy (SOTA) |
| N-CALTECH101 | +8.6% accuracy (SOTA) |
Usage Guidance
Use this skill when:
- Designing or optimizing spiking transformer architectures for vision tasks.
- Seeking energy-efficient alternatives to global self-attention in SNNs.
- Implementing local attention mechanisms with dilated receptive fields.
- Replacing max pooling with temporal-spike-aware pooling in SNN pipelines.
- Benchmarking SNNs on image classification (Tiny-ImageNet, event-based datasets).
Key Implementation Notes
- SPooling aggregates spike responses across the temporal dimension before attention.
- The dilation factor of the LS-SSA windows controls the trade-off between locality and global context.
- Maintain spike-compatible activation (threshold-and-fire) within attention computations.
- Compatible with both rate-coded and temporal-coded SNN representations.
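A minimal sketch of the threshold-and-fire activation, assuming a discrete leaky integrate-and-fire (LIF) neuron with time constant `tau`, threshold 1.0 and hard reset, as is common in directly trained spiking transformers. The paper's neuron model and hyperparameters are not stated here, so these values are placeholders. The input could be, for example, per-step attention scores.

```python
import numpy as np

def lif_neuron(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire over the leading (time) axis.

    inputs: array of shape (T, ...) with the input current at each step.
    Returns a binary spike array of the same shape.
    """
    v = np.full(inputs.shape[1:], v_reset, dtype=np.float64)
    spikes = np.zeros(inputs.shape, dtype=np.float64)
    for t in range(inputs.shape[0]):
        v = v + (inputs[t] - (v - v_reset)) / tau  # leaky charge toward the input
        fired = v >= v_threshold                    # threshold-and-fire
        spikes[t] = fired
        v = np.where(fired, v_reset, v)             # hard reset after a spike
    return spikes

currents = np.array([[0.5, 1.5, 3.0]] * 4)  # T=4 steps, 3 neurons, constant drive
spikes = lif_neuron(currents)
print(spikes.T)             # one spike train per neuron
print(spikes.mean(axis=0))  # rate-code readout: spikes per time step
```

The same binary spike trains can be read as a rate code (spike count divided by T, here 0, 0.5 and 1.0) or by spike timing. This is why a threshold-and-fire activation fits both rate-coded and temporal-coded pipelines.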
References
- arXiv: 2605.13887
- Related: Spiking Neural Networks, Vision Transformers, Efficient Attention