lsformer-local-structure-aware-spiking-transformer - SKILL.md Agent Skill

name: lsformer-local-structure-aware-spiking-transformer
description: >
  LSFormer: Local Structure-Aware Spiking Transformer. Replaces global self-attention with dilated local windows and spiking response pooling for energy-efficient SNNs. Keywords: spiking transformer, local attention, SNN, energy-efficient, LSFormer, spiking neural network, self-attention bottleneck, SPooling, LS-SSA, Tiny-ImageNet

LSFormer — Local Structure-Aware Spiking Transformer

Paper: Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention
Authors: Lingdong Li, Hangming Zhang, Qiang Yu
arXiv: 2605.13887 (cs.NE / cs.AI, 2026-05-12)

Problem

Transformer-based SNNs suffer from two limitations:

Max pooling bottleneck: max pooling keeps only the strongest spike response in each region, so it fails to preserve representative regional features across time steps.
Global self-attention bottleneck: attending over all token pairs costs O(N²) in the number of tokens N, which conflicts with the sparse, energy-efficient nature of SNNs.

Methodology

SPooling (Spiking Response Pooling)

Replaces max pooling with a spiking-aware aggregation mechanism.
Captures cumulative regional spike responses across time, not just the peak.
Preserves richer temporal-spatial feature representations for downstream attention.
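
The points above do not spell out the SPooling formula, so the following is only a minimal NumPy sketch of one way to pool spikes by cumulative regional response. The function name `spike_response_pool`, the parameters `kernel` and `threshold`, and the integrate-and-fire readout with soft reset are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def spike_response_pool(spikes, kernel=2, threshold=2.0):
    """Hypothetical spiking response pooling (an assumption, not the paper's formula).

    spikes: binary array of shape (T, H, W), one spike map per time step.
    Returns a binary array of shape (T, H // kernel, W // kernel).
    """
    T, H, W = spikes.shape
    Hp, Wp = H // kernel, W // kernel
    # Crop so H and W divide evenly, then group pixels into kernel x kernel regions.
    x = spikes[:, :Hp * kernel, :Wp * kernel].astype(np.float32)
    regions = x.reshape(T, Hp, kernel, Wp, kernel)
    # Regional response: total spike count inside each region at each time step.
    drive = regions.sum(axis=(2, 4))
    # Accumulate the regional response over time and fire when it crosses the threshold.
    membrane = np.zeros((Hp, Wp), dtype=np.float32)
    out = np.zeros((T, Hp, Wp), dtype=np.uint8)
    for t in range(T):
        membrane += drive[t]
        fired = membrane >= threshold
        out[t] = fired
        membrane[fired] -= threshold  # soft reset keeps the surplus response
    return out

# A region with one spike per step fires only once enough response has accumulated.
sparse = np.zeros((2, 2, 2), dtype=np.uint8)
sparse[0, 0, 0] = 1
sparse[1, 1, 1] = 1
print(spike_response_pool(sparse)[:, 0, 0])
```

Max pooling would emit a spike at both steps for this input. The sketch instead emits one spike at the second step, so the output reflects how much activity the region accumulated rather than whether any single pixel fired.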

LS-SSA (Local Structure-Aware Spiking Self-Attention)

Introduces a local dilated window mechanism to limit attention scope.
Balances local detail capture with long-range dependency modeling.
Reduces attention complexity from O(N²) in the number of tokens N to near-linear in N, while maintaining accuracy.
Maintains spike-compatible operations throughout.
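
As a rough illustration of the mechanism described above, the sketch below builds a dilated local window over a 1-D token sequence and applies a softmax-free, threshold-and-fire attention step. It is a sketch under stated assumptions: the names `dilated_local_mask` and `local_spiking_attention`, the parameters `radius`, `dilation`, `scale`, and `threshold`, and the 1-D layout are hypothetical, and the paper's actual LS-SSA formulation may differ.

```python
import numpy as np

def dilated_local_mask(n_tokens, radius=1, dilation=2):
    """Boolean (N, N) mask: token i may attend to j = i + k * dilation with |k| <= radius."""
    idx = np.arange(n_tokens)
    offset = idx[None, :] - idx[:, None]
    return (offset % dilation == 0) & (np.abs(offset) <= radius * dilation)

def local_spiking_attention(q, k, v, mask, scale=1.0, threshold=1.0):
    """q, k, v: binary spike arrays of shape (N, D). Returns binary spikes of shape (N, D)."""
    q, k, v = (a.astype(np.int64) for a in (q, k, v))
    # Spike-overlap counts between queries and keys, zeroed outside the local window.
    scores = (q @ k.T) * mask
    # No softmax: the scores are non-negative counts, so they are scaled and used directly.
    drive = scale * (scores @ v)
    # Threshold-and-fire keeps the output binary.
    return (drive >= threshold).astype(np.uint8)

mask = dilated_local_mask(6, radius=1, dilation=2)
print(mask[3].nonzero()[0])  # token 3 attends to tokens 1, 3 and 5
```

The dense (N, N) mask is used here only for readability and is itself quadratic in N. An efficient implementation would gather each token's few neighbours instead of masking a full score matrix.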

Architecture

Input Spike Tensor → SPooling → LS-SSA Blocks → Classification Head

LS-SSA blocks replace standard global self-attention in Transformer layers.
Dilated windows expand receptive field without full quadratic attention.
Compatible with standard SNN training pipelines (direct training or ANN-SNN conversion).
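
To make the cost difference concrete, the short calculation below counts query-key pairs for global attention versus a fixed-size local window. The helper `attention_pairs` and the window size of 9 (for example a 3×3 dilated window) are assumptions chosen for illustration, and boundary effects are ignored.

```python
def attention_pairs(n_tokens, window=None):
    """Query-key pairs scored: N * N for global attention, N * window for a local window."""
    if window is None:
        return n_tokens * n_tokens
    return n_tokens * min(window, n_tokens)

for n in (64, 256, 1024):
    print(n, attention_pairs(n), attention_pairs(n, window=9))
```

With these assumed sizes, 1024 tokens need 1,048,576 pairs globally but only 9,216 with the local window, and the local count grows linearly with the number of tokens.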

Results

Dataset	Improvement
Tiny-ImageNet	+4.3% top-1 accuracy (SOTA)
N-CALTECH101	+8.6% accuracy (SOTA)

Usage Guidance

Use this skill when:

Designing or optimizing spiking transformer architectures for vision tasks.
Seeking energy-efficient alternatives to global self-attention in SNNs.
Implementing local attention mechanisms with dilated receptive fields.
Replacing max pooling with temporal-spike-aware pooling in SNN pipelines.
Benchmarking SNNs on image classification (Tiny-ImageNet, event-based datasets such as N-CALTECH101).

Key Implementation Notes

SPooling aggregates spike responses across the temporal dimension before attention.
LS-SSA window dilation factor controls the trade-off between locality and global context.
Maintain spike-compatible activation (threshold-and-fire) within attention computations.
Compatible with both rate-coded and temporal-coded SNN representations.
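
The threshold-and-fire activation mentioned above can be sketched as a leaky integrate-and-fire (LIF) neuron. This is a generic LIF with hard reset, offered as an assumption about what a spike-compatible activation could look like. The function `lif_forward` and the parameters `tau`, `v_threshold`, and `v_reset` are illustrative names, not taken from the paper.

```python
import numpy as np

def lif_forward(currents, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """currents: float array of shape (T, ...). Returns binary spikes of the same shape."""
    v = np.full(currents.shape[1:], v_reset, dtype=np.float64)
    spikes = np.zeros(currents.shape, dtype=np.uint8)
    for t in range(currents.shape[0]):
        # Leaky integration: the membrane potential moves toward the input current.
        v = v + (currents[t] - (v - v_reset)) / tau
        fired = v >= v_threshold
        spikes[t] = fired
        # Hard reset for neurons that fired on this step.
        v = np.where(fired, v_reset, v)
    return spikes

# A constant input of 1.5 makes the neuron fire on every second step.
print(lif_forward(np.full((4, 1), 1.5))[:, 0])
```

Applying a step like this after the attention drive, rather than a softmax or ReLU, keeps every tensor passed between layers binary.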

References

arXiv: 2605.13887
Related: Spiking Neural Networks, Vision Transformers, Efficient Attention