name: eeg-staged-representation-learning
description: "Neuroscience-inspired staged representation learning framework for EEG visual decoding. Organizes EEG representation learning into three complementary phases: low-level visual, high-level semantic, and integrative fusion, with disentangled coarse/fine-grained semantics."
source: arXiv 2605.16923
tags:
  - eeg
  - visual-decoding
  - representation-learning
  - brain-computer-interface
  - staged-learning
  - neuro-inspired
authors: Xiang Gao, Hui Tian, Yanming Zhu, Xuefei Yin, Alan Wee-Chung Liew
published: 2026-05-21
Neuroscience-inspired Staged Representation Learning for EEG Visual Decoding
Overview
Proposes a neuroscience-inspired staged representation learning framework that reformulates EEG visual decoding as a stage-specific representation decomposition problem. Instead of learning a single global EEG embedding for cross-modal alignment, this framework explicitly models the staged and hierarchical characteristics of human visual processing.
Core Innovation
- Decomposes EEG visual decoding into three complementary representation stages: low-level visual, high-level semantic, and integrative fusion
- Introduces multimodal dual-level semantic learning: separates coarse label-level semantics from fine image-level visual-semantic information
- Adds semantic latent channels: computational representation channels generated from observed visual EEG signals, which expand the channel-level semantic representation space
Framework Architecture
Stage 1: Low-level Visual Representation Learning
- Extracts basic visual features from EEG signals (edges, textures, shapes)
- Corresponds to early visual cortex (V1-V2) processing
- Uses convolutional encoders to capture spatiotemporal patterns
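As a rough illustration of the Stage 1 bullets above, the NumPy sketch below applies a temporal convolution to each electrode and then mixes across electrodes, a common pattern in convolutional EEG encoders. The function name, the ELU nonlinearity, the average pooling, and all shapes are assumptions made for illustration; the source does not specify the encoder's layers.

```python
import numpy as np

def spatiotemporal_encode(eeg, temporal_kernels, spatial_weights):
    """Hypothetical Stage 1 encoder sketch.

    eeg:              (C, T)  channels x time samples
    temporal_kernels: (F, K)  one temporal filter of length K per feature map
    spatial_weights:  (F, C)  one spatial mixing vector per feature map
    returns:          (F,)    one pooled feature per feature map
    """
    kernel_len = temporal_kernels.shape[1]
    # Sliding windows along time: (C, T - K + 1, K)
    windows = np.lib.stride_tricks.sliding_window_view(eeg, kernel_len, axis=1)
    # Temporal convolution (cross-correlation) per channel: (F, C, T - K + 1)
    temporal = np.einsum("ctk,fk->fct", windows, temporal_kernels)
    # Spatial mixing across channels: (F, T - K + 1)
    spatial = np.einsum("fct,fc->ft", temporal, spatial_weights)
    # ELU nonlinearity (an assumed choice)
    activated = np.where(spatial > 0, spatial, np.expm1(np.minimum(spatial, 0.0)))
    # Average pooling over time
    return activated.mean(axis=1)
```

In a trained model the kernels and spatial weights would be learned parameters; here they are plain arrays so the data flow stays visible.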
Stage 2: High-level Semantic Representation Learning
- Extracts abstract semantic concepts from EEG
- Corresponds to higher-order processing in inferotemporal (IT) and prefrontal (PFC) cortex
- Leverages dual-level semantic learning:
  - Coarse label-level: Category-level semantics (e.g., "face", "animal")
  - Fine image-level: Instance-specific visual-semantic information
Stage 3: Integrative Information Fusion
- Fuses low-level visual and high-level semantic representations
- Produces unified EEG embedding for cross-modal alignment
- Uses cross-attention mechanisms for integration
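The fusion step described above can be sketched with single-head scaled dot-product cross-attention. The sketch assumes that high-level semantic tokens act as queries over low-level visual tokens, followed by a residual connection, mean pooling, and L2 normalization; the attention direction, the pooling, and every name here are illustrative assumptions, not details confirmed by the source.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_attention_fuse(semantic_tokens, visual_tokens, w_q, w_k, w_v):
    """Hypothetical Stage 3 fusion sketch.

    semantic_tokens: (Ns, d) high-level tokens, used as queries
    visual_tokens:   (Nv, d) low-level tokens, used as keys and values
    w_q, w_k, w_v:   (d, d)  projection matrices
    returns: unit-norm fused embedding (d,) and attention weights (Ns, Nv)
    """
    q = semantic_tokens @ w_q
    k = visual_tokens @ w_k
    v = visual_tokens @ w_v
    # Each semantic token attends over all visual tokens
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection keeps the semantic content alongside attended visual detail
    fused = semantic_tokens + attn @ v
    pooled = fused.mean(axis=0)
    return pooled / np.linalg.norm(pooled), attn
```

The unit-norm output is convenient when the unified embedding is later compared with image embeddings by cosine similarity.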
Semantic Latent Channels
- Generated from observed visual EEG signals
- Expand the channel-level semantic representation space
- Enable structured semantic abstraction and cross-modal alignment
- Differ from standard EEG channels: they are learned computational channels rather than recorded electrode signals
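One simple way to picture semantic latent channels is as extra rows appended to the EEG array, each computed from the observed channels. The sketch below assumes a learned linear mixing followed by `tanh`; the source does not state how the channels are generated, so the mechanism, the function name, and the shapes are all hypothetical.

```python
import numpy as np

def add_semantic_latent_channels(eeg, mixing, bias):
    """Hypothetical sketch: append M latent channels to C observed channels.

    eeg:    (C, T) observed EEG, channels x time samples
    mixing: (M, C) mixing weights (learned in a real model; assumed here)
    bias:   (M,)   per-latent-channel bias
    returns (C + M, T): observed channels followed by latent channels
    """
    # Each latent channel is a nonlinear combination of all observed channels
    latent = np.tanh(mixing @ eeg + bias[:, None])
    # Observed channels are kept unchanged; latent ones expand the channel axis
    return np.concatenate([eeg, latent], axis=0)
```

Keeping the observed channels intact means downstream layers see the original signal plus the expanded channel space.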
Technical Details
Multimodal Dual-Level Semantic Learning
- Coarse semantics: Aligns EEG embedding with class-level semantic labels (e.g., WordNet categories)
- Fine semantics: Aligns EEG embedding with image-level visual features from vision models (e.g., CLIP)
- Both levels are trained simultaneously with contrastive objectives
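A minimal sketch of a joint objective along these lines, assuming separate coarse and fine EEG heads, a symmetric InfoNCE loss for the fine level, and a softmax cross-entropy over class-label embeddings for the coarse level. The temperature of 0.07, the `coarse_weight` parameter, and the function names are illustrative assumptions; the paper's exact losses may differ.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def fine_contrastive_loss(eeg_emb, image_emb, temperature):
    """Symmetric InfoNCE: the i-th EEG trial matches the i-th image embedding."""
    logits = l2_normalize(eeg_emb) @ l2_normalize(image_emb).T / temperature
    idx = np.arange(len(eeg_emb))
    eeg_to_image = -log_softmax(logits)[idx, idx].mean()
    image_to_eeg = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (eeg_to_image + image_to_eeg)

def coarse_contrastive_loss(eeg_emb, label_emb, labels, temperature):
    """Each EEG trial is pulled toward the embedding of its class label."""
    logits = l2_normalize(eeg_emb) @ l2_normalize(label_emb).T / temperature
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def dual_level_loss(eeg_coarse, eeg_fine, label_emb, labels, image_emb,
                    temperature=0.07, coarse_weight=1.0):
    """Sum of fine (image-level) and weighted coarse (label-level) terms."""
    fine = fine_contrastive_loss(eeg_fine, image_emb, temperature)
    coarse = coarse_contrastive_loss(eeg_coarse, label_emb, labels, temperature)
    return fine + coarse_weight * coarse
```

Here `image_emb` stands in for features from a pretrained vision model such as CLIP, and `label_emb` for embeddings of the class names; both would be precomputed and held fixed in this sketch.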
Benchmark Performance (THINGS-EEG)
- Subject-dependent zero-shot: Outperforms the compared baselines
- Subject-independent zero-shot: Improves exact retrieval
- Ablations support the staged decomposition approach
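Zero-shot retrieval results of this kind are typically scored by ranking candidate images by similarity to each EEG embedding. The sketch below computes top-k retrieval accuracy with cosine similarity, assuming the i-th EEG embedding's ground-truth match is the i-th candidate image; the function name and this pairing convention are illustrative assumptions rather than the benchmark's official evaluation code.

```python
import numpy as np

def topk_retrieval_accuracy(eeg_emb, image_emb, k):
    """Fraction of EEG trials whose true image is among the k most similar candidates.

    eeg_emb:   (N, d) EEG embeddings
    image_emb: (N, d) candidate image embeddings; row i is the true match for trial i
    """
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = eeg @ img.T                      # (N, N) cosine similarities
    idx = np.arange(len(eeg))
    true_sim = sims[idx, idx]
    # Rank = number of candidates scoring strictly higher than the true match
    rank = (sims > true_sim[:, None]).sum(axis=1)
    return float((rank < k).mean())
```

With `k=1` this corresponds to exact retrieval; larger `k` gives the usual top-k variants.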
Analysis
- Layer-wise retrieval: Deeper stages capture more semantic information
- Temporal accumulation: Later temporal windows contribute more to semantic decoding
- Expanded multi-image retrieval: Framework scales with additional images
Key Insights
- Hierarchical processing matters: Explicitly modeling staged perception → semantic → integrative representations outperforms monolithic embedding approaches
- Disentanglement helps: Separating coarse and fine semantics improves both classification and retrieval
- Neuro-inspired design: The staged framework mirrors the ventral visual stream's hierarchical organization (V1 → V2 → V4 → IT)
Applications
- EEG-based visual decoding: Zero-shot classification and retrieval
- BCI communication: More accurate visual prosthetics
- Cognitive neuroscience: Probing hierarchical visual processing through EEG
- Medical rehabilitation: Visual assessment for locked-in patients
Related Skills
- eeg-visual-attention-decoding
- eeg-structure-guided-diffusion
- meta-learning-in-context-brain-decoding
- eeg2vision-multimodal-eeg-framework-2d-visual
Activation
staged eeg representation, EEG visual decoding, coarse-to-fine semantics, semantic latent channels, neuro-inspired EEG, staged representation learning, THINGS-EEG benchmark, EEG cross-modal alignment