name: mirage-fmri-mental-imagery
description: "MIRAGE methodology: robust multi-modal architecture for translating fMRI-to-image models from vision decoding to mental image reconstruction. Uses linear backbone + multi-modal text/image features with diffusion model; achieves SOTA on NSD-Imagery benchmark. Activation: fMRI mental imagery, MIRAGE, brain decoding, image reconstruction, cross-decoding, NSD-Imagery"
arxiv_id: "2605.17198"
published: "2026-05-16"
authors: "Reese Kneeland, Cesar Kadir Torrico Villanueva, Jordyn Ojeda, Shuhb Khanna, Jonathan Xu, Paul S. Scotti, Thomas Naselaris"
tags: [fmri-decoding, mental-imagery, image-reconstruction, multi-modal, diffusion-model, brain-computer-interface]
MIRAGE: Robust Multi-Modal Architectures Translate fMRI-to-Image Models from Vision to Mental Imagery
MIRAGE (Multi-modal Imagery Reconstruction via Augmented Generative Encoding) is a method explicitly designed to be trained on vision datasets (brain responses to external stimuli) and then to cross-decode mental images from brain activity. A linear backbone maps fMRI activity to multi-modal text and image features, which are then used to condition a diffusion model.
Source: arXiv:2605.17198
Core Innovation
While modern vision decoders can reconstruct seen images from human brain activity, they fail to generalize to internally generated visual representations (mental images). MIRAGE is the first method explicitly designed to bridge this gap.
Key Technical Framework
- Linear Backbone: Maps fMRI brain activity to a shared embedding space without overfitting to specific visual stimuli
- Multi-Modal Feature Fusion: Combines text-based (semantic) and image-based (high- and low-level visual) features
- Diffusion Model: Generates reconstructed images conditioned on the fused feature representation
- Classifier-Free Guidance: Balances content preservation vs. reconstruction fidelity
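The components above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the linear backbone can be approximated by closed-form ridge regression, that fusion is a simple concatenation of predicted text and image features, and that guidance follows the standard classifier-free guidance formula. All names (`fit_ridge`, `cfg_combine`), dimensions, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_voxels = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, X.T @ Y)


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance mix of two noise predictions."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Synthetic stand-ins for fMRI responses and feature targets (assumed sizes).
rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 50
text_dim, image_dim = 16, 8

X = rng.standard_normal((n_trials, n_voxels))  # voxel responses per trial
W_text_true = rng.standard_normal((n_voxels, text_dim))
W_image_true = rng.standard_normal((n_voxels, image_dim))
Y_text = X @ W_text_true + 0.1 * rng.standard_normal((n_trials, text_dim))
Y_image = X @ W_image_true + 0.1 * rng.standard_normal((n_trials, image_dim))

# One linear map per feature modality, fit on "vision" trials.
W_text = fit_ridge(X, Y_text)
W_image = fit_ridge(X, Y_image)

# Decode a new trial and fuse the predicted features by concatenation
# (an illustrative choice; the source does not specify the fusion step).
x_new = rng.standard_normal((1, n_voxels))
conditioning = np.concatenate([x_new @ W_text, x_new @ W_image], axis=1)

# Guidance step on placeholder noise predictions from a diffusion model.
eps_uncond = rng.standard_normal((4, 4))
eps_cond = rng.standard_normal((4, 4))
eps_guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=3.0)
```

In this sketch, a guidance scale of 1 reduces to the conditional prediction and a scale of 0 to the unconditional one; larger values push samples more strongly toward the decoded conditioning.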
Critical Findings
- SOTA on mental imagery: MIRAGE achieves state-of-the-art performance on the NSD-Imagery benchmark
- Vision decoder gap: State-of-the-art performance on seen-image reconstruction does not imply state-of-the-art performance on mental image reconstruction
- Feature dimensionality matters: Mental image reconstruction works best with relatively low-dimensional image features
- Multi-modal guidance essential: Both text-based and image-based features (high + low level) are required
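To make the dimensionality finding concrete, the sketch below reduces a set of image features to a few dimensions before they would be used as decoding targets. The source does not state how its low-dimensional image features are obtained; PCA via SVD is used here purely as an assumed stand-in, and the feature matrix, sizes, and function names (`fit_pca`, `project`) are hypothetical.

```python
import numpy as np


def fit_pca(features, n_components):
    """Return the feature mean and the top principal axes (rows)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, components):
    """Project features onto the retained principal axes."""
    return (features - mean) @ components.T


rng = np.random.default_rng(0)
image_features = rng.standard_normal((100, 64))  # stand-in image embeddings

mean, components = fit_pca(image_features, n_components=8)
low_dim_features = project(image_features, mean, components)
```

A linear decoder would then be fit to `low_dim_features` rather than to the full 64-dimensional targets, leaving fewer output dimensions to estimate from a limited number of trials.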
Applications
- Clinical brain-computer interfaces: Decoding imagined content from locked-in patients
- Neuroscience research: Probing the nature of mental imagery representations
- Creativity and communication: New paradigms for expressing imagined content
Related Skills
- visual-imagery-decoding-fmri
- brain-dit-universal-multi-state
- eeg-diffusion-visual-reconstruction