Reconstructing mental images from brain scans with vision models

Reese Kneeland, Cesar Kadir Torrico Villanueva, Jordyn Ojeda, Shuhb Khanna, Jonathan Xu, Paul S. Scotti, Thomas Naselaris

Brain-to-image decoding models trained to reconstruct viewed images often fail when applied to mental imagery, that is, internally generated visual representations. MIRAGE is a decoder designed explicitly for this cross-decoding setting, in which a model trained on seen images is used to reconstruct mental images. It combines a linear backbone with multi-modal text and image features that are fed into a diffusion model. On the NSD-Imagery dataset, MIRAGE outperforms existing vision decoders on mental image reconstruction, including decoders that perform strongly on seen images. Ablations indicate that the best results come from lower-dimensional image features combined with guidance from both text and multi-level visual features. The finding that mental image decoders can be trained effectively on large-scale datasets of externally presented stimuli suggests practical utility for brain imaging applications.
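To make the idea of a linear backbone concrete, the sketch below fits two linear maps from voxel responses to a text-feature space and an image-feature space, then applies them to held-out "imagery" trials. It is illustrative only and is not the MIRAGE implementation: the use of ridge regression, the feature dimensions, the regularization strength, and the synthetic data are all assumptions, and the diffusion model that would consume the predicted features is omitted.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def predict(X, W):
    """Apply the learned linear map to new voxel patterns."""
    return X @ W


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
n_train, n_imagery, n_voxels = 200, 10, 50
text_dim, image_dim = 8, 16

# Synthetic "vision" training data: voxel patterns and their target features.
X_vision = rng.standard_normal((n_train, n_voxels))
W_true_text = rng.standard_normal((n_voxels, text_dim))
W_true_image = rng.standard_normal((n_voxels, image_dim))
Y_text = X_vision @ W_true_text + 0.1 * rng.standard_normal((n_train, text_dim))
Y_image = X_vision @ W_true_image + 0.1 * rng.standard_normal((n_train, image_dim))

# One linear map per feature space, both trained on vision trials only.
W_text = fit_ridge(X_vision, Y_text, alpha=1.0)
W_image = fit_ridge(X_vision, Y_image, alpha=1.0)

# Synthetic "imagery" trials: a weaker, noisier version of the vision signal
# (an assumption made for this toy example).
X_imagery = 0.5 * X_vision[:n_imagery] + 0.5 * rng.standard_normal((n_imagery, n_voxels))

# Cross-decoding: apply the vision-trained maps to imagery trials.
text_pred = predict(X_imagery, W_text)
image_pred = predict(X_imagery, W_image)

print(text_pred.shape, image_pred.shape)
```

In a full pipeline of this kind, the predicted text and image features would serve as conditioning inputs to a generative model. The target dimensionality of each linear map is a free design choice in this sketch.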