cs.CV

Extracting hidden details to fix image reconstruction in autoencoders

Tianhang Wang, Yitong Chen, Wei Song, Zuxuan Wu, Min Li, Jiaqi Wang

May 21, 2026

Representation autoencoders built on frozen vision models generate sharp images but reconstruct them poorly, because keeping the encoder frozen limits how much spatial detail its features retain. DecQ addresses this with lightweight queries that extract fine-grained information from the encoder's intermediate layers and feed it into the decoder. Reconstruction quality rises from 19.13 to 22.76 dB PSNR, generative training converges 3.3× faster, and the model reaches an FID of 1.41, all for only 3.9% extra computation and with no fine-tuning required.
Published as "DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders", arXiv:2605.22777.
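The summary does not spell out the exact mechanism, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: a small bank of learnable queries (a hypothetical `num_queries` per layer) cross-attends to the patch tokens of each tapped intermediate layer of the frozen encoder, and the resulting condensed tokens are concatenated with the final-layer tokens to form the decoder input. The function names, single-head attention, random initialization, and concatenation-based fusion are all illustrative choices, not the paper's confirmed design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_query_params(num_layers, num_queries, dim, seed=0):
    # Hypothetical trainable parameters: one query bank and one set of
    # projections per tapped intermediate layer. The encoder itself is untouched.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return [
        {
            "queries": rng.normal(size=(num_queries, dim)),
            "w_q": rng.normal(scale=scale, size=(dim, dim)),
            "w_k": rng.normal(scale=scale, size=(dim, dim)),
            "w_v": rng.normal(scale=scale, size=(dim, dim)),
        }
        for _ in range(num_layers)
    ]


def cross_attend(queries, feats, w_q, w_k, w_v):
    # Single-head cross-attention: K queries read from N patch tokens.
    q = queries @ w_q                        # (K, D)
    k = feats @ w_k                          # (N, D)
    v = feats @ w_v                          # (N, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (K, N)
    return softmax(scores, axis=-1) @ v      # (K, D)


def condense_details(layer_feats, final_feats, params):
    # Condense each intermediate layer into K tokens and append them to the
    # frozen encoder's final-layer tokens to form the decoder input.
    extras = [
        cross_attend(p["queries"], f, p["w_q"], p["w_k"], p["w_v"])
        for f, p in zip(layer_feats, params)
    ]
    return np.concatenate([final_feats, *extras], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_tokens, dim, n_layers, n_queries = 256, 64, 3, 16
    # Random stand-ins for frozen-encoder activations (shape checking only).
    layer_feats = [rng.normal(size=(n_tokens, dim)) for _ in range(n_layers)]
    final_feats = rng.normal(size=(n_tokens, dim))
    params = init_query_params(n_layers, n_queries, dim)
    tokens = condense_details(layer_feats, final_feats, params)
    print(tokens.shape)  # (304, 64): 256 original tokens + 3 layers x 16 queries
```

In this sketch only the query banks and projections would be trained, which matches the summary's point that the vision model stays frozen. Because the number of queries K is much smaller than the number of patch tokens N, the added attention cost grows as K×N per tapped layer, which is one plausible way such a module can stay lightweight.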