Extracting hidden details to fix image reconstruction in autoencoders

Representation autoencoders built on frozen vision models generate sharp images but reconstruct poorly, because the frozen encoder's output retains little fine spatial detail. DecQ addresses this with lightweight queries that extract fine-grained information from the encoder's intermediate layers and feed it into the decoder. As a result, reconstruction quality rises from 19.13 to 22.76 dB PSNR, generative training converges 3.3× faster, and the model reaches an FID of 1.41, all with only 3.9% extra computation and without fine-tuning.
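To put the PSNR figures in perspective, the short sketch below converts the reported gain into a ratio of mean squared errors, using the standard definition PSNR = 10 * log10(MAX^2 / MSE). Only the two PSNR values come from the text above. The helper names and the assumption that pixel values are scaled to [0, 1] are illustrative, and the ratio itself does not depend on the choice of MAX.

```python
import math


def psnr_db(mse: float, max_val: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_from_psnr(psnr: float, max_val: float = 1.0) -> float:
    """Invert the PSNR definition to recover the mean squared error."""
    return max_val ** 2 / (10.0 ** (psnr / 10.0))


# PSNR values reported in the summary above.
baseline_psnr = 19.13
decq_psnr = 22.76

# Gain in dB, and the factor by which MSE shrinks for that gain.
gain_db = decq_psnr - baseline_psnr
mse_ratio = mse_from_psnr(baseline_psnr) / mse_from_psnr(decq_psnr)

print(f"PSNR gain: {gain_db:.2f} dB")
print(f"MSE reduced by a factor of about {mse_ratio:.2f}")
```

Because PSNR is logarithmic, a gain of about 3.6 dB means the reconstruction error is more than halved, not merely improved by a few percent.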