Separating appearance from geometry in AI view synthesis

Yihang Wu, Yihang Sun, Shaofeng Zhang, Zuxuan Wu, Junchi Yan, Xiaosong Jia, Yu-gang Jiang

Current feedforward view synthesis models blend semantic (color, texture) and spatial (ray position) information into a single representation, so spatial structure interferes with appearance quality. This paper splits the two kinds of information into separate branches that still communicate through shared attention, and adds optional branch-specific supervision and bidirectional modulation to improve the interaction between the branches. The approach works with both decoder-only and encoder-decoder architectures and adds virtually no inference cost.
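To make the two-branch idea concrete, here is a minimal NumPy sketch of one possible reading of it. It is not the authors' implementation. The summary does not say how attention is shared or how modulation is computed, so two assumptions are made here: the attention map is built from queries and keys that see both branches while values stay branch-specific, and modulation is a simple scale predicted by the opposite branch. All names (`app`, `geo`, `init_params`, `shared_attention_block`, `bidirectional_modulation`, and the `w_*` weights) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, d):
    # Hypothetical weights; the names and shapes are illustrative only.
    def w(rows, cols):
        return rng.normal(scale=rows ** -0.5, size=(rows, cols))

    return {
        "w_q": w(2 * d, d), "w_k": w(2 * d, d),    # see both branches
        "w_v_app": w(d, d), "w_v_geo": w(d, d),    # branch-specific values
        "w_g2a": w(d, d), "w_a2g": w(d, d),        # cross-branch modulation
    }


def shared_attention_block(app, geo, params):
    """app, geo: (n_tokens, d) arrays for the appearance and geometry branches."""
    d = app.shape[-1]
    # Assumption: queries and keys are computed from the concatenated
    # branches, so a single attention map is shared by both.
    joint = np.concatenate([app, geo], axis=-1)       # (n, 2d)
    q = joint @ params["w_q"]                         # (n, d)
    k = joint @ params["w_k"]                         # (n, d)
    attn = softmax(q @ k.T / np.sqrt(d))              # (n, n)
    # Each branch keeps its own values and residual stream.
    app_out = app + attn @ (app @ params["w_v_app"])
    geo_out = geo + attn @ (geo @ params["w_v_geo"])
    return app_out, geo_out, attn


def bidirectional_modulation(app, geo, params):
    # Assumption: each branch predicts a bounded per-feature scale
    # that is applied to the other branch.
    scale_for_app = np.tanh(geo @ params["w_g2a"])
    scale_for_geo = np.tanh(app @ params["w_a2g"])
    return app * (1.0 + scale_for_app), geo * (1.0 + scale_for_geo)


rng = np.random.default_rng(0)
params = init_params(rng, d=8)
app = rng.normal(size=(5, 8))   # 5 tokens of appearance features
geo = rng.normal(size=(5, 8))   # 5 tokens of ray/geometry features
app, geo, attn = shared_attention_block(app, geo, params)
app, geo = bidirectional_modulation(app, geo, params)
```

In this sketch the two streams never merge into one tensor: they interact only through the shared attention map and the cross-branch scales, which is the separation the summary describes. Branch-specific supervision would correspond to attaching a separate loss to each of `app` and `geo`, which is omitted here.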