How to turn one blurry 3D model into crystal clarity with multiple angles

Hanxiao Sun, Mingxin Yang, Shuhui Yang, Zebin He, Xintong Han, Hongbo Fu, Chunchao Guo, Wenhan Luo

A single image produces an ambiguous 3D model because half the object is invisible. ROAR-3D fixes this by upgrading pretrained single-view 3D models to accept an arbitrary number of multi-view images without explicit camera poses. A token router assigns each 3D location to its most relevant view, and dual-stream attention preserves the original model's behavior while funneling geometric detail from the auxiliary views. The method adds minimal training overhead and scales from 1 to 12+ views at test time.
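
To make the routing idea concrete, here is a minimal sketch of assigning each 3D token to a single view. It is an illustration under stated assumptions, not the ROAR-3D implementation: the summary does not say how relevance is scored or whether routing is hard or soft, so the sketch assumes a precomputed relevance score per token and view and a hard argmax. The names `route_tokens` and `gather_by_route` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def route_tokens(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each 3D token, the index of its highest-scoring view.

    scores[t][v] is an assumed relevance score of view v for token t.
    Ties go to the lowest view index. Hard argmax is an assumption here.
    """
    if not scores:
        return []
    num_views = len(scores[0])
    if num_views == 0:
        raise ValueError("at least one view is required")
    routes = []
    for row in scores:
        if len(row) != num_views:
            raise ValueError("every token needs one score per view")
        # max() returns the first maximal index, which gives the tie rule above.
        routes.append(max(range(num_views), key=row.__getitem__))
    return routes


def gather_by_route(view_features: Sequence[Sequence[T]], routes: Sequence[int]) -> List[T]:
    """Pick, for each token t, the feature offered by its routed view.

    view_features[v][t] is the (hypothetical) feature view v provides for token t.
    """
    return [view_features[v][t] for t, v in enumerate(routes)]


# Three tokens, two views; the third token is a tie and falls back to view 0.
example_scores = [[0.1, 0.9], [0.7, 0.2], [0.5, 0.5]]
example_routes = route_tokens(example_scores)  # [1, 0, 0]
example_features = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
example_picked = gather_by_route(example_features, example_routes)  # ['b0', 'a1', 'a2']
```

Because the routing is computed per token over however many score columns are supplied, the same function works unchanged with one view or with twelve, which mirrors the variable view count at test time described above.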