cs.RO

Implicit geometry transformer reconstructs 3D scenes from unposed images

Yuqi Wu, Tianyu Hu, Wenzhao Zheng, Yuanhui Huang, Haowen Sun, Jie Zhou, Jiwen Lu

May 15, 2026

Existing methods for 3D reconstruction from multi-view images rely on explicit geometry predictions that are often redundant and discontinuous. IVGT (Implicit Visual Geometry Transformer) instead models geometry implicitly as a signed distance function in a canonical coordinate system, so that any 3D position can be queried for its geometry and appearance. Trained with 2D supervision and 3D geometric regularization across multiple datasets, the model generalizes to unseen scenes and supports several downstream tasks: mesh and point cloud reconstruction, novel view synthesis, depth and normal estimation, and camera pose estimation. A single model handles all of these tasks without task-specific training.
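To make the idea of querying a signed distance function concrete, here is a minimal sketch. It is not IVGT's learned model: an analytic sphere stands in for the network, and the names `sphere_sdf` and `sdf_normal` are hypothetical, chosen only for illustration. It shows how a point query returns a signed distance, and how a surface normal can be approximated from the gradient of that distance.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere.

    Negative inside, zero on the surface, positive outside.
    Assumption: this analytic function stands in for a learned SDF.
    """
    return math.dist(p, center) - radius

def sdf_normal(sdf, p, eps=1e-4):
    """Approximate the unit normal at p as the normalized SDF gradient,
    estimated with central finite differences along each axis."""
    grad = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2 * eps))
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / norm for g in grad]

# Query arbitrary 3D positions: one inside, one on, one outside the surface.
inside = sphere_sdf((0.0, 0.0, 0.0))
on_surface = sphere_sdf((1.0, 0.0, 0.0))
outside = sphere_sdf((2.0, 0.0, 0.0))
normal = sdf_normal(sphere_sdf, (2.0, 0.0, 0.0))
```

Because the function is defined at every position, the surface is the continuous zero level set rather than a discrete set of per-pixel predictions, which is the contrast with explicit geometry that the summary draws.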
Published as "IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation", arXiv:2605.16258