How to speed up 3D reconstruction by picking the right image tokens?
Shuhong Zheng, Michael Oechsle, Erik Sandström, Marie-Julie Rakotosaona, Federico Tombari, Igor Gilitschenski
May 22, 2026
Visual geometry transformers reconstruct 3D scenes from multiple images, but their global attention makes them slow down as the sequence length grows. The authors propose a two-stage token selection scheme: first, identify which image frames matter most using diversity-based scoring; second, prune redundant tokens within those frames based on attention entropy patterns. On 500-image scenes, this delivers an 85% speedup without sacrificing accuracy, and sometimes improves it, making multi-view 3D reconstruction practical for larger datasets.
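To make the two stages concrete, here is a minimal Python sketch of the general idea. It is not the authors' implementation, and it rests on several assumptions the summary does not confirm: diversity scoring is approximated by greedy farthest-point selection over per-frame feature vectors, and tokens whose attention distribution has the lowest Shannon entropy are the ones kept. The function names (`select_diverse_frames`, `attention_entropy`, `prune_tokens`) and the toy inputs are hypothetical.

```python
import math


def select_diverse_frames(features, k):
    """Stage 1 (assumed form): greedily pick k frames that are far apart.

    `features` is a list of per-frame descriptor vectors (hypothetical input).
    Farthest-point selection stands in for "diversity-based scoring".
    """
    if not features or k <= 0:
        return []

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [0]  # assumption: seed the selection with the first frame
    # Distance from every frame to its nearest already-selected frame.
    min_dist = [dist(f, features[0]) for f in features]
    while len(selected) < min(k, len(features)):
        candidates = [i for i in range(len(features)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, dist(f, features[nxt]))
                    for d, f in zip(min_dist, features)]
    return selected


def attention_entropy(row):
    """Shannon entropy (in nats) of one token's attention weights."""
    total = sum(row)
    return -sum((p / total) * math.log(p / total) for p in row if p > 0)


def prune_tokens(attn_rows, keep):
    """Stage 2 (assumed form): keep the `keep` lowest-entropy tokens.

    Which entropy direction marks a token as redundant is an assumption here;
    flip the sort order to test the opposite rule.
    """
    order = sorted(range(len(attn_rows)),
                   key=lambda i: attention_entropy(attn_rows[i]))
    return sorted(order[:keep])


# Toy data: four frames with 2-D descriptors, three tokens with attention rows.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 5.0]]
attn = [[0.25, 0.25, 0.25, 0.25],
        [0.97, 0.01, 0.01, 0.01],
        [0.70, 0.10, 0.10, 0.10]]

kept_frames = select_diverse_frames(frames, 3)
kept_tokens = prune_tokens(attn, 2)
```

On this toy input the near-duplicate frame at index 1 is skipped, and the token with uniform attention (index 0) is the one pruned. A real pipeline would take descriptors and attention maps from the transformer itself and would need batched tensor operations rather than Python lists.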