cs.CV

How do you reconstruct 3D scenes from endless video streams?

Chong Cheng, Peilin Tao, Nanjie Yao, Guanzhi Ding, Xianda Chen, Yuansen Du, Xiaoyang Guo, Wei Yin, Weiqiang Ren, Qian Zhang, Zhengqing Chen, Hao Wang

May 22, 2026

Online 3D reconstruction from video must estimate camera pose and scene geometry in real time, without memory use or per-frame computation growing as the stream gets longer. HorizonStream addresses a core failure of standard attention on long streams: it weights all past evidence equally, which causes pose drift and eventually collapse. The method factorizes geometric evidence into two parts. The first is long-range decay, which learns a separate decay rate for each channel so that evidence propagates at multiple timescales. The second is short-range spatial matching. Dedicated tokens stabilize the pose and scale estimates. Although trained only on 48-frame clips, the model processes sequences of more than 10,000 frames with constant memory and time that grows linearly with sequence length.
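The summary does not give HorizonStream's equations. As a rough illustration of why learned channel-wise decay permits constant memory, here is a minimal sketch that assumes the long-range branch behaves like a decayed linear-attention recurrence, in the style of RetNet or gated linear attention. The class `DecayedStreamState`, its `decay_logits` parameter, and the sigmoid parameterization of the decay rates are hypothetical and are not the paper's code. The short-range spatial matching and the pose/scale tokens are omitted.

```python
import math
import random


class DecayedStreamState:
    """Hypothetical long-range branch: linear attention with channel-wise decay.

    Illustrative assumption only, not HorizonStream's implementation.
    The recurrent state has a fixed shape (d_key x d_val), so memory stays
    constant and each frame costs O(d_key * d_val) work. Total cost is
    therefore linear in the number of frames.
    """

    def __init__(self, d_val: int, decay_logits: list[float]):
        # One decay rate per key channel, squashed into (0, 1) by a sigmoid.
        # In a trained model these logits would be learned parameters.
        self.logits = list(decay_logits)
        self.gamma = [1.0 / (1.0 + math.exp(-z)) for z in self.logits]
        self.state = [[0.0] * d_val for _ in self.gamma]

    def step(self, q: list[float], k: list[float], v: list[float]) -> list[float]:
        # S_t = diag(gamma) S_{t-1} + outer(k_t, v_t): older evidence fades
        # at a per-channel rate instead of being weighted equally.
        for i, g in enumerate(self.gamma):
            row = self.state[i]
            for j in range(len(row)):
                row[j] = g * row[j] + k[i] * v[j]
        # Readout o_t = q_t S_t.
        return [
            sum(q[i] * self.state[i][j] for i in range(len(self.gamma)))
            for j in range(len(v))
        ]

    def horizons(self) -> list[float]:
        # Effective memory length per channel, 1 / (1 - gamma).
        # Since 1 / (1 - sigmoid(z)) == 1 + exp(z), compute it that way to
        # avoid dividing by a value that rounds to zero for large logits.
        return [1.0 + math.exp(z) for z in self.logits]


def random_vec(n: int) -> list[float]:
    return [random.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    random.seed(0)
    # Fast, medium and slow channels give short, mid and long horizons.
    stream = DecayedStreamState(d_val=2, decay_logits=[0.0, 2.0, 6.0])
    print("horizons:", [round(h, 1) for h in stream.horizons()])
    for _ in range(10_000):
        stream.step(random_vec(3), random_vec(3), random_vec(2))
    # The state keeps the same 3 x 2 shape after 10,000 frames.
    print("state shape:", len(stream.state), "x", len(stream.state[0]))
```

With logits of 0, 2 and 6, the channels have effective horizons of roughly 2, 8.4 and 404 frames, so a handful of channels can cover both frame-to-frame and long-range context. A single shared decay rate would collapse this to one timescale.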
Published as HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction (arXiv:2605.23889).