cs.CV

How to reconstruct 3D scenes without losing track over time?

Congrong Xu, Huachen Gao, Xingyu Chen, Yuliang Xiu, Jun Gao, Anpei Chen

May 26, 2026

Existing depth-and-pose foundation models assume a fixed global coordinate frame. That assumption breaks down for long videos or streaming input, where estimated positions drift without bound over time. R³ instead predicts relative constraints between frames using a lightweight MLP, with confidence scores that weight both the training losses and the pose aggregation. This lets the model handle arbitrarily long sequences without memory growth, and it works in both full-context and causal streaming modes.
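To make the idea of "relative constraints plus confidence-weighted aggregation" concrete, here is a minimal sketch of a causal, bounded-memory aggregator. It is not the paper's method. It assumes translation-only constraints (rotations ignored), a hypothetical predictor that returns an offset `p_new - p_past` and a confidence for each frame pair, and a fixed window of past frames. `StreamingAggregator`, `RelativePredictor` and `window` are illustrative names. Each past frame in the window votes for the new position `p_j + offset`, and the confidence-weighted mean of these votes is the closed-form minimizer of the weighted least-squares objective `sum_j c_j * ||p_t - (p_j + d_tj)||^2`.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical predictor: given (new_frame_idx, past_frame_idx), return
# (offset, confidence), where offset approximates p_new - p_past.
RelativePredictor = Callable[[int, int], Tuple[Vec3, float]]


class StreamingAggregator:
    """Causal, bounded-memory aggregation of relative position constraints.

    Illustrative, translation-only sketch (not the R³ implementation):
    only the last `window` frame positions are stored, so memory stays
    constant no matter how long the stream runs.
    """

    def __init__(self, predict: RelativePredictor, window: int = 4) -> None:
        self.predict = predict
        self.window: Deque[Tuple[int, Vec3]] = deque(maxlen=window)
        self.count = 0  # number of frames pushed so far

    def push(self) -> Vec3:
        t = self.count
        self.count += 1
        if not self.window:
            # The first frame fixes the origin of the reconstruction.
            pos: Vec3 = (0.0, 0.0, 0.0)
        else:
            total_w = 0.0
            acc = [0.0, 0.0, 0.0]
            for j, p_j in self.window:
                offset, conf = self.predict(t, j)
                # Neighbour j votes for p_t = p_j + offset, weighted by confidence.
                for k in range(3):
                    acc[k] += conf * (p_j[k] + offset[k])
                total_w += conf
            if total_w <= 0.0:
                raise ValueError("all confidences in the window are zero")
            pos = (acc[0] / total_w, acc[1] / total_w, acc[2] / total_w)
        # deque(maxlen=window) silently drops the oldest frame when full.
        self.window.append((t, pos))
        return pos
```

For example, with a predictor that returns exact offsets for a camera moving one unit along x per frame, `[agg.push() for _ in range(10)]` recovers positions `(i, 0, 0)`, and a wrong offset paired with zero confidence has no effect on the result. A full-context variant would instead solve the same weighted least-squares problem jointly over all frame pairs rather than one new frame at a time.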
Published as "R³: 3D Reconstruction via Relative Regression", arXiv:2605.26519