cs.CV

Can we track moving objects in 3D, not just 2D video?

Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman

May 28, 2026

Existing moving object segmentation methods rely on 2D optical flow and treat motion as a static property of an entire clip. GMOS instead grounds the problem in 3D space and time, analyzing each object's instantaneous motion at every frame directly from RGB video. The authors created GMOS-2K, a dataset of 2,210 annotated videos, and introduced MOS-I, a fine-grained evaluation protocol. The method achieves state-of-the-art results on multiple benchmarks while supporting real-time streaming inference.
Published as "GMOS: Grounding Moving Object Segmentation in 3D Space and Time", arXiv:2605.30352