Can we track moving objects in 3D, not just 2D video?
Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman
May 28, 2026
Existing moving object segmentation methods rely on 2D optical flow and treat motion as a fixed property of an entire clip. GMOS instead grounds the problem in 3D space and time, analyzing each object's instantaneous motion in every frame directly from RGB video. The authors introduce GMOS-2K, a dataset of 2,210 annotated videos, and MOS-I, a fine-grained evaluation protocol. The method achieves state-of-the-art results on multiple benchmarks while supporting real-time streaming inference.