cs.CV

Understanding moving parts in video without tracking points

Arslan Artykov, Tom Ravaud, Nicolás Violante-Grezzi, Vincent Lepetit

May 18, 2026

Existing approaches to extracting the 3D motion of articulated objects (doors, robot arms, etc.) fail when the video contains occlusions or the point tracks are unstable. This work sidesteps point tracking entirely by representing an object as simple geometric primitives (boxes, cylinders) grouped into parts that are connected by joints. A single optimization pass recovers the joint parameters and the part segmentation from one video, and handles partial visibility automatically. The method outperforms existing methods on new benchmarks with heavy motion and occlusion.
Published as "Articulation in Prime: Primitive-Based Articulated Object Understanding from a Single Casual Video" (arXiv:2605.18645).
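
To make the "primitives grouped into parts connected by joints" idea concrete, here is a minimal sketch of one part (a box) moved by one joint. It is an illustration under assumptions, not the paper's implementation: the names `Box`, `RevoluteJoint`, and `pose_part` are hypothetical, and the choice of a revolute (hinge) joint parameterized by an origin, a unit axis, and an angle is an assumption about what "joint parameters" could look like.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned box primitive (hypothetical stand-in for one object part)."""
    center: Vec3
    half_extents: Vec3

    def corners(self) -> List[Vec3]:
        cx, cy, cz = self.center
        hx, hy, hz = self.half_extents
        return [
            (cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)
        ]


@dataclass
class RevoluteJoint:
    """Hinge joint (assumed parameterization): a point on the axis and a unit direction."""
    origin: Vec3
    axis: Vec3

    def apply(self, p: Vec3, angle: float) -> Vec3:
        # Rodrigues' rotation formula about the joint axis:
        # v' = v cos(a) + (k x v) sin(a) + k (k . v) (1 - cos(a))
        v = tuple(a - b for a, b in zip(p, self.origin))
        k = self.axis
        c, s = math.cos(angle), math.sin(angle)
        dot = sum(a * b for a, b in zip(k, v))
        cross = (
            k[1] * v[2] - k[2] * v[1],
            k[2] * v[0] - k[0] * v[2],
            k[0] * v[1] - k[1] * v[0],
        )
        return tuple(
            v[i] * c + cross[i] * s + k[i] * dot * (1 - c) + self.origin[i]
            for i in range(3)
        )


def pose_part(box: Box, joint: RevoluteJoint, angle: float) -> List[Vec3]:
    """Return the box corners after rotating the part by `angle` about the joint."""
    return [joint.apply(corner, angle) for corner in box.corners()]


# Usage: a door-like box hinged on the vertical (z) axis through the origin.
door = Box(center=(0.5, 0.0, 1.0), half_extents=(0.5, 0.02, 1.0))
hinge = RevoluteJoint(origin=(0.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0))
opened = pose_part(door, hinge, math.pi / 2)
```

In a fitting setting, an optimizer would adjust quantities like `origin`, `axis`, and the per-frame `angle` so that the posed primitives agree with the video; that step is not shown here.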