Understanding moving parts in video without tracking points

Existing approaches to extracting 3D motion from articulated objects (doors, robot arms, etc.) tend to fail when videos contain occlusions or when point tracks are unstable. This work sidesteps point tracking entirely by representing an object as a set of simple geometric primitives (boxes, cylinders) grouped into parts that are connected by joints. A single optimization pass recovers both the joint parameters and the part segmentation from one video, and handles partial visibility automatically. The method outperforms existing methods on new benchmarks with heavy motion and occlusion.
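To make the "primitives connected by joints" representation concrete, here is a minimal Python sketch of one part (a box) attached by a hinge. It is an illustration only, not the authors' implementation: the names `Box`, `RevoluteJoint`, and `pose_part` are hypothetical, the hinge (revolute) joint type is assumed because doors are the example given, and the optimization that fits these parameters to a video is not shown.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Box:
    """Hypothetical box primitive: center position and full side lengths."""
    center: Vec3
    size: Vec3

    def corners(self) -> list[Vec3]:
        # Enumerate the 8 corners from the center and half-extents.
        cx, cy, cz = self.center
        hx, hy, hz = (s / 2 for s in self.size)
        return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
                for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


@dataclass
class RevoluteJoint:
    """Hypothetical hinge: rotation about a unit-length `axis` through `pivot`."""
    pivot: Vec3
    axis: Vec3

    def apply(self, p: Vec3, angle: float) -> Vec3:
        # Rodrigues' rotation formula, applied relative to the pivot.
        v = tuple(a - b for a, b in zip(p, self.pivot))
        k = self.axis
        c, s = math.cos(angle), math.sin(angle)
        k_dot_v = sum(a * b for a, b in zip(k, v))
        k_cross_v = (k[1] * v[2] - k[2] * v[1],
                     k[2] * v[0] - k[0] * v[2],
                     k[0] * v[1] - k[1] * v[0])
        rotated = tuple(v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1 - c)
                        for i in range(3))
        return tuple(r + b for r, b in zip(rotated, self.pivot))


def pose_part(part: Box, joint: RevoluteJoint, angle: float) -> list[Vec3]:
    """Corners of `part` after rotating it by `angle` radians about `joint`."""
    return [joint.apply(corner, angle) for corner in part.corners()]


# A door panel hinged along the vertical (z) axis at the origin.
door = Box(center=(0.5, 0.0, 1.0), size=(1.0, 0.05, 2.0))
hinge = RevoluteJoint(pivot=(0.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0))
opened = pose_part(door, hinge, math.pi / 2)  # door swung open by 90 degrees
```

In this parameterization a part's motion over a whole video is described by a fixed pivot and axis plus one angle per frame, which is the kind of compact joint description that can be fitted without following individual points.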