cs.RO

Can robots learn dexterous skills by aligning 3D space across cameras and bodies?

Huayi Zhou, Wei Gao, Dekun Lu, Ruiji Liu, Zhanqi Zhang, Ziyang Zhang, Jian Chen, Wenlve Zhou, Sheng Xu, Shumin Li, Kangyi Guo, Shichen Xu, Zixin Huang, Yongyi Su, Kui Jia

June 1, 2026

Manipulation policies trained end-to-end often fail when the camera viewpoint or the robot body changes, because they learn from 2D images without spatial grounding. This work adds 3D awareness by computing pixel-wise 3D coordinates from camera calibration, then aligning both the visual inputs and the robot actions to a shared bird's-eye-view (BEV) frame. A temporal alignment scheme also handles the different recording speeds found across robots and datasets. The authors report improved consistency and real-world transfer; code, trained models, and the data pipeline are released.
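To make the spatial alignment idea concrete, here is a minimal sketch of one way pixel-wise 3D coordinates could be computed and moved into a common frame. It is not the paper's implementation. It assumes a pinhole camera with intrinsics `K`, a per-pixel depth map (the summary does not say where depth comes from), and known 4x4 extrinsic transforms into the shared frame. The function names `backproject_pixels` and `to_shared_frame` are hypothetical.

```python
import numpy as np


def backproject_pixels(depth, K):
    """Return an (H, W, 3) array of camera-frame 3D points, one per pixel.

    Assumes a pinhole model: depth is an (H, W) array of z-distances and
    K is the 3x3 intrinsic matrix from camera calibration.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def to_shared_frame(points, T_shared_from_src):
    """Apply a 4x4 rigid transform to an array of (..., 3) points.

    The same function can map camera-frame pixel coordinates or
    robot-base-frame action positions into the shared frame.
    """
    R = T_shared_from_src[:3, :3]
    t = T_shared_from_src[:3, 3]
    return points @ R.T + t


# Illustrative values only (not taken from the paper).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 2.0)

# Hypothetical extrinsics: shared frame is the camera frame shifted 1 m in z.
T_shared_from_cam = np.eye(4)
T_shared_from_cam[2, 3] = 1.0

cam_points = backproject_pixels(depth, K)
shared_points = to_shared_frame(cam_points, T_shared_from_cam)

# An end-effector position expressed in a (hypothetical) robot base frame
# goes through the same kind of transform, so observations and actions
# end up in one coordinate system.
T_shared_from_base = np.eye(4)
T_shared_from_base[:3, 3] = [0.5, 0.0, 0.0]
action_xyz = to_shared_frame(np.array([0.1, 0.2, 0.3]), T_shared_from_base)
```

Once both the per-pixel coordinates and the action targets are expressed in one frame, a change of camera pose or robot base only changes the extrinsic matrices, not the representation the policy consumes.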
Published as "Dexterity-BEV: Aligning 3D World and Actions for Generalizable Robot Policies Learning" (arXiv:2606.02274)