Can robots reason in 3D space without seeing it?
Jiaxin Shi, Xidong Zhang, Fucai Zhu, Zhe Li, Siyu Zhu, Weihao Yuan
June 3, 2026
Robot control models struggle with spatial reasoning when they work from 2D images alone. This work injects 3D geometric awareness into vision-language-action (VLA) models during training: the model learns from a 3D foundation model acting as a teacher, and that spatial knowledge is distilled into lightweight adapters. At test time, the robot uses only 2D images, with no 3D sensors or teacher model needed, yet it outperforms prior approaches on the LIBERO, LIBERO-PLUS, and SimplerEnv benchmarks and on real manipulation tasks.
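The teacher-to-adapter idea can be illustrated with a minimal sketch. The summary does not specify the loss, the adapter architecture, or the feature shapes, so everything below is an assumption made for illustration: a residual linear map stands in for the adapter, random arrays stand in for the 2D backbone features and the 3D teacher features, and a cosine-alignment loss stands in for the distillation objective. It is not the authors' implementation.

```python
import numpy as np

def cosine_distill_loss(student_feats, teacher_feats, eps=1e-8):
    """Mean (1 - cosine similarity) over per-token feature rows.

    Assumed objective for illustration; the paper's actual loss may differ.
    """
    s = student_feats / (np.linalg.norm(student_feats, axis=-1, keepdims=True) + eps)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

class LinearAdapter:
    """Hypothetical lightweight adapter: a residual linear map on 2D features."""

    def __init__(self, dim, rng):
        self.W = rng.normal(scale=0.01, size=(dim, dim))

    def __call__(self, feats):
        return feats + feats @ self.W

rng = np.random.default_rng(0)
num_tokens, dim = 16, 32
image_feats = rng.normal(size=(num_tokens, dim))    # stand-in for 2D backbone features
teacher_feats = rng.normal(size=(num_tokens, dim))  # stand-in for 3D teacher features

adapter = LinearAdapter(dim, rng)

# Training time: the teacher supplies a target for the adapted features.
train_loss = cosine_distill_loss(adapter(image_feats), teacher_feats)

# Test time: only the 2D features and the adapter are used; no teacher call.
test_feats = adapter(image_feats)
```

In a real training loop the loss would be minimized with respect to the adapter weights, typically alongside the action-prediction objective. The point of the sketch is only that the teacher appears in the training branch and is absent from the inference branch.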