Can robots navigate by following instructions in photorealistic 3D worlds?
Xinhai Li, Xiaotao Zhang, Yuehao Huang, Jiankun Dong, Tianhang Wang, Sunyao Zhou, Yunzi Wu, Chengnuo Sun, Yunfei Ge, Qizhen Weng, Chi Zhang, Chenjia Bai, Xuelong Li
June 2, 2026
Robots struggle to navigate unfamiliar spaces by following human instructions because training data is scarce and limited to narrow scenarios. This work combines a photorealistic 3D simulator built on Gaussian splatting, a large curated dataset of navigation scenes, and a foundation model trained first with supervised learning and then with reinforcement learning. The system handles instruction following, human following, and goal-directed navigation in a unified framework, with spatial reasoning encoded as bird's-eye-view maps. The reported results exceed the prior state of the art on standard benchmarks.