Teaching robots to navigate by understanding themselves
Wenxuan Guo, Xiuwei Xu, Yichen Liu, Xiangyu Li, Hang Yin, Huangxing Chen, Wenzhao Zheng, Jianjiang Feng, Jie Zhou, Jiwen Lu
May 21, 2026
Vision-language navigation asks robots to follow instructions while moving through real environments. Most prior methods either use opaque end-to-end learning or rely on explicit 3D maps, which need extra sensors. AwareVLN adds explicit self-awareness, in which the agent reasons about where it is and how close it is to completing the task, while staying fully end-to-end. A structural reasoning module learns spatial and task awareness from data alone. On Habitat benchmarks, AwareVLN significantly outperforms existing vision-language navigation methods.