cs.CV

Teaching robots to navigate by understanding themselves

Wenxuan Guo, Xiuwei Xu, Yichen Liu, Xiangyu Li, Hang Yin, Huangxing Chen, Wenzhao Zheng, Jianjiang Feng, Jie Zhou, Jiwen Lu

May 21, 2026

Vision-language navigation asks robots to follow natural-language instructions while moving through real environments. Most prior methods either learn opaque end-to-end policies or rely on explicit 3D maps, which require extra sensors. AwareVLN adds explicit self-awareness while remaining fully end-to-end: the agent reasons about where it is and how close it is to completing the task. A structural reasoning module learns this spatial and task awareness from data alone. On Habitat benchmarks, AwareVLN significantly outperforms existing vision-language navigation methods.
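The summary does not say how the structural reasoning module is trained. Purely as an illustration, and not the paper's method, one generic way an end-to-end agent can learn "where am I" and "how far along am I" from data alone is to predict these quantities next to its action and supervise them with auxiliary losses. In the sketch below, `StepOutput`, `progress_target`, `awareness_loss`, and the weights `w_pos` and `w_prog` are all hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StepOutput:
    # Hypothetical per-step outputs of an end-to-end navigation policy.
    action_logits: List[float]           # scores over discrete actions (e.g. forward, left, right, stop)
    position_pred: Tuple[float, float]   # agent's own estimate of its (x, y) position (spatial awareness)
    progress_pred: float                 # agent's estimate of task completion in [0, 1] (task awareness)


def progress_target(dist_to_goal: float, start_dist: float) -> float:
    """Fraction of the initial goal distance already covered, clipped to [0, 1]."""
    if start_dist <= 0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dist_to_goal / start_dist))


def cross_entropy(logits: List[float], target_idx: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_idx]


def awareness_loss(out: StepOutput, expert_action: int,
                   true_position: Tuple[float, float],
                   dist_to_goal: float, start_dist: float,
                   w_pos: float = 0.5, w_prog: float = 0.5) -> float:
    """Action imitation loss plus hypothetical auxiliary self-awareness losses."""
    l_action = cross_entropy(out.action_logits, expert_action)
    dx = out.position_pred[0] - true_position[0]
    dy = out.position_pred[1] - true_position[1]
    l_pos = dx * dx + dy * dy  # squared error on the self-position estimate
    l_prog = (out.progress_pred - progress_target(dist_to_goal, start_dist)) ** 2
    return l_action + w_pos * l_pos + w_prog * l_prog


if __name__ == "__main__":
    out = StepOutput(action_logits=[0.0, 0.0, 0.0, 0.0],
                     position_pred=(1.0, 2.0), progress_pred=0.75)
    # Position and progress are exact here, so only the action term (log 4) remains.
    print(awareness_loss(out, expert_action=0, true_position=(1.0, 2.0),
                         dist_to_goal=2.0, start_dist=8.0))
```

Because the awareness targets (position, distance to goal) come from the simulator's ground truth during training, no map or extra sensor is needed at inference time; the agent only has to produce the predictions. How AwareVLN actually structures and supervises its reasoning is described in the original paper.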
Published as "AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation" (arXiv:2605.22816).