How LiDAR helps robots navigate new scenes with only a camera

Amirhossein Zhalehmehrabi, Tiziano Tezze, Alberto Castelini, Alessandro Farinelli

Robots trained to navigate to GPS coordinates often fail when moved to new buildings or outdoor spaces, because they learn appearance shortcuts specific to their training scenes. This work uses LiDAR depth as a teacher during training to steer the visual representation toward scene geometry rather than surface details, then removes the LiDAR at deployment so the robot runs on a camera alone. The encoder is then frozen and feeds a reinforcement learning policy, which keeps representation learning separate from task learning. Tests on diverse simulated environments show substantial improvements over large pretrained vision models. The work also releases a new multimodal dataset and open-source code.
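
The training-time teacher and deployment-time camera-only setup can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder, the squared-error distillation loss, the three-value "image", and every name below (`encode`, `distill_step`, `make_sample`, `train_encoder`, `policy_features`) are assumptions chosen only to make the idea concrete. The camera input mixes two geometry values with one texture value, the LiDAR target contains only the geometry, and the encoder is fit to the LiDAR target before being frozen.

```python
import random


def encode(weights, image):
    """Linear camera encoder (hypothetical): one embedding value per weight row."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def distill_step(weights, image, lidar_target, lr=0.1):
    """One SGD step on the squared error between the camera embedding
    and the LiDAR-derived target. Returns (new_weights, loss_before_update)."""
    errors = [p - t for p, t in zip(encode(weights, image), lidar_target)]
    new_weights = [
        [w - lr * 2.0 * e * x for w, x in zip(row, image)]
        for row, e in zip(weights, errors)
    ]
    return new_weights, sum(e * e for e in errors)


def make_sample(rng):
    """Toy scene (assumed): two geometry values plus one texture value.
    The camera sees all three; the LiDAR teacher sees only the geometry."""
    geometry = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    texture = rng.uniform(-1, 1)
    return geometry + [texture], geometry


def train_encoder(steps=1000, seed=0):
    """Fit the camera encoder to the LiDAR teacher, then freeze it."""
    rng = random.Random(seed)
    weights = [[0.0] * 3 for _ in range(2)]
    loss = float("inf")
    for _ in range(steps):
        image, lidar_target = make_sample(rng)
        weights, loss = distill_step(weights, image, lidar_target)
    # Freezing is modeled by converting to tuples, which cannot be mutated
    # by whatever later trains the policy.
    return tuple(tuple(row) for row in weights), loss


def policy_features(frozen_weights, image, goal):
    """Deployment-time policy input: camera embedding plus goal. No LiDAR."""
    return encode(frozen_weights, image) + list(goal)


if __name__ == "__main__":
    frozen, final_loss = train_encoder()
    print("final distillation loss:", final_loss)
    print("texture weights:", [row[2] for row in frozen])
    print("policy input:", policy_features(frozen, [0.5, -0.25, 0.9], (3.0, 4.0)))
```

In this toy setting the weights on the texture input shrink toward zero, because texture carries no information about the LiDAR target. A reinforcement learning policy would then be trained on the output of `policy_features` while the encoder weights stay fixed.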