Making robot vision robust without retraining on new data
Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang, Daquan Zhou
May 18, 2026
Vision-language-action (VLA) models, used to teach robots to act from visual input, collapse when they encounter real-world corruptions such as blur, fog, or noise that were not in their training set. The authors propose the Information Bottleneck Adapter, a lightweight module that filters noise from visual inputs without requiring extra data or augmentation. On long-horizon robot tasks, it recovers 30% of the lost performance and lets even small 0.5B-parameter models match 7B-scale models under corrupted visuals.