cs.CV

Making robot vision robust without retraining on new data

Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang, Daquan Zhou

May 18, 2026

Vision-language-action (VLA) models, which map camera images and language instructions to robot actions, break down when their visual inputs suffer real-world corruptions such as blur, fog, or sensor noise that were absent from their training data. The authors propose the Information Bottleneck Adapter, a lightweight module that filters noise out of visual inputs without requiring extra data or data augmentation. On long-horizon robot tasks, it recovers 30% of the performance lost to corruption, and it lets even a small 0.5B-parameter model match 7B-scale models under corrupted visuals.
Published as "StableVLA: Towards Robust Vision-Language-Action Models without Extra Data" (arXiv:2605.18287).
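This summary does not describe how the Information Bottleneck Adapter is built. The sketch below is a hypothetical illustration of the general information-bottleneck idea only, using a standard variational information bottleneck (VIB) layer in plain Python. It is not StableVLA's architecture. Every name in it (`BottleneckAdapter`, `kl_to_standard_normal`, `total_loss`, the bottleneck width `k`, the weight `beta`) is an assumption chosen for illustration. The adapter squeezes a d-dimensional visual feature into a smaller stochastic code z and passes only the decoded z downstream. A KL penalty limits how much information about the input z can carry, which is the mechanism such adapters rely on to discard nuisance detail like noise.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small Gaussian-initialized weights (illustrative initialization)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ): the VIB compression term."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


class BottleneckAdapter:
    """Hypothetical VIB-style adapter: d-dim feature -> k-dim code z -> d-dim output.

    Downstream layers see only the decoded z, so any input detail that
    does not fit through the k-dim, KL-penalized code is dropped.
    """

    def __init__(self, d: int, k: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.w_mu = random_matrix(k, d, self.rng)      # encoder mean head
        self.w_logvar = random_matrix(k, d, self.rng)  # encoder log-variance head
        self.w_out = random_matrix(d, k, self.rng)     # decoder back to d dims

    def forward(self, x: Vector, training: bool = True) -> Tuple[Vector, float]:
        mu = matvec(self.w_mu, x)
        logvar = matvec(self.w_logvar, x)
        if training:
            # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
            z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                 for m, lv in zip(mu, logvar)]
        else:
            # At inference, use the mean code so the output is deterministic.
            z = mu
        out = matvec(self.w_out, z)
        return out, kl_to_standard_normal(mu, logvar)


def total_loss(task_loss: float, kl: float, beta: float = 1e-3) -> float:
    """Task objective plus beta-weighted compression penalty."""
    return task_loss + beta * kl
```

In use, the adapter would sit between a frozen vision encoder and the policy, for example `out, kl = BottleneckAdapter(d=8, k=2).forward(features, training=False)`. During training, `total_loss` adds `beta * kl` to whatever action-prediction loss the policy already uses. A larger `beta` forces a tighter bottleneck. This usually discards more nuisance variation at some cost to task-relevant detail.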