Catching bad robot moves before they happen
Zhen Sun, Yongjian Guo, Haoran Sun, Luqiao Wang, Wei Lu, Jiachi Ji, Shengzhe Ji, Junwu Xiong, Zhijun Meng
May 21, 2026
Robot systems built on vision-language models often generate poor actions that either cause task failures or waste computation on world-model rollouts. Pre-VLA screens these actions before execution: a lightweight multimodal classifier, trained with techniques for handling imbalanced data, predicts a safety confidence and an advantage score for each action. On LIBERO benchmarks, it improved success rates by 7.8 percentage points, reduced the number of execution steps, and ran in 184 ms per action, catching errors early rather than letting them surface during physical execution.
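To make the idea of screening actions before execution concrete, here is a minimal sketch of a pre-execution gate. It is not the paper's implementation: the names `Verdict` and `select_action`, the thresholds `safety_min` and `advantage_min`, and the rule of keeping the highest-advantage action that passes both thresholds are all assumptions made for illustration. The scoring function stands in for the classifier, which the summary describes only as predicting a safety confidence and an advantage score.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Optional, Sequence


@dataclass(frozen=True)
class Verdict:
    """Hypothetical classifier output for one candidate action."""
    safety: float      # predicted probability that the action is safe, in [0, 1]
    advantage: float   # predicted advantage of the action (higher is better)


def select_action(
    candidates: Sequence[Hashable],
    score: Callable[[Hashable], Verdict],
    safety_min: float = 0.5,      # assumed threshold, not taken from the paper
    advantage_min: float = 0.0,   # assumed threshold, not taken from the paper
) -> Optional[Hashable]:
    """Return the passing candidate with the highest advantage, or None.

    A candidate passes when its safety is at least `safety_min` and its
    advantage is at least `advantage_min`. Returning None signals that every
    candidate was rejected, so the caller can resample or fall back.
    """
    best_action = None
    best_advantage = None
    for action in candidates:
        verdict = score(action)
        if verdict.safety < safety_min or verdict.advantage < advantage_min:
            continue  # rejected before any execution or rollout is spent on it
        if best_advantage is None or verdict.advantage > best_advantage:
            best_action, best_advantage = action, verdict.advantage
    return best_action


if __name__ == "__main__":
    # Toy scores standing in for classifier predictions.
    toy_scores = {
        "reach": Verdict(safety=0.9, advantage=0.1),
        "push": Verdict(safety=0.2, advantage=0.8),   # high advantage but unsafe
        "grasp": Verdict(safety=0.8, advantage=0.4),
    }
    print(select_action(list(toy_scores), toy_scores.__getitem__))  # prints grasp
```

In this toy run, "push" is rejected on safety despite its high advantage, and "grasp" is chosen over "reach" because it has the higher advantage among the actions that pass. Returning `None` when nothing passes leaves the fallback policy to the caller, which keeps the gate itself free of assumptions about how new candidates are produced.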