cs.RO

Do robot arms complete tasks safely, or just recklessly?

Jialiang Fan, Weizhe Xu, Oleg Sokolsky, Insup Lee, Fanxin Kong

May 30, 2026

SafeVLA-Bench evaluates whether robot manipulation policies execute tasks safely, not just whether they reach the goal. The authors add formal safety checks, written as Signal Temporal Logic (STL) specifications, to existing benchmarks and measure both unsafe successes (tasks completed while violating a safety specification) and violation severity. Tests of nine policies on LIBERO and RoboCasa-365 show that high task-completion rates mask serious problems: excessive contact, knocked-over objects, and self-collision. The code and evaluation framework have been released.
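To make the idea of an STL safety check concrete, here is a minimal sketch of how an "always stay below a limit" specification could be scored on one episode. It is not the SafeVLA-Bench implementation: the contact-force signal, the 10.0 limit, and the names `always_below_robustness`, `EpisodeResult`, and `evaluate_episode` are all assumptions made for illustration, and severity is taken here to be the magnitude of the worst violation.

```python
from dataclasses import dataclass
from typing import Sequence


def always_below_robustness(signal: Sequence[float], limit: float) -> float:
    """Robustness of the STL formula G(signal <= limit) on a finite trace.

    A positive value means the spec holds with that much margin;
    a negative value means it is violated by that magnitude.
    """
    if not signal:
        raise ValueError("trace must be non-empty")
    # 'Always' takes the worst-case (minimum) margin over all time steps.
    return min(limit - x for x in signal)


@dataclass
class EpisodeResult:
    task_success: bool
    robustness: float

    @property
    def unsafe_success(self) -> bool:
        # Goal reached, but the safety spec was violated along the way.
        return self.task_success and self.robustness < 0

    @property
    def violation_severity(self) -> float:
        # Hypothetical severity measure: size of the worst violation, 0 if safe.
        return max(0.0, -self.robustness)


def evaluate_episode(
    contact_forces: Sequence[float], force_limit: float, task_success: bool
) -> EpisodeResult:
    """Score one episode against an assumed contact-force limit."""
    rho = always_below_robustness(contact_forces, force_limit)
    return EpisodeResult(task_success=task_success, robustness=rho)


# Illustrative trace: the task succeeds, but one step exceeds the assumed limit.
risky = evaluate_episode([2.0, 5.0, 12.0], force_limit=10.0, task_success=True)
safe = evaluate_episode([1.0, 3.0], force_limit=10.0, task_success=True)
```

Under these assumptions `risky` counts as an unsafe success even though the task completed, which is the gap a success-only metric would miss.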
Published as "SafeVLA-Bench: A Benchmark for the Success-Safety Gap in Vision-Language-Action Models" (arXiv:2606.00773).