Teaching robots to understand their own movements across environments
Bing Hu, Zaijing Li, Rui Shao, Junda Chen, April Hua Liu, Wei-Shi Zheng, Liqiang Nie
May 21, 2026
Vision-Language-Action (VLA) models that control robots often fail in new environments because they predict each action independently rather than modeling the full behavioral sequence. BehaviorVLA addresses this by encoding entire movement trajectories into unified behavior representations, then decoding them into precise actions while tracking how far the task has progressed. On three robot manipulation benchmarks, it reaches success rates of 58–98% and matches OpenVLA's performance with 50% less demonstration data, suggesting that the approach transfers better from simulation to real robots.