Teaching robots to understand their own movements across environments

Bing Hu, Zaijing Li, Rui Shao, Junda Chen, April Hua Liu, Wei-Shi Zheng, Liqiang Nie

Vision-Language-Action (VLA) models that control robots often fail in new environments because they treat each action independently rather than modeling the full behavioral sequence. BehaviorVLA addresses this by encoding entire movement trajectories into unified behavior representations, then decoding them into precise actions while tracking how far the task has progressed. On three robot manipulation benchmarks, it reaches success rates of 58–98% and matches OpenVLA's performance with 50% less demonstration data, suggesting that the approach transfers better from simulation to the real world.
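
To make the encode-then-decode idea concrete, here is a deliberately tiny sketch in plain Python. It is not BehaviorVLA's architecture: the real model presumably uses learned encoders and decoders, while this toy keeps only a trajectory's endpoints and interpolates between them. The names `Behavior`, `encode_trajectory`, and `decode_action` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Behavior:
    """Toy stand-in for a unified behavior representation (hypothetical)."""
    start: List[float]   # first action of the trajectory
    end: List[float]     # last action of the trajectory
    horizon: int         # number of steps in the trajectory


def encode_trajectory(actions: List[List[float]]) -> Behavior:
    """Compress a whole action sequence into one Behavior (toy: keep endpoints)."""
    if not actions:
        raise ValueError("trajectory must contain at least one action")
    return Behavior(start=list(actions[0]), end=list(actions[-1]), horizon=len(actions))


def decode_action(behavior: Behavior, step: int) -> Tuple[List[float], float]:
    """Decode the action at `step` and report task progress in [0, 1]."""
    last = max(behavior.horizon - 1, 1)           # avoid dividing by zero
    progress = min(max(step / last, 0.0), 1.0)    # clamp to [0, 1]
    # Assumption: linear interpolation replaces a learned decoder.
    action = [s + progress * (e - s) for s, e in zip(behavior.start, behavior.end)]
    return action, progress


# A three-step, two-dimensional demonstration trajectory (made-up numbers).
demo = [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
behavior = encode_trajectory(demo)
rollout = [decode_action(behavior, t) for t in range(behavior.horizon)]
```

The point of the sketch is the interface rather than the math: the whole trajectory is summarized once, and each decoded action comes paired with a progress value, so the controller always knows where in the behavior it currently stands.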