Robot learns to remember its intentions during manipulation tasks

Shijie Lian, Bin Yu, Xiaopeng Lin, Zhaolong Shen, Laurence Tianruo Yang, Yurun Jin, Haishan Liu, Changti Wu, Hang Yuan, Cong Huang, Kai Chen

Robot imitation learning suffers when similar observations can justify different actions depending on context or task phase, a problem called aliasing. IntentVLA addresses this by conditioning action generation on a compact intent representation built from recent visual observations, rather than treating each planning step independently. The authors also introduce AliasBench, a 12-task benchmark designed to isolate and measure this ambiguity. Experiments on AliasBench, SimplerEnv, LIBERO, and RoboCasa show improved rollout stability and task performance over frame-conditioned VLA baselines.
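
To make the aliasing problem concrete, the following toy sketch contrasts a policy that sees only the current observation with one that also conditions on a summary of recent observations. It is an illustration under stated assumptions, not the IntentVLA architecture: the 1-D task, the function names (`frame_policy`, `summarize_intent`, `intent_policy`, `rollout`), and the hand-written "intent" rule are all hypothetical stand-ins for the learned components the summary describes.

```python
from collections import deque
from typing import Deque, List

# Hypothetical 1-D toy task: move from position 0 to 2 (reach), then back to 0 (return).
# Position 1 is visited in both phases, so the observation "1" is aliased:
# the correct action there is +1 while reaching and -1 while returning.
EXPERT_PATH = [0, 1, 2, 1, 0]


def frame_policy(obs: int) -> int:
    """Frame-conditioned policy: sees only the current observation."""
    # A lookup on the current frame must commit to a single action for obs == 1.
    table = {0: +1, 1: +1, 2: -1}
    return table[obs]


def summarize_intent(history: Deque[int]) -> str:
    """Hypothetical compact intent: has the target (position 2) been seen recently?"""
    return "return" if 2 in history else "reach"


def intent_policy(obs: int, history: Deque[int]) -> int:
    """Policy conditioned on an intent summary built from recent observations."""
    # In this toy task the intent alone determines the action; obs is kept
    # in the signature to mirror a policy that receives both inputs.
    return +1 if summarize_intent(history) == "reach" else -1


def rollout(use_intent: bool, steps: int = 4, window: int = 4) -> List[int]:
    """Roll a policy out from position 0 and return the visited positions."""
    pos = 0
    history: Deque[int] = deque(maxlen=window)  # recent observations only
    visited = [pos]
    for _ in range(steps):
        history.append(pos)
        action = intent_policy(pos, history) if use_intent else frame_policy(pos)
        pos = max(0, min(2, pos + action))  # clamp to the track [0, 2]
        visited.append(pos)
    return visited


if __name__ == "__main__":
    print("frame-conditioned: ", rollout(use_intent=False))
    print("intent-conditioned:", rollout(use_intent=True))
    print("expert path:       ", EXPERT_PATH)
```

In this sketch the frame-conditioned rollout bounces between positions 1 and 2 because it cannot tell the two phases apart, while the intent-conditioned rollout reproduces the expert path. A real system would replace the hand-written rule with a representation learned from visual history.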