Do robots learn better by breaking tasks into sub-skills?
Anya Singh, Cabrel Happi, Jai Relan, Varun Nair, Vidyut Baradwaj
May 29, 2026
Vision-language-action (VLA) policies struggle to learn new tasks without expensive fine-tuning. Researchers trained two VLA architectures on assembly data using either raw trajectories or primitive-segmented episodes (trajectories broken into sub-skills), then tested few-shot transfer on held-out tasks using only 0–10 demonstrations. Primitive-trained models reached 78% of fine-tuned performance with 3 demos; flat-trained models needed 10 demos to reach the same level. Ablating the primitive-decodable subspace of the hidden states dropped transfer by 32 points, evidence that the primitive structure is causally involved in transfer rather than coincidental.
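To make the ablation step concrete, here is a minimal sketch of one common way to remove a linearly decodable subspace from hidden states. It is an illustration under stated assumptions, not the authors' procedure: the summary does not say how the subspace was found, so the sketch assumes a least-squares linear probe from hidden states to primitive labels, and the names `primitive_subspace` and `ablate` are hypothetical.

```python
import numpy as np

def primitive_subspace(hidden, labels, num_primitives):
    """Orthonormal basis (d x r) for the directions a linear probe reads.

    Assumption: a least-squares probe stands in for whatever decoder
    the paper actually used.
    """
    onehot = np.eye(num_primitives)[labels]            # (n, k) one-hot targets
    hidden_c = hidden - hidden.mean(axis=0)            # center features
    onehot_c = onehot - onehot.mean(axis=0)            # center targets
    weights, *_ = np.linalg.lstsq(hidden_c, onehot_c, rcond=None)  # (d, k)
    # Orthonormalize the probe directions and drop numerically null ones.
    u, s, _ = np.linalg.svd(weights, full_matrices=False)
    rank = int((s > 1e-8 * s.max()).sum())
    return u[:, :rank]

def ablate(hidden, basis):
    """Project hidden states onto the orthogonal complement of `basis`."""
    return hidden - (hidden @ basis) @ basis.T
```

In this sketch, the ablated states would replace the originals in the forward pass before transfer is re-measured. A single projection only removes what this one probe uses; a different probe could still recover some primitive information, so the paper's ablation may well be more thorough than this.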