cs.RO

Can robots learn to follow detailed execution instructions, not just goals?

Xintong Hu, Xuhong Huang, Jinyu Zhang, Yutong Yao, Yuchong Sun, Qiuyue Wang, Mingsheng Li, Sicheng Xie, Yitao Liu, Junhao Chen, Yixuan Chen, Yingming Zheng, Shuai Bai, Tao Yu

May 26, 2026

Robot policies trained on coarse commands such as "pick up the cup" miss critical execution details like approach direction and contact point. FineVLA adds fine-grained annotations (47K verified trajectories drawn from 972K robot videos) and shows that mixing detailed instructions with goal-level commands works best at a 1:1 ratio, where it yields 86.8% success in simulation and a score of 62.7/100 on real dual-arm tasks. The approach improves steerable control (+23 points for pose adjustments) while maintaining baseline performance.
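As a rough illustration of what a 1:1 instruction mix could look like in a training set, the sketch below assigns each trajectory either its goal-level or its detailed instruction. It is not the paper's pipeline: the `Trajectory` record, the `mix_instructions` function, the `detailed_per_goal` parameter, and the choice to pick one instruction style per trajectory are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record: one trajectory annotated in both styles."""
    traj_id: int
    goal_instruction: str      # coarse command, e.g. "pick up the cup"
    detailed_instruction: str  # fine-grained text, e.g. approach direction


def mix_instructions(trajectories, detailed_per_goal=1.0, seed=0):
    """Pair each trajectory with one instruction style.

    detailed_per_goal is the target detailed:goal ratio (1.0 means 1:1).
    Returns a list of (traj_id, style, instruction_text) tuples.
    """
    if detailed_per_goal < 0:
        raise ValueError("detailed_per_goal must be non-negative")
    rng = random.Random(seed)
    order = list(trajectories)
    rng.shuffle(order)  # avoid always giving the same items detailed text
    # Number of trajectories that receive the detailed instruction.
    n_detailed = round(len(order) * detailed_per_goal / (1.0 + detailed_per_goal))
    samples = []
    for i, traj in enumerate(order):
        if i < n_detailed:
            samples.append((traj.traj_id, "detailed", traj.detailed_instruction))
        else:
            samples.append((traj.traj_id, "goal", traj.goal_instruction))
    return samples


if __name__ == "__main__":
    # Toy data, invented for the example.
    demo = [
        Trajectory(i, "pick up the cup", "approach from the left, grasp the handle")
        for i in range(10)
    ]
    for sample in mix_instructions(demo, detailed_per_goal=1.0):
        print(sample)
```

Fixing the seed keeps the split reproducible across runs. A different ratio can be tried by changing `detailed_per_goal`, for example `3.0` for a 3:1 detailed-to-goal mix.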
Published as "FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies" (arXiv:2605.27284).