Teaching robots to understand pointing gestures
Wenxuan Guo, Ziyuan Li, Meng Zhang, Yichen Liu, Yimeng Dong, Chuxi Xu, Yunfei Wei, Ze Chen, Erjin Zhou, Jianjiang Feng
May 21, 2026
Robot manipulation systems typically rely on text commands, but for humans, pointing at an object is often faster and clearer. GesVLA adds gesture as a native instruction modality alongside language, encoding hand position directly into the robot's decision-making. The team generated synthetic training data by rendering hand models onto real scenes, then trained the system both to recognize gestures and to predict actions. On real-robot tasks such as picking produce or products, gesture guidance improved accuracy in complex scenes where multiple similar objects make the target ambiguous.
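The summary does not say how the hand renders are merged into the scene images. As a rough illustration only, the sketch below shows one common way such a step could work: alpha-blending an RGBA hand render onto an RGB scene image. The function name `composite_hand`, the nested-list image layout, and the placement parameters `top` and `left` are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite_hand(
    scene: List[List[RGB]],
    hand: List[List[RGBA]],
    top: int,
    left: int,
) -> List[List[RGB]]:
    """Alpha-blend an RGBA hand render onto a copy of an RGB scene.

    Hypothetical helper: images are row-major nested lists of 0-255 tuples,
    and (top, left) is where the hand render's top-left corner is placed.
    """
    # Copy each row so the original scene image is left untouched.
    out = [list(row) for row in scene]
    for dy, hand_row in enumerate(hand):
        for dx, (r, g, b, a) in enumerate(hand_row):
            y, x = top + dy, left + dx
            # Skip hand pixels that fall outside the scene.
            if not (0 <= y < len(out) and 0 <= x < len(out[y])):
                continue
            alpha = a / 255
            sr, sg, sb = out[y][x]
            out[y][x] = (
                round(alpha * r + (1 - alpha) * sr),
                round(alpha * g + (1 - alpha) * sg),
                round(alpha * b + (1 - alpha) * sb),
            )
    return out


if __name__ == "__main__":
    # A 2x2 black scene and a 1x1 opaque white "hand" placed at row 0, column 1.
    scene = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
    hand = [[(255, 255, 255, 255)]]
    print(composite_hand(scene, hand, top=0, left=1))
```

A real pipeline would presumably also render the hand model in 3D with a pose that points at the target object; this sketch covers only the final 2D blending step.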