cs.RO

Teaching robots to understand what objects are for and how to use them

Zhaoning Wang, Yi Zhong, Jiawei Fu, Henrik I. Christensen, Jun Gao

June 1, 2026

Robots need to understand not just where to grasp an object but how to manipulate it, a challenge known as affordance understanding. This work builds AFUN, a foundation model that takes an RGB-D image and a task description and outputs both a functional mask (where to interact) and a 3D motion curve (how to interact). By unifying heterogeneous data from robots, humans, simulations, and real scans under a single schema, the model generalizes across environments and objects without task-specific tuning, and it outperforms existing methods by large margins on affordance segmentation and contact prediction.
Published as "AFUN: Towards an Affordance Foundation Model for Functionality Understanding" (arXiv:2606.02551).
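
The summary describes an input/output contract (RGB-D image plus task text in, functional mask plus 3D motion curve out) shared by data from four kinds of sources. One way to picture such a unified schema is a single record type that every source is converted into. The sketch below is hypothetical: the class `AffordanceSample`, its field names, the `SOURCES` labels, the depth unit, and the `curve_length` helper are illustrative assumptions, not AFUN's published format or API.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]

# Hypothetical source labels mirroring the four data origins named in the summary.
SOURCES = {"robot", "human", "simulation", "scan"}


@dataclass
class AffordanceSample:
    """One record in a hypothetical unified schema (not AFUN's real format)."""
    rgb: List[List[Tuple[int, int, int]]]  # H x W color pixels
    depth: List[List[float]]               # H x W depth values (unit is an assumption)
    task: str                              # natural-language task description
    functional_mask: List[List[bool]]      # H x W, True where to interact
    motion_curve: List[Point3D]            # ordered 3D points: how to interact
    source: str                            # which kind of dataset the record came from

    def validate(self) -> None:
        """Raise ValueError if the record breaks the shared shape contract."""
        if not self.rgb or not self.rgb[0]:
            raise ValueError("rgb must be a non-empty H x W grid")
        h, w = len(self.rgb), len(self.rgb[0])
        grids = (("rgb", self.rgb), ("depth", self.depth),
                 ("functional_mask", self.functional_mask))
        for name, grid in grids:
            if len(grid) != h or any(len(row) != w for row in grid):
                raise ValueError(f"{name} must be {h} x {w}")
        if len(self.motion_curve) < 2:
            raise ValueError("motion_curve needs at least two 3D points")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source {self.source!r}")


def curve_length(curve: List[Point3D]) -> float:
    """Arc length of the motion curve, treated as a polyline."""
    return sum(math.dist(a, b) for a, b in zip(curve, curve[1:]))


# Usage: a 2 x 2 "image" where only the top-right pixel is marked as functional.
sample = AffordanceSample(
    rgb=[[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (20, 20, 20)]],
    depth=[[0.5, 0.6], [0.7, 0.8]],
    task="pull the drawer open",
    functional_mask=[[False, True], [False, False]],
    motion_curve=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.1), (0.0, 0.1, 0.1)],
    source="robot",
)
sample.validate()
print(round(curve_length(sample.motion_curve), 3))  # 0.2
```

Forcing every source through one validated record is what lets robot, human, simulation, and scan data feed a single training pipeline. The summary does not say whether AFUN stores motion curves as point lists, splines, or another parameterization, so the polyline here is only a stand-in.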