Teaching robots complex moves without filming them in real life?

Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang, Bowen Wen, Jiefeng Li, Xueting Li, Qingwei Ben, Haoyang Weng, Yufei Ye, David Minor, Tingwu Wang, Chenfanfu Jiang, Sanja Fidler, Jan Kautz, Linxi Fan, Yuke Zhu, Zhengyi Luo, Umar Iqbal, Ye Yuan

Training humanoid robots to walk, manipulate objects, and navigate terrain usually requires expensive motion capture or teleoperation. GRAIL sidesteps this by generating 20,000+ training sequences entirely in simulation: it composes 3D models, uses video foundation models to synthesize realistic human motions, then reconstructs metric 4D trajectories of humans interacting with objects. The pipeline then retargets these motions to a humanoid robot, trains visual policies end-to-end, and deploys them on a Unitree G1, with strong real-world performance on pick-up and stair-climbing tasks.