Why robots learn better from human video when trained on camera movement
Xingyao Lin, Guojin Zhong, Tianyi Lu, Ziyi Ye, Yichen Zhu, Zuxuan Wu, Yu-Gang Jiang
June 4, 2026
Robots trained on human video consistently underperform those trained on robot data, even though human video is more abundant. The authors attribute this gap to ignoring active perception: how humans reposition their viewpoint during manipulation. ActiveMimic recovers camera and wrist trajectories from egocentric RGB video, models viewpoint repositioning as an action, and jointly learns manipulation and active perception. In real-world robotic experiments it matches robot-pretrained baselines while using only human video, which suggests that active perception is the missing link for scaling robot learning to unconstrained human footage.
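To make "viewpoint repositioning as an action" concrete, here is a minimal sketch of one way camera motion could be turned into an action label alongside wrist motion. It is an illustration under stated assumptions, not the paper's method: the function name `build_actions`, the 4x4 world-from-frame pose convention, and the choice to express both actions in the current camera frame are all hypothetical.

```python
import numpy as np


def build_actions(cam_poses, wrist_poses):
    """Derive per-step viewpoint and wrist actions from recovered trajectories.

    Assumed (hypothetical) convention: both inputs have shape (T, 4, 4) and hold
    homogeneous world-from-camera and world-from-wrist transforms.

    Returns two arrays of shape (T-1, 4, 4):
      camera_actions[t]: next camera pose expressed in the current camera frame
      wrist_actions[t]:  next wrist pose expressed in the current camera frame
    """
    cam_poses = np.asarray(cam_poses, dtype=float)
    wrist_poses = np.asarray(wrist_poses, dtype=float)
    if cam_poses.shape != wrist_poses.shape or cam_poses.shape[1:] != (4, 4):
        raise ValueError("expected two pose arrays of identical shape (T, 4, 4)")

    # Invert every current camera pose at once (np.linalg.inv accepts stacks).
    cam_from_world = np.linalg.inv(cam_poses[:-1])

    # Viewpoint action: how the camera moves from step t to step t+1.
    camera_actions = cam_from_world @ cam_poses[1:]

    # Manipulation action: where the wrist should be at t+1, seen from camera t.
    wrist_actions = cam_from_world @ wrist_poses[1:]
    return camera_actions, wrist_actions
```

A policy trained jointly on both outputs would predict where to look as well as where to move the hand. How ActiveMimic actually parameterizes these actions is not stated in this summary.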