cs.CV

Why robots learn better from human video when trained on camera movement

Xingyao Lin, Guojin Zhong, Tianyi Lu, Ziyi Ye, Yichen Zhu, Zuxuan Wu, Yu-Gang Jiang

June 4, 2026

Robots trained on human video consistently underperform those trained on robot data, even though human video is more abundant. The authors trace this gap to a neglected factor, active perception: the way humans reposition their viewpoint while manipulating objects. ActiveMimic recovers camera and wrist trajectories from egocentric RGB video, treats viewpoint repositioning as an action in its own right, and jointly learns manipulation and active perception. In real-world robot experiments, it matches baselines pretrained on robot data while using only human video. This suggests that active perception is the missing link for scaling robot learning to unconstrained human footage.
Published as "ActiveMimic: Egocentric Video Pretraining with Active Perception" (arXiv:2606.06194).
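To make "viewpoint repositioning as an action" concrete, here is a minimal sketch of how camera and wrist trajectories could be turned into per-step action labels. It assumes that per-frame camera and wrist poses have already been recovered from the video as 4x4 world-from-frame transforms (the recovery step is not shown). The function names, the 12-D layout (a 6-D camera motion followed by the next wrist pose expressed in the current camera frame), and the axis-angle parameterization are all illustrative assumptions, not the paper's actual representation.

```python
import numpy as np

# Hypothetical action layout (an assumption, not ActiveMimic's):
#   [0:6]  camera motion from t to t+1, in the camera-t frame (translation, axis-angle)
#   [6:12] wrist pose at t+1, expressed in the camera-t frame (translation, axis-angle)


def make_pose(R: np.ndarray, t) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rotation_to_axis_angle(R: np.ndarray) -> np.ndarray:
    """Convert a 3x3 rotation matrix to an axis-angle vector (unit axis * angle).

    Numerically unstable for angles close to pi; acceptable for small per-frame motions.
    """
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Vector form of the skew-symmetric part (R - R^T), which equals 2 sin(theta) * axis.
    skew = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-6:
        return 0.5 * skew  # first-order approximation near the identity
    return skew * (theta / (2.0 * np.sin(theta)))


def pose_to_vec(T: np.ndarray) -> np.ndarray:
    """Flatten a 4x4 transform into a 6-D [translation, axis-angle] vector."""
    return np.concatenate([T[:3, 3], rotation_to_axis_angle(T[:3, :3])])


def build_actions(cam_poses: np.ndarray, wrist_poses: np.ndarray) -> np.ndarray:
    """Turn (N, 4, 4) camera and wrist trajectories into (N-1, 12) action labels."""
    actions = []
    for t in range(len(cam_poses) - 1):
        cam_inv = np.linalg.inv(cam_poses[t])
        # Active-perception action: how the camera moves, seen from where it is now.
        cam_action = pose_to_vec(cam_inv @ cam_poses[t + 1])
        # Manipulation action: next wrist pose relative to the current viewpoint.
        wrist_action = pose_to_vec(cam_inv @ wrist_poses[t + 1])
        actions.append(np.concatenate([cam_action, wrist_action]))
    return np.stack(actions)
```

Treating the camera motion as its own slice of the action vector is one simple way a policy could be asked to predict where to look as well as what to do with the hand. Expressing the wrist target in the current camera frame keeps the manipulation label egocentric, so it stays meaningful even when the viewpoint itself is moving.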