Teaching robots to walk and manipulate from human videos
Haoran Huang, Haonan Dong, Huixu Dong
May 20, 2026
Mobile robots that learn from human demonstrations face two entangled problems: camera footage mixes walking with hand motion, and inference delays let the moving base drift away from the positions the policy predicted. Mobile UMI addresses both. Demonstrations are recorded with two cameras (one on the chest, one on the wrist) and without a robot present, and spatial anchoring is then used to separate base movement from arm motion. At run time, an online executor continuously realigns actions to the robot's actual pose before they execute and discards outdated waypoints. On four household tasks, the system achieved 83.8% success, substantially better than prior approaches, with decoupled kinematics and latency correction each closing a significant gap.
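The two ideas in this summary, separating base motion from arm motion and realigning stale actions to the robot's current pose, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes planar SE(2) poses instead of full 3D poses, treats the chest camera pose as the base frame, and uses hypothetical helper names (`se2`, `decouple`, `realign`) that do not come from the paper.

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous 3x3 transform for a planar pose (assumed simplification of 3D poses)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def decouple(T_world_chest, T_world_wrist):
    """Express the wrist pose in the chest (base) frame, removing walking motion."""
    return np.linalg.inv(T_world_chest) @ T_world_wrist

def realign(waypoints, T_world_base_now, t_now):
    """Drop waypoints whose timestamp has already passed, then re-express
    the remaining world-frame targets in the base frame as it is *now*."""
    T_base_world = np.linalg.inv(T_world_base_now)
    return [(t, T_base_world @ T_target) for t, T_target in waypoints if t > t_now]

# Demonstration side: the person has walked to x = 1.0; the hand is 0.5 m ahead, 0.2 m left.
T_arm = decouple(se2(1.0, 0.0, 0.0), se2(1.5, 0.2, 0.0))

# Execution side: two waypoints target the same world point, but inference took 0.3 s
# and the base has moved to x = 1.2 in the meantime.
plan = [(0.1, se2(2.0, 0.0, 0.0)), (0.5, se2(2.0, 0.0, 0.0))]
fresh = realign(plan, se2(1.2, 0.0, 0.0), t_now=0.3)
```

In this sketch the first waypoint is discarded because its timestamp precedes `t_now`, and the surviving target is measured from where the base actually is, not from where it was when the prediction was made.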