How one robot controller handles joints, hands, and human poses

Zuxing Lu, Ziang Zheng, Yao Lyu, Jingyu Liu, Feihong Zhang, Song Lu, Xin Yuan, Changyin Sun, Xingxing Zuo, Shengbo Eben Li

Humanoid robots struggle when tasks demand different kinds of motion reference, such as joint angles for walking and hand positions for grasping. M3imic unifies these mismatched input types with separate encoders that feed a shared latent space, then trains a single policy via reinforcement learning that transfers directly to real Unitree G1 robots. The approach handles joint angles, human poses, and end-effector targets without task-specific retuning; it achieves 98% success in simulation and is validated on hardware.
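
The idea of separate encoders feeding a shared latent space can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, input dimensions, modality names, and the helpers `make_linear`, `encode`, and `act` are all hypothetical, the weights are random rather than trained with reinforcement learning, and inputs a real controller would also need (such as the robot's own sensor state) are omitted. It only shows how references of different sizes can be mapped to one fixed-size latent vector that a single policy head consumes.

```python
import numpy as np

# All sizes below are placeholder assumptions, not values from the paper.
LATENT_DIM = 32
ACTION_DIM = 12
INPUT_DIMS = {
    "joint_angles": 20,   # hypothetical joint-angle reference size
    "human_pose": 72,     # hypothetical human-pose reference size
    "end_effector": 14,   # hypothetical end-effector target size
}

rng = np.random.default_rng(0)


def make_linear(in_dim, out_dim):
    """Return randomly initialised (weights, bias) for one linear layer."""
    weights = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return weights, bias


# One encoder per reference type, all projecting into the same latent size.
encoders = {name: make_linear(dim, LATENT_DIM) for name, dim in INPUT_DIMS.items()}

# A single policy head shared by every reference type.
policy_head = make_linear(LATENT_DIM, ACTION_DIM)


def encode(modality, reference):
    """Map a reference of the given modality into the shared latent space."""
    weights, bias = encoders[modality]
    return np.tanh(reference @ weights + bias)


def act(modality, reference):
    """Produce an action vector from any supported reference type."""
    latent = encode(modality, reference)
    weights, bias = policy_head
    return latent @ weights + bias


if __name__ == "__main__":
    for name, dim in INPUT_DIMS.items():
        action = act(name, np.ones(dim))
        print(name, "->", action.shape)
```

Because every encoder emits a vector of the same length, the policy head never needs to know which reference type produced its input; in this sketch, that is what lets one set of policy weights serve all three input types.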