cs.RO

Teaching robots to learn from human preferences, not commands

Yunyang Mo, Jian Li, Qiwei Wu, Yihang Kang, Renjing Xu

May 15, 2026

Reinforcement learning for robots struggles with unsafe, inefficient exploration in real-world deployment. OHP-RL reframes human interventions during training not as behavioral demonstrations to copy, but as preference signals indicating when autonomy should be guided. The method uses a state-dependent preference gate that learns when and how much human feedback should influence the policy, allowing robots to benefit from intermittent, imperfect guidance while maintaining stable learning. Evaluated on three contact-rich manipulation tasks with a Franka robot, OHP-RL achieved higher success rates and faster convergence than prior methods while requiring substantially fewer human interventions.
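To make the idea of a state-dependent gate concrete, here is a minimal sketch, not the paper's implementation. It assumes a linear-sigmoid gate, a Bradley-Terry style preference loss that favors the human's action over the policy's, and a simple additive combination with the RL loss. All function names, parameters, and the form of the objective are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_gate(state: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """Hypothetical gate: maps a state to a value in [0, 1] that scales
    how much human feedback influences the update in that state."""
    logit = sum(w * s for w, s in zip(weights, state)) + bias
    return sigmoid(logit)


def preference_loss(score_human: float, score_policy: float) -> float:
    """Bradley-Terry style loss (an assumption, not taken from the paper):
    -log sigmoid(score_human - score_policy), written as a stable softplus."""
    d = score_human - score_policy
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def gated_objective(rl_loss: float, pref_loss: float, gate: float) -> float:
    """Assumed combination: RL loss plus the gate-weighted preference loss."""
    return rl_loss + gate * pref_loss


# Example: one state in which the human intervened.
state = [0.2, -1.0, 0.5]
gate = preference_gate(state, weights=[0.5, -0.3, 0.1], bias=0.0)
loss = gated_objective(rl_loss=1.0,
                       pref_loss=preference_loss(2.0, 0.5),
                       gate=gate)
```

In this sketch a gate near 0 leaves the RL objective essentially untouched, while a gate near 1 lets the preference term dominate. In a learned version the gate parameters would be trained jointly with the policy.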
Published as "OHP-RL: Online Human Preference as Guidance in Reinforcement Learning for Robot Manipulation", arXiv:2605.15971.