Teaching robots reward functions that work beyond the training lab

Tengye Xu, Yangting Sun, Ziju Shen, Guanqi Chen, Zhen Fu, Chen Yizhou, Hua Chen, Jia Pan

Vision-based reward functions for robots fail when objects move or cameras shift, because they memorize pixel patterns. This work learns invariant rewards by discovering task-level properties (such as object-to-goal distance) rather than fitting visual features. Using only five demonstrations and no online interaction, the method produces rewards that accelerate policy learning and generalize across real-world variations: the same reward works for different objects and viewpoints across Meta-World and Franka manipulation tasks.
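
To make the contrast between pixel memorization and task-level properties concrete, here is a toy sketch in Python. It is an illustration only, not the method described above: the 16x16 grid "camera", the function names (render, pixel_template_reward, distance_reward), and the simulated camera shift are all assumptions invented for this example. It compares a reward that matches pixels against a memorized success image with a reward defined on object-to-goal distance.

```python
import numpy as np

def distance_reward(obj_xy, goal_xy):
    """Task-level reward: negative Euclidean object-to-goal distance."""
    diff = np.asarray(obj_xy, dtype=float) - np.asarray(goal_xy, dtype=float)
    return -float(np.linalg.norm(diff))

def render(obj_xy, goal_xy, size=16):
    """Hypothetical toy camera: goal pixel = 0.5, object pixel = 1.0."""
    img = np.zeros((size, size))
    img[goal_xy[1], goal_xy[0]] = 0.5
    img[obj_xy[1], obj_xy[0]] = 1.0  # object drawn last, so it covers the goal
    return img

def pixel_template_reward(img, template):
    """Pixel-level reward: negative MSE against a memorized success image."""
    return -float(np.mean((img - template) ** 2))

# Memorized success image: object sitting on the goal at (8, 8).
template = render((8, 8), (8, 8))

# A camera shift is simulated as a translation of every image coordinate.
shift = np.array([3, 2])
obj, goal = np.array([4, 8]), np.array([8, 8])

# The distance-based reward is unchanged by the shift.
r_dist = distance_reward(obj, goal)
r_dist_shifted = distance_reward(obj + shift, goal + shift)

# The same successful state scores worse under the pixel reward once shifted.
r_pix = pixel_template_reward(render(goal, goal), template)
r_pix_shifted = pixel_template_reward(render(goal + shift, goal + shift), template)

print(r_dist, r_dist_shifted)
print(r_pix, r_pix_shifted)
```

In this sketch the distance-based reward gives the same value before and after the shift, while the pixel-template reward penalizes a state that is still a success. The example assumes object and goal positions are already known; recovering such properties from images and a handful of demonstrations is the harder problem the work addresses.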