Why robots struggle when multiple good actions look identical

Lorenzo Mazza, Massimiliano Datres, Ariel Rodriguez, Sebastian Bodenstedt, Gitta Kutyniok, Stefanie Speidel

Teaching robots by imitation fails when the same visual state allows many valid actions. This work maps exactly how two common policy designs break under multimodality: latent-variable models either collapse to a single mode or lose the ability to distinguish between them, while generative models struggle because smooth mappings cannot cover many separated solutions. Testing on robotic tasks confirms these failure modes are fundamental, not accidental.