What if robots learned from every moment, not just successes?

Michael Matthews, Matthew Jackson, Michael Beukman, Thomas Foster, Alistair Letcher, Scott Fujimoto, Cédric Colas, Jakob Foerster

Goal-conditioned agents typically waste most observations, because each transition is used to update only the commanded goal. This work enables "all-goals learning", in which every transition improves performance on every possible objective, by having a single neural network jointly output values and actions for all goals in parallel. On Craftax environments, the resulting method, LEO, dramatically outperforms competitors while running 250× faster than naive relabelling; it also matches or beats existing methods on continuous control. Code is released.
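
To make the all-goals idea concrete, here is a minimal sketch of updating the values for every goal from one transition. It is not LEO's actual implementation. It assumes a small discrete setting where each goal is "reach a particular state", replaces the neural network with a table of shape [states, goals, actions], and uses a plain Q-learning target. The function name `all_goals_td_update` and all parameters are hypothetical.

```python
import numpy as np


def all_goals_td_update(q, s, a, s_next, goal_states, lr=0.5, gamma=0.9):
    """Apply one TD update for every goal from a single transition.

    q: array [num_states, num_goals, num_actions], a tabular stand-in
       (assumption) for a network with one output slice per goal.
    goal_states: array [num_goals]; goal g means "reach goal_states[g]".
    """
    # Per-goal reward: 1 for each goal whose target state was just reached.
    reached = goal_states == s_next                # bool, shape [num_goals]
    rewards = reached.astype(q.dtype)

    # Greedy bootstrap value per goal; a reached goal is treated as terminal.
    next_values = q[s_next].max(axis=-1)           # shape [num_goals]
    targets = rewards + gamma * (~reached) * next_values

    # One vectorised update covers all goals at once.
    new_q = q.copy()
    new_q[s, :, a] += lr * (targets - q[s, :, a])
    return new_q


# Usage: 4 states, one goal per state, 2 actions.
goal_states = np.arange(4)
q = np.zeros((4, 4, 2))
q = all_goals_td_update(q, s=0, a=1, s_next=2, goal_states=goal_states)
q = all_goals_td_update(q, s=1, a=0, s_next=0, goal_states=goal_states)
```

In this sketch the second transition reaches the target of goal 0, yet it also updates the entry for goal 2 by bootstrapping from the value learned in the first step. A standard goal-conditioned update would have touched only the commanded goal.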