Training robots to imagine beyond what they've seen

Shaojun Xu, Xiaoling Zhou, Yihan Lin, Yapeng Meng, Xinglong Ji, Luping Shi, Rong Zhao

Model-based reinforcement learning typically limits imagination to continuations of observed states, creating a mismatch between what the world model learns and what the policy optimizes. Mind Dreamer breaks this constraint by using an adversarial generator to sample initial states for imagination that are physically plausible but cognitively novel, so that rollouts can jump to underexplored regions of the latent state space. The method introduces Relay Value and Uncertainty Functions to propagate value and epistemic information across these discontinuities, backed by a theoretical result that uncertainty requires quadratic discounting. On DeepMind Control Suite benchmarks, it achieves a 1.67× average speedup over DreamerV3, and up to 8.8× in sparse-reward settings.
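
To give some intuition for the quadratic-discounting claim, the sketch below shows one standard setting in which it arises. It is not the paper's Relay Uncertainty Function: it assumes, purely for illustration, that uncertainty is measured as the variance of a discounted return and that per-step rewards are independent. The function name `discounted_value_and_variance` and the example numbers are hypothetical. Because Var(γX) = γ² Var(X), the mean of the return is discounted by γ per step while its variance is discounted by γ².

```python
def discounted_value_and_variance(reward_means, reward_vars, gamma):
    """Backward recursions for the mean and variance of a discounted return.

    Illustrative assumption (not taken from the paper): per-step rewards are
    independent random variables with the given means and variances.
    """
    value, variance = 0.0, 0.0
    for mean, var in zip(reversed(reward_means), reversed(reward_vars)):
        # Mean of the return: discounted by gamma at each step.
        value = mean + gamma * value
        # Variance of the return: Var(gamma * X) = gamma**2 * Var(X),
        # so the discount factor enters squared.
        variance = var + gamma ** 2 * variance
    return value, variance


# Three steps, each with reward mean 1.0 and reward variance 0.5.
value, variance = discounted_value_and_variance(
    [1.0, 1.0, 1.0], [0.5, 0.5, 0.5], gamma=0.5
)
print(value, variance)
```

Under these assumptions the variance recursion matches the direct sum of γ^(2t) times the per-step variance, which is why discounting an uncertainty estimate with γ rather than γ² would overweight distant uncertainty relative to distant value.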