How to train simulators that don't fool reinforcement learning agents

Model-based RL agents exploit small simulator inaccuracies, so policies that look strong in a learned model can fail in the real world. Rather than chasing predictive accuracy alone, the authors frame simulator learning as a game in which a model defends against an adversarial policy. They prove the problem is learnable with sublinear regret, show how to simplify it via critic loss bounds, and reveal an Error-MDP duality that connects it to standard RL. Experiments show that this approach catches model errors where they matter most to the policy, so that training purely in simulation matches real-world performance.
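
To make the adversarial framing concrete, here is a minimal tabular sketch. It is not the authors' algorithm. It assumes that model error at each state-action pair is measured by total-variation distance, and that the adversary is found by treating that error as a reward and running ordinary value iteration on the true dynamics. The names error_reward and adversarial_policy, the toy transition tables, and the discount factor are all hypothetical and chosen only for illustration.

```python
import numpy as np

def error_reward(P_true, P_model):
    # Assumption: model error is the total-variation distance between the true
    # and modelled next-state distributions, one value per (state, action).
    return 0.5 * np.abs(P_true - P_model).sum(axis=-1)

def adversarial_policy(P_true, reward, gamma=0.9, iters=500):
    # Standard value iteration, with model error playing the role of reward.
    # The resulting greedy policy steers toward where the model is most wrong.
    n_states = P_true.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward + gamma * (P_true @ V)  # shape (n_states, n_actions)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

# Toy problem (hypothetical): 2 states, 2 actions, deterministic dynamics.
# Action 0 always leads to state 0, action 1 always leads to state 1.
P_true = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])

# The learned model is wrong only at (state 1, action 1).
P_model = P_true.copy()
P_model[1, 1] = [0.5, 0.5]

r = error_reward(P_true, P_model)
policy, V = adversarial_policy(P_true, r)
print("error reward:\n", r)
print("adversarial policy:", policy)
```

In this toy case the adversary learns to move to state 1 and stay there, because that is the only place the model is inaccurate. A model trained against such a policy would be pushed to fix errors in the regions a policy can actually reach and exploit, rather than lowering average prediction error everywhere.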