How to train simulators that don't fool reinforcement learning agents
Christoph Dann, Yishay Mansour, Mehryar Mohri
May 27, 2026
Model-based RL agents exploit tiny simulator inaccuracies, which causes the learned policies to fail in the real world. Rather than chasing predictive accuracy, the authors frame simulator learning as a game in which the model defends against an adversarial policy. They prove that this problem is learnable with sublinear regret, show how to simplify it via critic-loss bounds, and reveal an Error-MDP duality that connects it to standard RL. Experiments show that the approach catches model errors where they matter most for the policy, so that training purely in simulation matches real-world performance.
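The contrast between predictive accuracy and a game against an adversarial policy can be illustrated with a toy example. This is a minimal sketch under assumptions that are all hypothetical and not the authors' construction: a 3-state tabular MDP, two hand-made candidate models (`MODEL_A`, `MODEL_B`), and a brute-force adversary that enumerates every deterministic policy and picks the one whose start-state value the model gets most wrong. The summary describes learning this game with sublinear regret. The exhaustive min-max below implements no such online algorithm and only shows why the two selection criteria can disagree.

```python
from itertools import product

GAMMA = 0.9  # hypothetical discount factor

# Hypothetical 3-state MDP: state 0 is the start, state 1 pays reward 1 forever,
# state 2 pays nothing forever. R[s][a] is the reward; P[s][a] lists next-state probabilities.
R = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
TRUE_P = [
    [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # a0 is a coin flip, a1 goes to the zero-reward state
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
# Model A: a small error on the action that a good policy relies on.
MODEL_A = [[[0.0, 0.7, 0.3], [0.0, 0.0, 1.0]], TRUE_P[1], TRUE_P[2]]
# Model B: a large error, but one that no policy can exploit
# (a zero-reward self-loop instead of a jump to the zero-reward state).
MODEL_B = [[[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]], TRUE_P[1], TRUE_P[2]]


def evaluate(P, policy, gamma=GAMMA, iters=500):
    """Iterative policy evaluation of a deterministic policy (tuple: state -> action)."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [R[s][policy[s]] + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
             for s in range(len(R))]
    return V


def prediction_error(model):
    """Mean L1 distance between model and true next-state distributions over all (s, a)."""
    pairs = [(s, a) for s in range(len(R)) for a in range(2)]
    return sum(sum(abs(m - t) for m, t in zip(model[s][a], TRUE_P[s][a]))
               for s, a in pairs) / len(pairs)


def worst_case_value_error(model, start=0):
    """Adversary's best response: the policy whose start-state value the model gets most wrong."""
    policies = product(range(2), repeat=len(R))
    return max(abs(evaluate(model, pi)[start] - evaluate(TRUE_P, pi)[start])
               for pi in policies)


models = {"A": MODEL_A, "B": MODEL_B}
by_prediction = min(models, key=lambda k: prediction_error(models[k]))
by_value = min(models, key=lambda k: worst_case_value_error(models[k]))
print(by_prediction, by_value)  # A B
```

Model A has the lower average transition error (0.4/6 versus 2/6), yet an adversarial policy that takes `a0` sees its value overestimated by about 1.8. Model B's only mistake leaves every policy's value unchanged, so the value-based criterion prefers it.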