stat.ML

How to train simulators that don't fool reinforcement learning agents

Christoph Dann, Yishay Mansour, Mehryar Mohri

May 27, 2026

Model-based reinforcement learning (RL) agents exploit even tiny simulator inaccuracies, so policies learned in simulation can fail in the real world. Rather than chasing predictive accuracy, the authors frame simulator learning as a game in which the model defends against an adversarial policy. They prove that the problem is learnable with sublinear regret, show how critic loss bounds simplify it, and identify an Error-MDP duality that connects it to standard RL. In their experiments, the approach catches the simulator errors that matter most to policies, letting training done purely in simulation match real-world performance.
Published as "Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning", arXiv:2605.29032
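
To make the game framing in the summary concrete, here is a toy sketch. It is not the paper's algorithm. The three-state MDP, the two candidate simulators (MODEL_A, MODEL_B), and every function name are hypothetical choices made for illustration. The sketch compares two ways of choosing a simulator: by average transition error, and by the largest value gap an adversarial policy can induce.

```python
from itertools import product

# Hypothetical toy problem: 3 states, 2 actions, horizon 3, start in state 0.
N_STATES, N_ACTIONS, HORIZON, START = 3, 2, 3, 0
REWARD = [0.0, 1.0, 0.0]  # reward collected for occupying each state

# P[s][a] is the next-state distribution. State 2 is unreachable from state 0.
REAL = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
# MODEL_A: one small error, at the start state (where policies act).
MODEL_A = [
    [[1.0, 0.0, 0.0], [0.3, 0.7, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
# MODEL_B: large errors, but only in the unreachable state 2.
MODEL_B = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
]

# All deterministic stationary policies: one action per state.
POLICIES = list(product(range(N_ACTIONS), repeat=N_STATES))


def policy_value(P, policy):
    """Finite-horizon value of `policy` from START under transitions P."""
    v = [0.0] * N_STATES
    for _ in range(HORIZON):
        v = [
            REWARD[s] + sum(p * v[s2] for s2, p in enumerate(P[s][policy[s]]))
            for s in range(N_STATES)
        ]
    return v[START]


def mean_l1_error(P):
    """Average L1 distance to the real transitions over all (state, action)."""
    total = sum(
        abs(P[s][a][s2] - REAL[s][a][s2])
        for s in range(N_STATES)
        for a in range(N_ACTIONS)
        for s2 in range(N_STATES)
    )
    return total / (N_STATES * N_ACTIONS)


def worst_case_gap(P):
    """Largest |real value - simulated value| an adversarial policy can find."""
    return max(abs(policy_value(REAL, pi) - policy_value(P, pi)) for pi in POLICIES)


models = {"A": MODEL_A, "B": MODEL_B}
by_accuracy = min(models, key=lambda name: mean_l1_error(models[name]))
by_game = min(models, key=lambda name: worst_case_gap(models[name]))
print("chosen by average transition error:", by_accuracy)
print("chosen by worst-case value gap:", by_game)
```

In this toy setting the two criteria disagree. MODEL_A has the smaller average transition error, but a policy that takes action 1 in the start state sees a different return in MODEL_A than in the real environment. MODEL_B is less accurate on average, yet no policy starting from state 0 can tell it apart from the real environment, because its errors sit in a state that is never reached.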