← Back to Machine Learning (Statistics)
stat.ML

How to learn strategies against intelligent opponents when you can't see everything?

Raman Arora

June 1, 2026

Learning to play games where both you and your opponent have incomplete information is hard: you must infer hidden dynamics while adapting to an adversary whose moves depend on your strategy. Arora proves an optimistic maximum-likelihood algorithm reaches √T policy regret—the best possible up to logarithmic factors—with explicit bounds tied to problem structure like hidden state complexity and opponent memory. The approach uses confidence sets built over growing epochs to keep comparison costs manageable.
Published as Minimax-Optimal Policy Regret in Partially Observable Markov Games arXiv:2606.02363
Read the original paper →