How to learn strategies against intelligent opponents when you can't see everything?

Learning to play games in which both you and your opponent have incomplete information is hard: you must infer hidden dynamics while adapting to an adversary whose moves depend on your strategy. Arora proves that an optimistic maximum-likelihood algorithm achieves policy regret of order √T over T rounds, which is the best possible up to logarithmic factors. The bounds are explicit and depend on problem structure such as the complexity of the hidden states and the length of the opponent's memory. The approach builds confidence sets over epochs of increasing length, which keeps the cost of comparing against alternative policies manageable.
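To make "optimistic maximum likelihood over growing epochs" concrete, here is a minimal sketch in a deliberately tiny toy setting. It is not Arora's construction. It assumes a finite set of hypothetical candidate models, each giving a win probability for every policy, and a stochastic, non-adaptive environment, so it models neither hidden states nor an opponent with memory. The doubling epoch schedule, the confidence radius `beta`, and every name below are illustrative assumptions. At the start of each epoch the learner keeps the models whose log-likelihood is within `beta` of the best, picks the model-and-policy pair with the highest predicted value (optimism), and commits to that policy for the whole epoch.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy setting (an assumption, not from the source): a "model" maps
# each policy index to the probability that playing it yields a win (reward 1).
# All probabilities must lie strictly between 0 and 1 so the logs are finite.
Model = Tuple[float, ...]


def log_likelihood(model: Model, data: List[Tuple[int, int]]) -> float:
    """Log-likelihood of observed (policy, outcome) pairs under one model."""
    total = 0.0
    for policy, outcome in data:
        p = model[policy]
        total += math.log(p if outcome == 1 else 1.0 - p)
    return total


def optimistic_mle(models: List[Model], true_model: Model, horizon: int,
                   beta: float = 2.0, seed: int = 0) -> Tuple[float, List[int]]:
    """Optimistic MLE with doubling epochs; returns total reward and the
    policy committed to in each epoch."""
    rng = random.Random(seed)
    data: List[Tuple[int, int]] = []
    total_reward = 0.0
    chosen: List[int] = []
    t, epoch = 0, 0
    while t < horizon:
        # Confidence set: models whose log-likelihood is within beta of the MLE.
        scores = [log_likelihood(m, data) for m in models]
        best = max(scores)
        confidence_set = [m for m, s in zip(models, scores) if s >= best - beta]
        # Optimism: the policy with the highest value under any plausible model
        # (ties broken toward the larger policy index by tuple comparison).
        _, policy = max((m[pi], pi) for m in confidence_set for pi in range(len(m)))
        chosen.append(policy)
        # Commit to that policy for the whole epoch; epoch k lasts 2**k rounds.
        length = min(2 ** epoch, horizon - t)
        for _ in range(length):
            outcome = 1 if rng.random() < true_model[policy] else 0
            data.append((policy, outcome))
            total_reward += outcome
        t += length
        epoch += 1
    return total_reward, chosen
```

For example, `optimistic_mle([(0.3, 0.7), (0.7, 0.3), (0.5, 0.5)], true_model=(0.3, 0.7), horizon=1000)` runs 10 epochs (lengths 1, 2, 4, ..., 256, then the remaining 489 rounds). Because the learner commits to one policy per epoch, it switches policies at most once per epoch, roughly log2(T) times over T rounds, and the confidence set is recomputed only that often.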