How to collect better data for testing recommendation systems offline
Connor Douglas, Joel Persson, Foster Provost
May 14, 2026
Off-policy evaluation lets companies compare recommendation algorithms without live deployment, but its accuracy depends critically on how the data-collection (logging) policy is designed — a problem largely ignored in practice. This paper formalizes a reward-coverage tradeoff: focusing data collection on high-reward actions reduces variance but can leave gaps where the candidate policy might act. The authors derive theoretically optimal logging policies for three real-world scenarios — when the target policy and rewards are known, unknown, or only partially known — and distill practical design principles for cases where the optimum is operationally infeasible. The work is primarily theoretical but directly actionable for firms running offline experimentation at scale.
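The reward-coverage tradeoff can be illustrated with a toy calculation. The sketch below is not the paper's derivation. It assumes a context-free setting with three actions, deterministic rewards, and the standard inverse propensity scoring (IPS) estimator. All numbers and names (`ips_mean_and_variance`, `reward`, `target`, the three logging policies) are hypothetical and chosen only for illustration.

```python
def ips_mean_and_variance(target, logging, reward):
    """Exact per-sample mean and variance of the IPS term (pi(a) / mu(a)) * r(a),
    where the action a is drawn from the logging policy mu."""
    mean = 0.0
    second_moment = 0.0
    for pi, mu, r in zip(target, logging, reward):
        if mu == 0.0:
            # The action is never logged, so its share of the target
            # policy's value is silently missing (a coverage gap).
            continue
        weighted_reward = (pi / mu) * r
        mean += mu * weighted_reward
        second_moment += mu * weighted_reward ** 2
    return mean, second_moment - mean ** 2


# Hypothetical three-action problem (illustrative numbers only).
reward = [1.0, 0.5, 0.1]   # assumed deterministic reward of each action
target = [0.2, 0.3, 0.5]   # candidate policy to be evaluated offline
true_value = sum(p * r for p, r in zip(target, reward))

# Three candidate logging policies.
uniform = [1 / 3, 1 / 3, 1 / 3]                      # full coverage, ignores reward
raw = [p * r for p, r in zip(target, reward)]
reward_weighted = [x / sum(raw) for x in raw]        # proportional to pi(a) * r(a)
greedy = [1.0, 0.0, 0.0]                             # logs only the best action

results = {
    name: ips_mean_and_variance(target, policy, reward)
    for name, policy in [
        ("uniform", uniform),
        ("reward_weighted", reward_weighted),
        ("greedy", greedy),
    ]
}

for name, (mean, variance) in results.items():
    print(f"{name}: mean={mean:.3f} variance={variance:.3f} bias={mean - true_value:.3f}")
```

In this toy setup the uniform logger is unbiased but has a per-sample variance of about 0.035. The logger proportional to target probability times reward is unbiased with essentially zero variance, which is the classical importance-sampling optimum when the target policy and rewards are known. The greedy logger also has zero variance, but it never logs two actions the target policy uses, so its estimate converges to 0.2 rather than the true value of 0.4.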