How to collect better data for testing recommendation systems offline

Off-policy evaluation lets companies compare recommendation algorithms without deploying them live, but its accuracy depends critically on the design of the data-collection (logging) policy, a problem that receives little attention in practice. This paper formalizes a reward-coverage tradeoff: concentrating data collection on high-reward actions reduces variance, but can leave gaps in the actions the target policy might take. The authors derive theoretically optimal logging policies for three real-world scenarios (the target policy and rewards are known, unknown, or only partially known) and distill practical design principles for cases where the optimum is operationally infeasible. The work is primarily theoretical but directly actionable for firms running offline experimentation at scale.
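
The reward-coverage tradeoff can be made concrete with a small sketch. The example below is not taken from the paper: it assumes a context-free, three-action setting with Bernoulli rewards, uses the standard inverse propensity scoring (IPS) estimator, and relies on the textbook importance-sampling result that the variance-minimizing unbiased logging policy is proportional to pi(a) * sqrt(E[r^2 | a]) when the target policy and reward moments are known. All names and numbers (`ips_moments`, `variance_minimizing_logging`, the reward means, the policies) are hypothetical and chosen only for illustration.

```python
import math


def ips_moments(target, logging, mean_reward, second_moment):
    """Exact expectation and per-sample variance of the IPS term w * r,
    with w = target[a] / logging[a], when action a is drawn from `logging`."""
    expectation = 0.0
    second = 0.0
    for pi, mu, r, r2 in zip(target, logging, mean_reward, second_moment):
        if mu == 0.0:
            # Action is never logged, so its share of the target value is lost.
            continue
        weight = pi / mu
        expectation += mu * weight * r
        second += mu * weight ** 2 * r2
    return expectation, second - expectation ** 2


def variance_minimizing_logging(target, second_moment):
    """Logging policy proportional to pi(a) * sqrt(E[r^2 | a]).
    Standard importance-sampling result; assumes both inputs are known."""
    scores = [pi * math.sqrt(r2) for pi, r2 in zip(target, second_moment)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical toy problem: three actions with Bernoulli rewards.
mean_reward = [0.1, 0.5, 0.9]
second_moment = list(mean_reward)  # for Bernoulli rewards, r^2 == r
target = [0.2, 0.3, 0.5]
true_value = sum(pi * r for pi, r in zip(target, mean_reward))

logging_policies = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "variance_minimizing": variance_minimizing_logging(target, second_moment),
    # Concentrates on high-reward actions and never logs action 0.
    "greedy_no_coverage": [0.0, 0.3, 0.7],
}

# Map each logging policy to (bias, per-sample variance) of the IPS estimate.
results = {}
for name, logging in logging_policies.items():
    expectation, variance = ips_moments(
        target, logging, mean_reward, second_moment
    )
    results[name] = (expectation - true_value, variance)
    print(f"{name}: bias={results[name][0]:+.4f}, variance={results[name][1]:.4f}")
```

In this toy setting, uniform logging is unbiased but has the highest variance of the three policies. The greedy policy has lower variance still, but it underestimates the target value by exactly the contribution of the action it never logs, which is the coverage gap the summary describes.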