How to learn online without revealing your preferences

Online learning algorithms must often pick actions (like ad placements) while protecting individual privacy via differential privacy. This work closes a theoretical gap by proving the first algorithm matching the conjectured optimal regret bound for pure-DP decision-making—splitting cost between the hardness of distinguishing close-performing actions and privacy's statistical tax. The method uses exponential-time blocks with random prefixes to convert privacy noise into learnable signal.