How confident can you be in a bandit algorithm's performance?

Bandit algorithms choose actions adaptively based on observed rewards, and this adaptive data collection invalidates standard statistical confidence intervals. BSI fits a simulator of the bandit environment to the observed data, then uses it to estimate the mean reward under any policy while formally propagating uncertainty. It requires only weak exploration assumptions and maintains nominal coverage in settings where off-policy methods fail.
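
To make the idea of "fit a simulator, then propagate its uncertainty into a policy-value estimate" concrete, here is a minimal sketch. It is not the BSI procedure itself and carries no coverage guarantee; it assumes Bernoulli rewards, independent Beta(1, 1) priors per arm, and a fixed (non-adaptive) target policy. The function names `fit_simulator` and `policy_value_interval` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_simulator(actions, rewards, n_arms):
    """Fit a toy environment model from logged data (illustrative assumption:
    Bernoulli rewards with an independent Beta(1, 1) prior on each arm)."""
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    successes = np.bincount(actions, weights=rewards, minlength=n_arms)
    pulls = np.bincount(actions, minlength=n_arms)
    # Beta posterior parameters per arm: (alpha, beta)
    return 1.0 + successes, 1.0 + (pulls - successes)


def policy_value_interval(alpha, beta, policy_probs,
                          n_draws=5000, level=0.95, seed=0):
    """Estimate the mean reward of a fixed policy and an uncertainty interval
    by sampling plausible environments from the fitted model."""
    rng = np.random.default_rng(seed)
    # Each row is one sampled environment: a vector of arm mean rewards.
    arm_means = rng.beta(alpha, beta, size=(n_draws, len(alpha)))
    # Value of the policy in each sampled environment.
    values = arm_means @ np.asarray(policy_probs, dtype=float)
    lo, hi = np.quantile(values, [(1 - level) / 2, (1 + level) / 2])
    return values.mean(), lo, hi


# Hypothetical logged data: 50 pulls per arm, 10 and 40 successes respectively.
actions = [0] * 50 + [1] * 50
rewards = [1] * 10 + [0] * 40 + [1] * 40 + [0] * 10

alpha, beta = fit_simulator(actions, rewards, n_arms=2)
mean, lo, hi = policy_value_interval(alpha, beta, policy_probs=[0.0, 1.0])
print(f"estimated value: {mean:.3f}, interval: [{lo:.3f}, {hi:.3f}]")
```

The interval here reflects only uncertainty in the fitted model. A naive model like this one ignores how the logging algorithm chose its actions, which is the gap that a method with formal coverage guarantees has to close.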