Can you predict self-distillation gains before training starts?

On-policy self-distillation (OPSD) improves language models using feedback signals richer than simple rewards, but judging whether a given configuration will work has so far required expensive full training runs. The researchers found a strikingly consistent linear correlation: the initial gap between a student model and its teacher self predicts the final performance gain, and the relationship holds across context types and model sizes. With this predictive law, you can estimate whether OPSD will pay off before training starts, which means faster iteration and more efficient use of compute in RL post-training.
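To make the idea concrete, here is a minimal sketch of how a linear law of this kind could be applied in practice. It assumes you have already measured a scalar initial gap and the final gain for a few small pilot runs; how the gap is defined is not specified here, so treat it as whatever student-versus-teacher metric you measure before training. The function names `fit_line` and `predict_gain` and all numbers are hypothetical, chosen for illustration, and are not results or code from the study.

```python
from typing import Sequence, Tuple


def fit_line(gaps: Sequence[float], gains: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of gain = slope * gap + intercept."""
    n = len(gaps)
    if n != len(gains) or n < 2:
        raise ValueError("need at least two (gap, gain) pairs of equal length")
    mean_x = sum(gaps) / n
    mean_y = sum(gains) / n
    # Sum of squared deviations of the gaps; zero means every gap is identical.
    sxx = sum((x - mean_x) ** 2 for x in gaps)
    if sxx == 0:
        raise ValueError("all gaps are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gaps, gains))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_gain(gap: float, slope: float, intercept: float) -> float:
    """Estimate the final gain for a new configuration from its initial gap."""
    return slope * gap + intercept


# Hypothetical pilot measurements (illustration only, not data from the study).
pilot_gaps = [0.05, 0.10, 0.20, 0.30]   # initial student-teacher gap per run
pilot_gains = [0.02, 0.04, 0.08, 0.12]  # final gain observed after training

slope, intercept = fit_line(pilot_gaps, pilot_gains)

# Estimate the gain for a new configuration whose measured initial gap is 0.15.
estimate = predict_gain(0.15, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} estimate={estimate:.3f}")
```

A fit like this is only as trustworthy as the pilot runs behind it, so it is worth checking the residuals before relying on an estimate for a configuration far outside the range of gaps you have measured.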