Can you predict self-distillation gains before training starts?
Tommy He, Jerome Sieber, Matteo Saponati
May 28, 2026
On-policy self-distillation (OPSD) improves language models using feedback signals that are richer than simple rewards, but judging whether a given configuration will work has so far required expensive full training runs. The authors report a consistent linear correlation: the initial gap between the student model and its teacher self predicts the final performance gain, and the relationship holds across different context types and model sizes. This predictive law lets you estimate OPSD success before training starts, enabling faster iteration and more efficient use of compute in RL post-training.
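To make the idea concrete, here is a minimal sketch of how such a linear relationship could be used in practice: fit a line to (initial gap, final gain) pairs from a few completed runs, then read off an estimate for a new configuration from its initial gap alone. Everything in it is an assumption for illustration. The function names `fit_line` and `predict_gain` are hypothetical, the numbers are made-up placeholders rather than results from the paper, and how the gap is measured is left open because the summary does not specify it.

```python
from statistics import mean


def fit_line(gaps, gains):
    """Ordinary least-squares fit of gain ~ slope * gap + intercept."""
    if len(gaps) != len(gains) or len(gaps) < 2:
        raise ValueError("need at least two (gap, gain) pairs of equal length")
    gap_mean, gain_mean = mean(gaps), mean(gains)
    # Sum of squared deviations of the predictor (initial gap).
    sxx = sum((g - gap_mean) ** 2 for g in gaps)
    if sxx == 0:
        raise ValueError("gaps must not all be identical")
    # Sum of cross deviations between gap and gain.
    sxy = sum((g - gap_mean) * (y - gain_mean) for g, y in zip(gaps, gains))
    slope = sxy / sxx
    intercept = gain_mean - slope * gap_mean
    return slope, intercept


def predict_gain(gap, slope, intercept):
    """Estimate the final gain for a new configuration from its initial gap."""
    return slope * gap + intercept


# Hypothetical placeholder values, NOT data from the paper:
# each pair is (initial student-teacher gap, final performance gain)
# from a run that was trained to completion.
pilot_gaps = [0.05, 0.10, 0.20, 0.30]
pilot_gains = [0.02, 0.04, 0.08, 0.12]

slope, intercept = fit_line(pilot_gaps, pilot_gains)

# Estimate the gain of an untrained configuration whose initial gap is 0.15.
estimate = predict_gain(0.15, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} estimate={estimate:.3f}")
```

The sketch uses plain least squares because the summary describes the relationship as linear. The slope and intercept would have to come from real completed runs, and an estimate for a gap far outside the fitted range should be treated with caution.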