cs.LG

Can you predict self-distillation gains before training starts?

Tommy He, Jerome Sieber, Matteo Saponati

May 28, 2026

On-policy self-distillation (OPSD) improves language models using feedback signals that are richer than simple rewards, but predicting whether a given configuration will work has so far required expensive full training runs. The researchers report a strikingly consistent linear correlation: the initial gap between the student model and its own teacher counterpart predicts the final performance gain, and the relationship holds across different context types and model sizes. This predictive law lets you estimate OPSD success before training starts, enabling faster iteration and more efficient use of compute in RL post-training.
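
To make the idea of a linear predictive law concrete, here is a minimal sketch of how such a relationship could be fitted and then used to estimate a gain up front. It is not the paper's method: the summary does not say how the gap or the gain is measured, so the function names `fit_line` and `predict_gain`, the units, and every number below are hypothetical placeholders chosen for illustration.

```python
from typing import Sequence, Tuple


def fit_line(gaps: Sequence[float], gains: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of gain = slope * gap + intercept."""
    n = len(gaps)
    if n != len(gains) or n < 2:
        raise ValueError("need at least two (gap, gain) pairs of equal length")
    mean_x = sum(gaps) / n
    mean_y = sum(gains) / n
    sxx = sum((x - mean_x) ** 2 for x in gaps)
    if sxx == 0:
        raise ValueError("all gaps are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gaps, gains))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_gain(gap: float, slope: float, intercept: float) -> float:
    """Estimate the final gain for a new configuration from its initial gap."""
    return slope * gap + intercept


# Synthetic placeholder values, NOT results from the paper.
# Assumption: each pair comes from one completed run, with the initial
# student-teacher gap and the final gain measured on the same metric.
pilot_gaps = [0.10, 0.20, 0.30, 0.40]
pilot_gains = [0.02, 0.04, 0.06, 0.08]

slope, intercept = fit_line(pilot_gaps, pilot_gains)

# Estimate the gain for an untrained configuration from its measured gap.
estimate = predict_gain(0.25, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} estimate={estimate:.3f}")
```

In this sketch, a handful of completed runs calibrate the line once, after which a new configuration only needs its initial gap measured. Whether a line fitted this way transfers to a given setup is something the original paper, not this example, would have to establish.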
Published as "A Predictive Law for On-Policy Self-Distillation From World Feedback", arXiv:2605.30070