Training language models on their own outputs without external feedback
Guangya Hao, Yitong Shang, Yunbo Long, Zhuokai Zhao, Hanxue Liang
May 21, 2026
Self-distillation trains a language model on its own generated outputs, but existing methods either require expensive external feedback or generalize poorly. This work extracts a low-rank capability subspace from the model's gradients, uses that subspace to filter activations during generation, and then fine-tunes the model on the resulting raw outputs. Across code, math, and QA tasks, the approach reports gains of 13–16% over prior self-distillation methods without any external signals, and 15% better generalization to out-of-domain settings.
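The subspace-extraction and filtering steps can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes the subspace is taken from the top singular directions of a matrix of per-example gradients, and that "filtering" means projecting hidden activations onto that subspace. The function names `capability_subspace` and `filter_activations`, the array shapes, and the random data are all hypothetical.

```python
import numpy as np


def capability_subspace(grads: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis (d x rank) for the top gradient directions.

    Assumption: each row of `grads` is one example's gradient with respect
    to a d-dimensional hidden activation.
    """
    # Thin SVD; the rows of vt are right singular vectors, ordered by
    # decreasing singular value.
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:rank].T


def filter_activations(hidden: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project activations onto the subspace spanned by `basis`.

    Assumption: "filtering" keeps only the in-subspace component.
    """
    return hidden @ basis @ basis.T


# Synthetic stand-ins for real gradients and activations.
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 16))   # 32 examples, hidden size 16
hidden = rng.normal(size=(5, 16))   # 5 token positions

basis = capability_subspace(grads, rank=4)
filtered = filter_activations(hidden, basis)
```

In a real pipeline the filtered activations would be fed back into the forward pass during generation, and the generated text would then serve as fine-tuning data. Both steps depend on the model and training stack, so they are left out here.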