cs.CL

Training language models on their own outputs without external feedback

Guangya Hao, Yitong Shang, Yunbo Long, Zhuokai Zhao, Hanxue Liang

May 21, 2026

Self-distillation trains language models on their own generated outputs, but existing methods either need expensive external feedback or struggle to generalize. This work proposes extracting a low-rank capability subspace from the model's gradients, using it to filter activations during generation, then fine-tuning on the raw outputs. Across code, math, and QA tasks, this achieves 13–16% gains over prior self-distillation methods without any external signals, and generalizes 15% better to out-of-domain settings.
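The summary does not say how the subspace is extracted or how activations are filtered, so the following is only a minimal sketch under stated assumptions: per-example gradients are stacked as rows of a matrix, the low-rank subspace is taken from its top right singular vectors, and "filtering" is an orthogonal projection onto that subspace. The helper names `capability_subspace` and `project`, the synthetic data, and the chosen rank are all hypothetical and are not taken from the paper.

```python
import numpy as np


def capability_subspace(grads: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis (d x rank) for the dominant gradient directions.

    Assumption: `grads` has shape (n_examples, d), one flattened gradient per row.
    """
    # Right singular vectors span the directions along which the gradients vary most.
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:rank].T


def project(acts: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project activations of shape (m, d) onto the span of `basis`."""
    return acts @ basis @ basis.T


# Synthetic stand-ins for real gradients and activations (illustration only).
rng = np.random.default_rng(0)
d, n, r = 16, 64, 4
true_basis, _ = np.linalg.qr(rng.normal(size=(d, r)))
grads = rng.normal(size=(n, r)) @ true_basis.T + 1e-3 * rng.normal(size=(n, d))

basis = capability_subspace(grads, r)
acts = rng.normal(size=(8, d))
filtered = project(acts, basis)
```

In a real pipeline the filtered activations would feed back into generation, and the resulting outputs would become the fine-tuning set; that loop is omitted here because the summary gives no details about it.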
Published as "Self-Policy Distillation via Capability-Selective Subspace Projection" (arXiv:2605.22675)