Training language models on their own outputs without external feedback

Self-distillation trains a language model on its own generated outputs, but existing methods either require expensive external feedback or generalize poorly. This work extracts a low-rank capability subspace from the model's gradients, uses it to filter activations during generation, and then fine-tunes the model on the raw outputs. Across code, math, and QA tasks, the approach improves on prior self-distillation methods by 13–16% without any external signals, and generalizes 15% better to out-of-domain settings.
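
To make the subspace idea concrete, here is a minimal NumPy sketch of one way a low-rank subspace could be extracted from gradients and applied to activations. It is an illustration under stated assumptions, not the method as implemented in this work: the summary does not say how the subspace is computed or what "filter" means, so the sketch assumes a truncated SVD of a matrix of per-example gradients and an orthogonal projection of activations onto the resulting subspace. The function names `capability_subspace` and `filter_activations`, the toy dimensions, and the synthetic data are all hypothetical.

```python
import numpy as np


def capability_subspace(grads: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis of shape (d, rank).

    Assumption: the subspace is spanned by the top-`rank` right singular
    vectors of a (n_samples, d) matrix of per-example gradients.
    """
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:rank].T


def filter_activations(acts: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Assumption: "filtering" is orthogonal projection onto span(basis)."""
    return acts @ basis @ basis.T


rng = np.random.default_rng(0)
d, n, k = 16, 64, 4  # hidden size, number of gradient samples, subspace rank

# Synthetic gradients lying close to a k-dimensional subspace, plus small noise.
true_dirs = np.linalg.qr(rng.standard_normal((d, k)))[0]
grads = rng.standard_normal((n, k)) @ true_dirs.T
grads += 0.01 * rng.standard_normal((n, d))

basis = capability_subspace(grads, k)

# Synthetic activations for 8 token positions, projected onto the subspace.
acts = rng.standard_normal((8, d))
filtered = filter_activations(acts, basis)
```

In a real model the projection would be applied to hidden states during generation (for example through a forward hook), and the generated text would then serve as fine-tuning data. Those steps depend on details the summary does not give, so they are left out here.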