Why some teacher feedback helps models learn and some doesn't

Yuanyi Wang, Su Lu, Yanggan Gu, Pengkai Wang, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang

On-policy distillation trains a student model on its own outputs using teacher feedback, but not every disagreement signal helps learning equally. This work shows that raw KL divergence conflates two types of disagreement: learnable disagreement, where the teacher corrects the student within the student's top candidates, and incompatible disagreement, where the teacher's preferred tokens lie outside the candidates the student considers plausible. By measuring this local compatibility, which they call "token teachability", the authors propose TA-OPD, which selects only 5% of tokens for training and still outperforms full-token distillation on Qwen models.
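
To make the distinction concrete, here is a minimal NumPy sketch of selecting tokens by a compatibility-gated disagreement score. It is an illustration under stated assumptions, not the authors' TA-OPD scoring rule, which this summary does not spell out. The function name `select_teachable_tokens`, the top-k compatibility test, the direction of the KL term, and the defaults `top_k=5` and `keep_frac=0.05` are all hypothetical choices made for the example.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def select_teachable_tokens(student_logits, teacher_logits, top_k=5, keep_frac=0.05):
    """Pick token positions whose teacher disagreement looks learnable.

    ASSUMPTION: "compatible" means the teacher's most likely token is
    among the student's top_k tokens. This is a stand-in for the
    paper's teachability measure, not its actual definition.

    Both inputs have shape (num_positions, vocab_size).
    Returns (selected_positions, per_position_kl, compatible_mask).
    """
    s_logp = log_softmax(student_logits)
    t_logp = log_softmax(teacher_logits)

    # Per-position KL(student || teacher). ASSUMPTION: the direction is
    # chosen for illustration; the summary only says "raw KL divergence".
    kl = (np.exp(s_logp) * (s_logp - t_logp)).sum(axis=-1)

    # Indices of the student's top_k tokens at each position (unordered).
    topk_idx = np.argpartition(-student_logits, top_k - 1, axis=-1)[:, :top_k]
    teacher_top = t_logp.argmax(axis=-1)
    compatible = (topk_idx == teacher_top[:, None]).any(axis=-1)

    # Incompatible positions are excluded no matter how large their KL is.
    score = np.where(compatible, kl, -np.inf)

    # Keep the highest-scoring fraction of positions (at least one).
    n_keep = max(1, int(round(keep_frac * score.shape[0])))
    order = np.argsort(-score)[:n_keep]
    selected = np.sort(order[np.isfinite(score[order])])
    return selected, kl, compatible
```

In this sketch, a position where the teacher strongly prefers a token the student ranks near the bottom has a large KL but is filtered out, while a position where the teacher merely reorders the student's leading candidates is kept. Ranking by raw KL alone would favor the former, which is the conflation the summary describes.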