Why training on multiple answers per question improves language models
Hasan Amin, Kian Ahrabian, Ming Yin, Rajiv Khanna
May 30, 2026
Language models are typically fine-tuned on one response per prompt, even though many questions have several correct answers. This creates a "mode lottery": which valid answer the model sees for a given prompt is left to chance, so it learns an incomplete view of the valid outputs. The authors show that keeping multiple responses per prompt reduces uncertainty about the output distribution, but only when prompts are already scarce. They prove that selecting K responses at random is unbiased, warn that reward-based selection causes mode collapse, and validate on new benchmarks that multi-response training improves generalization most in high-diversity, low-redundancy regimes.
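The contrast between random and reward-based selection can be illustrated with a small simulation. This is a minimal sketch, not the authors' method or experimental setup: the pool of ten responses, the mode labels A, B, and C, the reward values, and every function name are hypothetical, and each response is represented only by the label of the answer mode it belongs to.

```python
import random
from collections import Counter

def mode_frequencies(selected):
    """Fraction of the selected responses that falls in each answer mode."""
    counts = Counter(selected)
    return {mode: n / len(selected) for mode, n in counts.items()}

def uniform_k(pool, k, rng):
    """Keep k responses chosen uniformly at random, without replacement."""
    return rng.sample(pool, k)

def top_k_by_reward(pool, reward, k):
    """Keep the k highest-reward responses (the selection the summary warns about)."""
    return sorted(pool, key=reward, reverse=True)[:k]

def average_uniform_frequencies(pool, k, trials=20000, seed=0):
    """Average the per-draw mode frequencies over many uniform draws."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(trials):
        for mode, freq in mode_frequencies(uniform_k(pool, k, rng)).items():
            totals[mode] += freq
    return {mode: total / trials for mode, total in totals.items()}

# Hypothetical pool for one prompt: three valid answer modes with shares 0.5, 0.3, 0.2.
POOL = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
# Hypothetical rewards: every mode is correct, but mode A scores slightly higher.
REWARD = {"A": 1.0, "B": 0.9, "C": 0.8}

avg = average_uniform_frequencies(POOL, k=3)
top = top_k_by_reward(POOL, REWARD.get, k=3)
print("uniform K, averaged:", avg)
print("top-K by reward:    ", mode_frequencies(top))
```

Averaged over draws, the uniform selection reproduces the pool's mode shares (close to 0.5, 0.3, and 0.2 here), which is the sense in which random selection is unbiased. The reward-based selection keeps only mode A, so the two other valid modes never reach training, a toy version of mode collapse.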