Why overfitting language models sometimes makes them better

Hyperfitting, meaning fine-tuning a language model to near-zero training loss on a small dataset, produces surprisingly diverse outputs, but the mechanism is neither temperature scaling nor vocabulary reweighting. Using entropy-matched experiments and layer-wise analysis, the researchers show that the effect comes from a geometric expansion in the final transformer block, which dynamically reorders token rankings based on context. They also introduce Late-Stage LoRA, which updates only the last 5 layers and achieves the same diversity gains while training far fewer parameters.
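
To make the temperature-scaling point concrete, here is a minimal sketch of what an entropy-matched control could look like. It assumes "entropy-matched" means rescaling a baseline distribution with a single temperature until its entropy equals that of the fine-tuned distribution; the exact protocol is not spelled out in the summary above. The logit values and the helper names (`match_entropy`, `ranking`) are hypothetical and chosen purely for illustration. The property it demonstrates is general: dividing logits by a positive temperature can match any reachable entropy but can never change the order of the tokens, so a reordering of token rankings has to come from somewhere else.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def match_entropy(logits, target, lo=0.05, hi=50.0, iters=100):
    """Bisect for the temperature whose softmax entropy equals `target`.

    Relies on softmax entropy increasing with temperature, which holds
    whenever the logits are not all equal.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(softmax(logits, mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ranking(values):
    """Token indices sorted from most to least likely."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

# Hypothetical next-token logits over a 5-token vocabulary (toy numbers,
# not taken from any real model).
base_logits = [4.0, 2.5, 1.0, 0.5, -1.0]
tuned_logits = [1.0, 6.0, 0.5, -2.0, -3.0]

# Rescale the baseline so its entropy equals the fine-tuned model's entropy.
target = entropy(softmax(tuned_logits))
temperature = match_entropy(base_logits, target)
matched = softmax(base_logits, temperature)

print("matched temperature:", round(temperature, 4))
print("baseline ranking:   ", ranking(base_logits))
print("entropy-matched:    ", ranking(matched))
print("fine-tuned ranking: ", ranking(softmax(tuned_logits)))
```

In this toy setup the entropy-matched baseline keeps exactly the baseline's token order, while the hypothetical fine-tuned logits promote a different top token. That is the kind of gap an entropy-matched comparison is designed to expose.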