Why bigger language models remember facts better—and predictably so

Matthew L. Smith, Jonathan P. Shock, Samuel T. Segun, Iyiola E. Olatunji, Tegawendé F. Bissyandé

Large language models hallucinate confidently, yet their ability to recall real facts follows a regular mathematical pattern. Researchers tested 38 models on more than 8,900 scholarly references and found that factual accuracy scales predictably with model size and with how often a topic appears in the training data: a bigger model recalling facts about uncommon topics performs like a smaller model recalling facts about common ones. Accuracy follows a sigmoid curve in a combination of these two factors, which suggests that recall behaves like a signal-to-noise problem. More frequent topics cut through the noise, and larger models lower the noise floor.
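To make the size-for-frequency trade-off concrete, here is a minimal Python sketch. It assumes accuracy is a logistic function of a weighted sum of log10 model size and log10 training-data frequency, with frequency treated as a raw occurrence count. That functional form, the coefficients `a`, `b`, `c` and their values, and the helper names `recall_accuracy` and `equivalent_params` are all hypothetical illustrations, not the authors' fitted model.

```python
import math


def recall_accuracy(n_params: float, frequency: float,
                    a: float = 1.0, b: float = 1.0, c: float = -13.0) -> float:
    """Hypothetical sigmoid model of factual-recall accuracy.

    Assumed form (illustrative, not the paper's fitted model):
        accuracy = sigmoid(a * log10(n_params) + b * log10(frequency) + c)
    The coefficients a, b, c are placeholders chosen only so the example
    values below land in the middle of the curve.
    """
    # Both a larger model and a more frequent topic raise the sigmoid's
    # argument, read here as a log signal-to-noise ratio.
    z = a * math.log10(n_params) + b * math.log10(frequency) + c
    return 1.0 / (1.0 + math.exp(-z))


def equivalent_params(n_params: float, freq_from: float, freq_to: float,
                      a: float = 1.0, b: float = 1.0) -> float:
    """Model size needed at freq_to to match accuracy at (n_params, freq_from).

    Derived by holding the sigmoid's argument fixed under the assumed form:
        a*log10(n') + b*log10(freq_to) = a*log10(n) + b*log10(freq_from)
    """
    shift = (b / a) * (math.log10(freq_from) - math.log10(freq_to))
    return n_params * 10 ** shift


# A 1B-parameter model on a topic seen 10,000 times versus a 10B-parameter
# model on a topic seen only 1,000 times (placeholder numbers).
small_common = recall_accuracy(1e9, 1e4)
big_rare = recall_accuracy(1e10, 1e3)
print(f"small model, common topic: {small_common:.3f}")
print(f"big model, rare topic:     {big_rare:.3f}")
print(f"size needed at 1,000 occurrences: {equivalent_params(1e9, 1e4, 1e3):.3g}")
```

Because both factors enter the same sum, a drop in topic frequency can be offset by a matching increase in model size, and the ratio `b / a` sets the exchange rate. With the placeholder values above, a tenfold rarer topic is exactly offset by a tenfold larger model.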