cs.AI

Does the order of training data change what language models learn?

Pilchen Hippolyte, Fabre Romain, Signe Talla Franck, Perez Patrick, Grave Edouard

May 21, 2026

Language models are typically trained on shuffled data, so it is unclear how well they handle time-sensitive facts. Researchers trained 6B-parameter models on temporally ordered Common Crawl snapshots and compared them with standard shuffled pre-training, evaluating both on more than 7,000 time-grounded questions. Sequential training matched the shuffled baselines on general language skills but consistently retrieved more recent and temporally accurate facts, whereas shuffled training peaked on older data. Code, checkpoints, and datasets are released.
Published as "Understanding Data Temporality Impact on Large Language Models Pre-training", arXiv:2605.22769