Why language models fail when context matters most
Francesco Corielli
May 22, 2026
Language models learn from text alone, but actual communication depends on context that the text does not show: intentions, facts, social setting, task constraints. This paper formalizes when next-token prediction succeeds: only when the observed text is a sufficient statistic for all relevant circumstances. When that condition fails (the typical case), models hallucinate or err in predictable ways. Retrieval-augmented generation (RAG) and tool use address this by reintroducing the missing context. The framework also explains why training on heterogeneous corpora creates brittle systems.
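The sufficiency condition can be illustrated with a toy calculation. The sketch below is not the paper's formalism; it assumes a hypothetical setup in which the same prefix appears under two hidden contexts (the labels `formal` and `casual`, the tokens, and all probabilities are invented for illustration). A predictor that sees only the text can at best learn the mixture over contexts, and the expected KL divergence between the context-aware distribution and that mixture measures what is lost.

```python
import math

# Hypothetical toy data: one shared prefix, two hidden contexts.
P_CONTEXT = {"formal": 0.5, "casual": 0.5}
P_NEXT = {
    "formal": {"regards": 0.9, "cheers": 0.1},
    "casual": {"regards": 0.1, "cheers": 0.9},
}


def marginal_next(p_context, p_next):
    """Best text-only predictor: the mixture of per-context distributions."""
    mix = {}
    for ctx, p_ctx in p_context.items():
        for token, p in p_next[ctx].items():
            mix[token] = mix.get(token, 0.0) + p_ctx * p
    return mix


def expected_kl(p_context, p_next, q):
    """Expected KL(p(. | context) || q) in nats, averaged over contexts."""
    total = 0.0
    for ctx, p_ctx in p_context.items():
        for token, p in p_next[ctx].items():
            if p > 0:
                total += p_ctx * p * math.log(p / q[token])
    return total


# Context changes the next-token distribution: text is not sufficient.
text_only = marginal_next(P_CONTEXT, P_NEXT)
gap = expected_kl(P_CONTEXT, P_NEXT, text_only)

# Context does not change the distribution: text is sufficient, gap vanishes.
P_NEXT_SAME = {ctx: {"regards": 0.7, "cheers": 0.3} for ctx in P_CONTEXT}
gap_sufficient = expected_kl(
    P_CONTEXT, P_NEXT_SAME, marginal_next(P_CONTEXT, P_NEXT_SAME)
)

print(text_only, gap, gap_sufficient)
```

In this sketch the gap is positive exactly when the hidden context changes the next-token distribution, and it vanishes when the text already determines that distribution. Supplying the context to the predictor, which is the role the summary assigns to RAG and tool use, corresponds to conditioning on `ctx` instead of using the mixture.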