Teaching models to reason through long documents without getting lost
Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
May 29, 2026
Language models struggle to find and use key facts buried in long documents. This work trains them with reinforcement learning using rubric rewards: fine-grained signals that track whether the model mentions the correct entities along its reasoning chain. The trick is in the training data: the authors build harder problems by injecting "high confusability" distractors (documents the search agent read but didn't use) alongside easy ones. On five benchmarks, the approach consistently beats baselines and produces more thorough, evidence-backed answers. The code and models are released.
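To make the two ideas above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `rubric_reward` and `build_context`, the case-insensitive substring matching, and the "fraction of entities mentioned" scoring are all assumptions chosen for illustration, and the paper's actual reward and data pipeline may differ.

```python
import random


def rubric_reward(reasoning: str, rubric_entities: list[str]) -> float:
    """Hypothetical rubric reward: the fraction of rubric entities that
    appear in the reasoning text (case-insensitive substring match)."""
    if not rubric_entities:
        return 0.0
    text = reasoning.lower()
    hits = sum(1 for entity in rubric_entities if entity.lower() in text)
    return hits / len(rubric_entities)


def build_context(gold_docs, hard_distractors, easy_distractors,
                  n_hard, n_easy, seed=0):
    """Hypothetical context builder: mix the gold documents with
    high-confusability distractors (read by the search agent but unused)
    and easy distractors, then shuffle so position gives nothing away."""
    rng = random.Random(seed)
    chosen = (
        list(gold_docs)
        + rng.sample(hard_distractors, min(n_hard, len(hard_distractors)))
        + rng.sample(easy_distractors, min(n_easy, len(easy_distractors)))
    )
    rng.shuffle(chosen)
    return chosen
```

Under these assumptions, a reasoning chain that names two of three rubric entities scores 2/3, so the reward gives partial credit for intermediate evidence rather than grading only the final answer.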