cs.CL

Teaching models to reason through long documents without getting lost

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

May 29, 2026

Language models struggle to find and use key facts buried in long documents. This work trains them with reinforcement learning using rubric rewards: fine-grained signals that track whether the model mentions the correct entities along its reasoning chain. The trick: the authors build harder training problems by injecting "high confusability" distractors (documents the search agent read but did not use) alongside easy ones. On five benchmarks, the approach consistently beats baselines and produces more thorough, evidence-backed answers. Code and models are released.
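To make the two ideas in the summary concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names `rubric_reward` and `build_context`, the case-insensitive substring matching, and the uniform averaging over entities are all assumptions chosen for illustration. The summary only says that the reward tracks whether correct entities are mentioned, and that "read but unused" documents are mixed in with easy distractors.

```python
import random
from typing import List, Sequence


def rubric_reward(reasoning: str, rubric_entities: Sequence[str]) -> float:
    """Hypothetical rubric reward: fraction of rubric entities that appear
    in the reasoning text (case-insensitive substring match, an assumption)."""
    if not rubric_entities:
        return 0.0
    text = reasoning.lower()
    hits = sum(1 for entity in rubric_entities if entity.lower() in text)
    return hits / len(rubric_entities)


def build_context(
    used_docs: Sequence[str],
    read_but_unused_docs: Sequence[str],
    easy_distractors: Sequence[str],
    seed: int = 0,
) -> List[str]:
    """Hypothetical context builder: mix the evidence documents with
    high-confusability distractors (read by the search agent but not used)
    and easy distractors, then shuffle so position gives nothing away."""
    rng = random.Random(seed)
    docs = list(used_docs) + list(read_but_unused_docs) + list(easy_distractors)
    rng.shuffle(docs)
    return docs


if __name__ == "__main__":
    reasoning = "Marie Curie won the Nobel Prize in 1903."
    entities = ["Marie Curie", "Nobel Prize", "Pierre Curie"]
    # Two of the three rubric entities are mentioned in the reasoning.
    print(rubric_reward(reasoning, entities))
    print(build_context(["evidence A"], ["near-miss B"], ["unrelated C"]))
```

A graded score like this gives partial credit for reasoning that surfaces some of the right evidence, which is the sense in which the summary calls the signal fine-grained; how the actual rubric is scored and combined with other rewards is not stated here.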
Published as "LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards", arXiv:2605.31584