Teaching models to reason through long documents without getting lost

Language models struggle to find and use key facts buried in long documents. This work trains them with reinforcement learning using rubric rewards: fine-grained signals that track whether the model mentions the correct entities along its reasoning chain. The key trick is how harder training problems are built. Alongside easy distractors, the authors inject "high-confusability" distractors, which are documents that a search agent read but did not use. On five benchmarks, the approach consistently outperforms baselines and produces more thorough, evidence-backed answers. The code and models have been released.
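The two ideas above can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`rubric_reward`, `build_context`, `weight`, `n_hard`, `n_easy`) is hypothetical. Several details are assumptions: the reward adds an exact-match outcome score to the fraction of rubric entities mentioned in the reasoning, entity mentions are detected by case-insensitive whole-word matching, and "easy" distractors are drawn from an unrelated document pool.

```python
import random
import re


def rubric_reward(reasoning: str, answer: str, gold_answer: str,
                  rubric_entities: list[str], weight: float = 0.5) -> float:
    """Hypothetical rubric reward: outcome score plus weighted entity coverage.

    Assumption: the outcome is a case-insensitive exact match, and coverage is
    the fraction of rubric entities mentioned (as whole words) in the reasoning.
    """
    outcome = 1.0 if answer.strip().lower() == gold_answer.strip().lower() else 0.0
    if not rubric_entities:
        return outcome
    text = reasoning.lower()
    # Count rubric entities that appear as whole words in the reasoning chain.
    hits = sum(
        1 for entity in rubric_entities
        if re.search(r"\b" + re.escape(entity.lower()) + r"\b", text)
    )
    return outcome + weight * hits / len(rubric_entities)


def build_context(gold_docs: list[str], read_but_unused: list[str],
                  random_pool: list[str], n_hard: int, n_easy: int,
                  seed: int = 0) -> list[str]:
    """Hypothetical context builder mixing gold evidence with distractors.

    Hard (high-confusability) distractors come from documents a search agent
    read but did not use; easy distractors are sampled from an unrelated pool
    (an assumption). The combined list is shuffled so gold documents have no
    fixed position.
    """
    rng = random.Random(seed)
    hard = rng.sample(read_but_unused, min(n_hard, len(read_but_unused)))
    easy = rng.sample(random_pool, min(n_easy, len(random_pool)))
    context = list(gold_docs) + hard + easy
    rng.shuffle(context)
    return context


if __name__ == "__main__":
    r = rubric_reward("The capital of France is Paris.", "Paris", "paris",
                      ["Paris", "France", "Seine"])
    print(round(r, 4))  # correct answer, 2 of 3 entities mentioned
    print(build_context(["g1", "g2"], ["u1", "u2", "u3"],
                        ["r1", "r2", "r3", "r4"], n_hard=2, n_easy=3))
```

Whole-word matching avoids crediting "Parisian" as a mention of "Paris", but it still misses paraphrases and aliases. A real rubric would likely need a more robust entity matcher.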