When does hiding old information help search agents?

Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang, Pengcheng Jiang, Yu Zhang, Julian McAuley

Search agents accumulate massive context from repeated tool calls and spend tokens on information they never revisit. Masking old observations seems like an obvious fix, but across models from 4B to 284B parameters its benefit follows an inverted-U pattern: it is useless with weak retrievers, peaks at medium model capacity, and fails entirely with large models. The mechanism is that masking trades tokens for extra turns, which helps when agents are stuck but hurts when they would have used the masked evidence. Context management is not one-size-fits-all; its effectiveness depends on the regime defined by retriever strength and model capacity.
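To make the idea of masking old observations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a chat-style history in which tool outputs carry the role "tool"; the function name `mask_old_observations`, the `keep_last` parameter, and the placeholder string are all hypothetical choices made for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical placeholder text; the actual masking string is an assumption.
PLACEHOLDER = "[observation masked]"


def mask_old_observations(history: List[Message], keep_last: int = 2) -> List[Message]:
    """Return a copy of `history` with all but the most recent `keep_last`
    tool observations replaced by a short placeholder.

    Non-tool messages (system, user, assistant) are copied unchanged, so the
    agent still sees its own past queries and reasoning.
    """
    tool_indices = [i for i, msg in enumerate(history) if msg["role"] == "tool"]
    if keep_last > 0:
        to_mask = set(tool_indices[:-keep_last])
    else:
        to_mask = set(tool_indices)  # mask every observation
    return [
        {**msg, "content": PLACEHOLDER} if i in to_mask else dict(msg)
        for i, msg in enumerate(history)
    ]
```

Under these assumptions, the trade-off described above is visible directly: masked observations free up tokens that the agent can spend on further search turns, but any evidence it would have reread from a masked observation is no longer available.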