Can search agents improve themselves without external help?
Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Xuxin Zhang, Huangyu Dai, Lingtao Mao
May 21, 2026
Recent work on search-augmented reasoning agents stacks multiple training tricks: external supervisors, reward models, tree search, and hand-tuned bonuses. Search-E1 asks whether all this complexity is necessary. It replaces that machinery with vanilla GRPO (a policy-gradient method) plus offline self-distillation: after each training round, the model generates its own examples and learns from the better versions of its own trajectories. On seven QA benchmarks, this minimal approach reaches 44% average accuracy with a 3B model, outperforming larger open-source baselines. The code will be released soon.
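To make the two ingredients concrete, here is a minimal sketch rather than the authors' implementation, since the code is not yet available. The first function shows the group-relative advantage that GRPO is known for: each rollout's reward is standardized against the other rollouts sampled for the same question. The second function is purely an assumption about what "learning from better versions of its own trajectories" could look like: it keeps the highest-reward rollout per question as a distillation target. The names `group_relative_advantages` and `select_distillation_targets`, the `(question, trajectory, reward)` tuple layout, the use of population standard deviation, and the positive-reward filter are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group.

    `rewards` holds one scalar reward per rollout for the same question.
    Population std and the `eps` value are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_distillation_targets(rollouts):
    """Hypothetical offline selection step (an assumption, not from the paper).

    `rollouts` is an iterable of (question, trajectory, reward) tuples.
    Keeps the best positive-reward trajectory for each question.
    """
    best = {}
    for question, trajectory, reward in rollouts:
        if reward <= 0:
            continue  # skip failed rollouts entirely
        if question not in best or reward > best[question][1]:
            best[question] = (trajectory, reward)
    return {q: traj for q, (traj, _) in best.items()}


if __name__ == "__main__":
    # Four rollouts for one question: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

    # Toy rollouts across two questions; q2 has no successful rollout.
    toy = [
        ("q1", "search -> answer A", 0.0),
        ("q1", "search -> search -> answer B", 1.0),
        ("q2", "answer C", 0.0),
    ]
    print(select_distillation_targets(toy))
```

In this sketch, a group whose rollouts all receive the same reward yields zero advantages, so it contributes no policy-gradient signal; the selected trajectories would then serve as supervised targets in a separate offline pass.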