cs.AI

Can search agents improve themselves without external help?

Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Xuxin Zhang, Huangyu Dai, Lingtao Mao

May 21, 2026

Recent work on search-augmented reasoning agents stacks multiple training techniques: external supervisors, reward models, tree search, and hand-tuned bonuses. Search-E1 asks whether this complexity is necessary. It replaces that machinery with vanilla GRPO (a policy-gradient method) plus offline self-distillation: after each training round, the model generates its own examples and learns from the better versions of its own trajectories. On seven QA benchmarks, this minimal approach reaches 44% average accuracy with a 3B-parameter model, outperforming larger open-source baselines. The code will be released soon.
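To make the two ingredients concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The first function shows the group-relative advantage that GRPO is generally built on: each sampled trajectory's reward is standardized against the other samples for the same question. The second function is one assumed reading of "learning from better versions of its own trajectories": keep the highest-reward rollout per question as a fine-tuning target. The function names, the `(text, reward)` data layout, and the "reward greater than zero" filter are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its own group.

    `rewards` holds the scalar rewards of several rollouts sampled for the
    same question. `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_distillation_targets(groups):
    """Pick one self-generated target per question (assumed selection rule).

    `groups` maps a question to a list of (trajectory_text, reward) pairs.
    Questions where no rollout earned a positive reward are skipped, so the
    model is never trained to imitate a failed trajectory.
    """
    targets = {}
    for question, rollouts in groups.items():
        best_text, best_reward = max(rollouts, key=lambda pair: pair[1])
        if best_reward > 0:
            targets[question] = best_text
    return targets


if __name__ == "__main__":
    # Four rollouts for one question: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

    # Toy rollouts; the trajectory strings are placeholders, not real data.
    rollouts = {
        "q1": [("search A, answer X", 1.0), ("answer Y", 0.0)],
        "q2": [("answer Z", 0.0), ("search B, answer W", 0.0)],
    }
    print(select_distillation_targets(rollouts))
```

In this sketch, correct rollouts receive positive advantages and incorrect ones negative, with no learned reward model or external supervisor involved. The selected targets would then feed an ordinary supervised fine-tuning pass between reinforcement learning rounds.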
Published as "Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning" (arXiv:2605.22511)