Can language models learn to create their own shortcuts?

Zhongyu He, Yuanfan Li, Fei Huang, Tianyu Chen, Siyuan Chen, Xingyang Li, Meng Hsuan Yu, Xiangrong Liu, Leyi Wei, Lu Pan, Ke Zeng, Xunliang Cai

LLM agents struggle with long-horizon tasks because they lack reusable skills, yet existing methods depend on external skill generators or on large skill banks at inference time. SIRI sidesteps both by letting agents mine skills from their own successful trajectories, validate which ones actually help, and compress the useful patterns directly into the policy. On WebShop and ALFWorld, SIRI outperforms skill-based baselines, reaching 81.3% on WebShop. Inference is also faster and simpler, because it runs on the original prompt alone.
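
As a rough illustration of the mine-then-validate idea, the sketch below treats a "skill" as an action n-gram that recurs across successful trajectories and keeps only candidates that raise a measured success rate. This is a toy under stated assumptions, not SIRI's actual procedure: the n-gram representation, the names `mine_candidate_skills` and `validate_skills`, the `success_rate` callback, and all thresholds are hypothetical. The final step of compressing validated skills into the policy is a training step and is not shown.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types: a trajectory is (actions, succeeded); a skill is an action n-gram.
Trajectory = Tuple[List[str], bool]
Skill = Tuple[str, ...]


def mine_candidate_skills(
    trajectories: Sequence[Trajectory], length: int = 2, min_count: int = 2
) -> List[Skill]:
    """Collect action n-grams that appear in at least `min_count` successful trajectories."""
    counts: Counter = Counter()
    for actions, succeeded in trajectories:
        if not succeeded:
            continue  # only successful trajectories are mined
        # Use a set so each n-gram counts at most once per trajectory.
        ngrams = {
            tuple(actions[i : i + length])
            for i in range(len(actions) - length + 1)
        }
        counts.update(ngrams)
    return sorted(skill for skill, n in counts.items() if n >= min_count)


def validate_skills(
    candidates: Sequence[Skill],
    success_rate: Callable[[Optional[Skill]], float],
    min_gain: float = 0.0,
) -> List[Skill]:
    """Keep candidates whose success rate beats the no-skill baseline by more than `min_gain`.

    `success_rate(None)` is assumed to return the baseline rate without any skill.
    """
    baseline = success_rate(None)
    return [s for s in candidates if success_rate(s) - baseline > min_gain]
```

In use, `success_rate` would wrap real environment rollouts with and without the candidate skill; here it is left as a caller-supplied function so the sketch stays self-contained.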