cs.LG

How LLM agents can escape their comfortable habits to discover better strategies

Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

May 20, 2026

LLM agents that learn during episodes often suffer exploration collapse: they keep repeating familiar high-reward actions and stop trying alternatives. APEX maintains an explicit map of strategies, represented as a directed graph with prerequisite dependencies. Fork Discovery surfaces new, evidence-grounded directions, and Policy Selection balances proven approaches against novel ones. On text-adventure games and web interaction tasks, APEX outperforms all baselines, and the reported results indicate that each component contributes.
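To make the idea of a strategy map with prerequisites concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Strategy` and `StrategyMap` classes, the `exploration_weight` parameter, and the UCB-style bonus used to trade off proven against untried strategies are all assumptions chosen for illustration, and Fork Discovery is not modeled at all.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Strategy:
    """One node in the strategy map (hypothetical structure)."""
    name: str
    prerequisites: set = field(default_factory=set)
    tries: int = 0
    total_reward: float = 0.0

    @property
    def mean_reward(self) -> float:
        # Average reward observed so far; 0.0 for an untried strategy.
        return self.total_reward / self.tries if self.tries else 0.0


class StrategyMap:
    """Directed graph of strategies; edges are prerequisite dependencies."""

    def __init__(self, exploration_weight: float = 1.0):
        self.nodes = {}
        self.completed = set()
        self.exploration_weight = exploration_weight  # assumed knob

    def add(self, name, prerequisites=()):
        self.nodes[name] = Strategy(name, set(prerequisites))

    def available(self):
        # A strategy is usable once all of its prerequisites have succeeded.
        return [s for s in self.nodes.values()
                if s.prerequisites <= self.completed]

    def record(self, name, reward, succeeded):
        node = self.nodes[name]
        node.tries += 1
        node.total_reward += reward
        if succeeded:
            self.completed.add(name)

    def select(self):
        # UCB-style score (an assumption, not the paper's rule):
        # untried strategies are always preferred, otherwise the mean
        # reward is boosted by a bonus that shrinks with repeated tries.
        candidates = self.available()
        if not candidates:
            raise ValueError("no strategy has its prerequisites satisfied")
        total = sum(s.tries for s in candidates)

        def score(s):
            if s.tries == 0:
                return math.inf
            bonus = self.exploration_weight * math.sqrt(
                math.log(total + 1) / s.tries)
            return s.mean_reward + bonus

        return max(candidates, key=score)


strategy_map = StrategyMap()
strategy_map.add("find_key")
strategy_map.add("open_door", prerequisites=["find_key"])
strategy_map.record("find_key", reward=1.0, succeeded=True)
next_strategy = strategy_map.select()
```

In this sketch, `open_door` only becomes selectable after `find_key` has succeeded, and an untried strategy is chosen ahead of a familiar one, which is one simple way to push against the collapse described above.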
Published as "APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents" (arXiv:2605.21240)