How LLM agents can escape their comfortable habits to discover better strategies
Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi
May 20, 2026
LLM agents that learn within an episode often fall into exploration collapse: they keep repeating familiar high-reward actions and stop trying alternatives. APEX maintains an explicit map of strategies as a directed graph with prerequisite dependencies. Its Fork Discovery component surfaces new, evidence-grounded directions to explore, and its Policy Selection component balances proven strategies against novel ones. On text-adventure games and web interaction tasks, APEX outperforms all baselines, and the results show that each component contributes to its performance.
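The summary does not say how APEX represents graph nodes or scores them, so the following is only a minimal sketch under stated assumptions. Strategies are nodes, and a strategy becomes selectable only after all of its prerequisites are "mastered" (here, hypothetically, once its mean reward reaches a threshold). The proven-versus-novel trade-off is approximated with the standard UCB1 bandit rule, a swapped-in stand-in for Policy Selection rather than the paper's method. Fork Discovery is not modeled: new strategies are added by hand. All names (`StrategyGraph`, `mastery_threshold`, the exploration constant `c`) are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Strategy:
    """One node in a hypothetical strategy graph."""
    name: str
    prerequisites: set[str] = field(default_factory=set)  # strategies that must be mastered first
    visits: int = 0
    total_reward: float = 0.0

    @property
    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


class StrategyGraph:
    """Directed graph: each strategy points back to the prerequisites it depends on."""

    def __init__(self, mastery_threshold: float = 0.5) -> None:
        self.nodes: dict[str, Strategy] = {}
        self.mastered: set[str] = set()  # assumption: mastery is permanent once reached
        self.mastery_threshold = mastery_threshold

    def add(self, name: str, prerequisites: set[str] | None = None) -> None:
        # Stand-in for Fork Discovery: new strategies are inserted manually.
        self.nodes[name] = Strategy(name, set(prerequisites or ()))

    def available(self) -> list[Strategy]:
        # A strategy is selectable only when every prerequisite is mastered.
        return [s for s in self.nodes.values() if s.prerequisites <= self.mastered]

    def update(self, name: str, reward: float) -> None:
        s = self.nodes[name]
        s.visits += 1
        s.total_reward += reward
        if s.mean_reward >= self.mastery_threshold:
            self.mastered.add(name)

    def select(self, c: float = 1.0) -> Strategy:
        """UCB1 stand-in for Policy Selection: mean reward plus a novelty bonus."""
        candidates = self.available()
        if not candidates:
            raise ValueError("no strategy has all of its prerequisites mastered")
        total = sum(s.visits for s in candidates) + 1

        def score(s: Strategy) -> float:
            if s.visits == 0:
                return math.inf  # try every unlocked strategy at least once
            # The bonus shrinks as a strategy is tried more often.
            return s.mean_reward + c * math.sqrt(math.log(total) / s.visits)

        return max(candidates, key=score)


if __name__ == "__main__":
    g = StrategyGraph()
    g.add("explore_room")
    g.add("open_chest", prerequisites={"explore_room"})
    print(g.select().name)  # explore_room (the only strategy with no unmet prerequisites)
    g.update("explore_room", reward=1.0)
    print(g.select().name)  # open_chest (now unlocked and still untried)
```

UCB1 is used here only because it is a well-known way to trade a strategy's observed reward against how rarely it has been tried. The prerequisite check is what distinguishes this from a flat bandit: an untried strategy gets an infinite bonus, but only once the strategies it depends on are mastered.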