Why agent systems fail as skill libraries grow, and how to fix it

Charles Chen, Qiming Yu, Yuhang Gu, Zhuoye Huang, Hanjing Li, Hongyu Liu, Simin Liu, Jinhao Liu, Dengyun Peng, Jiangyi Wang, Zheng Yan, Fanqing Meng, Ethan Qin, Carl Che, Mengkang Hu

This paper identifies scaling laws that govern how large language model agents perform as their skill libraries grow. Across 15 models and over 3 million real decisions on 1,141 practical skills, the authors show that routing accuracy (how often the agent picks the right skill) decays logarithmically with library size, while execution quality can partially compensate for routing errors. A single parameter couples these two laws, so downstream recoverability can be predicted from routing metrics alone. Applying these insights to optimize skill libraries raises benchmark performance from 49.3% to 61.6% on ClawBench and from 28.4% to 34.5% on ClawMark. The paper is intended for researchers building production agent systems and practitioners deploying LLMs with tool use.
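To make the logarithmic routing law concrete, the sketch below fits a curve of the form accuracy = a - b * ln(N) to (library size, routing accuracy) pairs by ordinary least squares. This functional form, the names `fit_log_decay` and `predict_accuracy`, and the data points are all illustrative assumptions; the data are synthetic values generated from chosen coefficients, not results from the paper, and the paper's exact parameterization may differ.

```python
import math


def fit_log_decay(sizes, accuracies):
    """Least-squares fit of accuracy = a - b * ln(size); returns (a, b).

    Assumed illustrative form of a logarithmic decay law, not the
    paper's exact parameterization.
    """
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(accuracies) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx                  # negative when accuracy decays
    intercept = mean_y - slope * mean_x
    return intercept, -slope           # report b as a positive decay rate


def predict_accuracy(a, b, size):
    """Predicted routing accuracy at a given library size."""
    return a - b * math.log(size)


# Synthetic data generated from known coefficients (hypothetical values).
sizes = [10, 50, 100, 500, 1000]
true_a, true_b = 0.95, 0.05
accuracies = [true_a - true_b * math.log(n) for n in sizes]

a, b = fit_log_decay(sizes, accuracies)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"predicted accuracy at 2000 skills: {predict_accuracy(a, b, 2000):.3f}")
```

Under this assumed form, each doubling of the library costs a fixed b * ln(2) in routing accuracy, which is why a fitted curve can be extrapolated to library sizes that have not yet been measured.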