cs.AI

Can AI agents write their own tools reliably?

Yifan Zhou, Zhentao Zhang, Ziming Cheng, Shuo Zhang, Qizhen Lan, Zhangquan Chen, Zhi Yang, Qianyu Xu, Ronghao Chen, Huacan Wang, Sen Hu

May 18, 2026

Building effective AI agents requires not just using existing tools but generating new ones from raw materials. SkillGenBench isolates skill generation as its own problem, testing whether language models can synthesize executable skills from software repositories and long-form documents. The benchmark covers two scenarios: task-specific skills, written after the model has seen the task, and reusable skill libraries, built in advance before any task is known. Early results show that current methods struggle with skill reusability, especially when the procedures must be distilled from documentation.
Published as "SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents", arXiv:2605.18693