Can agents learn practical skills from messy web instructions?
Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei, Shihao Li, Hang Yan, Han Li, Yuanxing Zhang, Zhiqi Bai, Jinhua Hao, Ming Sun, Han Li, Jiaheng Liu
June 1, 2026
Web instructions are written for humans, not AI agents. They are scattered across formats, incomplete, and noisy. This work tackles guide-to-skill learning: automatically converting the procedural knowledge in such guides into structured skills, then continuously refining those skills from agent trajectories. MMG2Skill compiles guides into editable instructions, supplies them to vision-language models during tasks, and revises them based on observed failure patterns rather than ground-truth labels. Tested on GUI control, game play, and card strategy across six different backbones, the system consistently outperforms agents that use raw guides. The largest gains (25 points) come from combining structured skill design with trajectory-driven updates.