cs.CL

How to train multi-agent LLM teams end-to-end?

Yiqun Chen, Wei Yang, Erhan Zhang, Shijie Wang, Qi Liu, Zechun Niu, Bin Zhang, Haitao Li, Rui Li, Lingyong Yan, Jinyuan Feng, Biqing Qi, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu, Jiaxin Mao

May 26, 2026

Most multi-agent LLM systems rely on hand-crafted prompts and rules rather than learned behavior. UnityMAS-O treats the whole workflow (agents, tools, and their interactions) as a single trainable unit, letting you define agent roles, reward structures, and parameter sharing without rebuilding the optimization code. On question answering and code generation, RL training improves smaller models substantially over manually specified baselines, suggesting this approach could become a standard way to develop multi-agent systems.
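To make the idea of declaring roles, rewards, and parameter sharing concrete, here is a minimal Python sketch. It is not UnityMAS-O's API: `AgentSpec`, `group_by_policy`, `mixed_rewards`, and every field name are hypothetical, and the reward blend is only one simple assumption about how a team-level signal might be combined with per-agent signals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one agent in a workflow (not the paper's API)."""
    name: str
    role: str            # e.g. "planner", "coder", "reviewer"
    policy_group: str    # agents with the same group share model parameters
    team_weight: float   # fraction of the reward taken from the team outcome


def group_by_policy(agents):
    """Map each shared-parameter group to the names of the agents that use it."""
    groups = defaultdict(list)
    for agent in agents:
        groups[agent.policy_group].append(agent.name)
    return dict(groups)


def mixed_rewards(agents, team_reward, local_rewards):
    """Blend one team-level reward with per-agent rewards (assumed linear mix)."""
    return {
        a.name: a.team_weight * team_reward
        + (1.0 - a.team_weight) * local_rewards.get(a.name, 0.0)
        for a in agents
    }


# Illustrative three-agent workflow: planner and coder share one model.
agents = [
    AgentSpec("planner", "planner", "shared", 1.0),
    AgentSpec("coder", "coder", "shared", 0.5),
    AgentSpec("reviewer", "reviewer", "reviewer_only", 0.5),
]

groups = group_by_policy(agents)
rewards = mixed_rewards(
    agents,
    team_reward=1.0,
    local_rewards={"coder": 0.0, "reviewer": 1.0},
)
```

In this sketch, changing which agents share parameters or how much each one is rewarded for the team outcome only means editing the `AgentSpec` entries, which is the kind of separation between workflow definition and optimization code that the summary describes.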
Published as "UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems" (arXiv:2605.26646).