How to train multi-agent LLM teams end-to-end?
Yiqun Chen, Wei Yang, Erhan Zhang, Shijie Wang, Qi Liu, Zechun Niu, Bin Zhang, Haitao Li, Rui Li, Lingyong Yan, Jinyuan Feng, Biqing Qi, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu, Jiaxin Mao
May 26, 2026
Most multi-agent LLM systems rely on hand-crafted prompts and rules rather than learned behavior. UnityMAS-O treats the whole workflow (agents, tools, and interactions) as a single trainable unit, letting you define agent roles, reward structures, and parameter sharing without rebuilding the optimization code. On question answering and code generation, RL training substantially improves smaller models over manually specified baselines, suggesting that this approach could become a standard way to develop multi-agent systems.
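To make the idea of declaring roles, rewards, and parameter sharing in one place more concrete, here is a minimal sketch in Python. It is not UnityMAS-O's actual API: the names `AgentSpec`, `WorkflowSpec`, `param_group`, and `exact_match_reward` are hypothetical, chosen only for illustration, and the exact-match reward is an assumed stand-in for whatever reward the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentSpec:
    """One agent in the workflow (hypothetical structure, not the paper's API)."""
    name: str
    role: str         # role description used to prompt the agent
    param_group: str  # agents with the same group would share one set of weights


@dataclass
class WorkflowSpec:
    """A multi-agent workflow described as one unit (hypothetical)."""
    agents: List[AgentSpec]
    # Maps a final answer and a reference answer to a scalar team reward.
    reward_fn: Callable[[str, str], float]

    def shared_groups(self) -> Dict[str, List[str]]:
        """Group agent names by the parameter set they would update."""
        groups: Dict[str, List[str]] = {}
        for agent in self.agents:
            groups.setdefault(agent.param_group, []).append(agent.name)
        return groups


def exact_match_reward(answer: str, reference: str) -> float:
    """Assumed reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


# Three agents; the planner and solver share weights, the verifier does not.
workflow = WorkflowSpec(
    agents=[
        AgentSpec("planner", "Break the question into steps.", "policy_a"),
        AgentSpec("solver", "Answer each step.", "policy_a"),
        AgentSpec("verifier", "Check the final answer.", "policy_b"),
    ],
    reward_fn=exact_match_reward,
)
```

In a layout like this, changing which agents share weights or swapping the reward means editing the specification only; a training loop that reads `shared_groups()` and `reward_fn` would not need to change.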