cs.CL

When debate between AI agents breeds groupthink instead of better answers

Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao, Ruiqi Xu, Shuyuan Zheng, Jianbin Qin

May 30, 2026

When multiple language models debate a problem and reach agreement, it looks like collaborative reasoning, but it may just be herd behavior. The researchers decomposed answer convergence into three mechanisms: random model instability (37%), social conformity (29%), and genuine persuasion by reasoning. They found that even nonsensical "reasoning" convinces resistant models 20–39% of the time, and that harmful conformity can be predicted from early signals (AUC 0.79). Targeted interventions reduced harmful conformity by 13.6 points. Without ground truth, however, suppressing peer influence backfires: the system cannot tell beneficial agreement from harmful agreement.
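The idea of splitting convergence into mechanisms can be illustrated with a small attribution routine. This is a minimal sketch under stated assumptions, not the paper's method. It assumes each observed answer flip has been re-tested under three hypothetical control conditions: the agent re-asked alone, the agent shown only its peers' bare answers, and the agent shown its peers' full reasoning. Each flip is credited to the first condition that reproduces it. The names `FlipProbe`, `attribute_flip`, `decompose`, and `MECHANISMS`, as well as the order of the checks, are illustrative choices.

```python
from collections import Counter
from dataclasses import dataclass

MECHANISMS = ("instability", "conformity", "persuasion", "unexplained")


@dataclass
class FlipProbe:
    # Hypothetical record for one observed answer flip, re-tested under
    # three control conditions (assumed setup, not the paper's protocol).
    flips_when_resampled_alone: bool  # agent re-asked with no peer input
    flips_on_bare_peer_answer: bool   # agent shown peer answers only
    flips_on_full_peer_message: bool  # agent shown peer answers plus reasoning


def attribute_flip(probe: FlipProbe) -> str:
    """Credit a flip to the first control condition that reproduces it."""
    if probe.flips_when_resampled_alone:
        return "instability"  # changes its answer even without peers
    if probe.flips_on_bare_peer_answer:
        return "conformity"   # peers' answers alone are enough
    if probe.flips_on_full_peer_message:
        return "persuasion"   # only the accompanying argument moves it
    return "unexplained"


def decompose(probes: list[FlipProbe]) -> dict[str, float]:
    """Return the share of flips attributed to each mechanism."""
    if not probes:
        return {m: 0.0 for m in MECHANISMS}
    counts = Counter(attribute_flip(p) for p in probes)
    return {m: counts[m] / len(probes) for m in MECHANISMS}


if __name__ == "__main__":
    sample = [
        FlipProbe(True, False, False),
        FlipProbe(False, True, True),
        FlipProbe(False, False, True),
        FlipProbe(False, False, True),
    ]
    print(decompose(sample))
```

Checking instability first means a flip is credited to peer influence only when the agent would not have changed its answer on its own. A different ordering, or a probabilistic attribution, would produce different shares.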
Published as "Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate" (arXiv:2606.00820)