Do language models think alike when their answers match?
Muhammad Usama, Dong Eui Chang
May 22, 2026
Across 16 language models (1.5B–72B parameters) solving 800 reasoning problems, representations align most strongly on questions that every model gets wrong, not on those they get right. Post-decision activations diverge sharply even when early layers match, and the shared information rarely has a causal effect on predictions: ablating it flips only 1.5–5.5% of them. Convergence therefore reflects common input processing, not shared reasoning logic.
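To make the "flip rate under ablation" figure concrete, here is a minimal sketch of how such a number could be computed. It is not the authors' procedure: the summary does not say how the shared information is identified or removed, so the projection-based ablation, the linear readout, and all names (`ablate_directions`, `flip_rate`, the toy arrays) are assumptions made for illustration, and the data is random.

```python
import numpy as np

def ablate_directions(acts, directions):
    """Project activations onto the orthogonal complement of `directions`.

    Assumption: "ablation" is modeled here as removing a linear subspace.
    """
    # Orthonormalize the columns so the projection is well defined.
    q, _ = np.linalg.qr(directions)
    return acts - (acts @ q) @ q.T

def flip_rate(acts, directions, readout):
    """Fraction of examples whose argmax prediction changes after ablation."""
    before = (acts @ readout).argmax(axis=1)
    after = (ablate_directions(acts, directions) @ readout).argmax(axis=1)
    return float((before != after).mean())

# Toy stand-ins (hypothetical, random): 800 problems, 64-dim activations,
# a 4-dim "shared" subspace, and a linear readout over 5 answer options.
rng = np.random.default_rng(0)
acts = rng.normal(size=(800, 64))
directions = rng.normal(size=(64, 4))
readout = rng.normal(size=(64, 5))

rate = flip_rate(acts, directions, readout)
print(f"flip rate: {rate:.3f}")
```

A low flip rate in this setup would mean the removed subspace carries little of what the readout uses, which is the sense in which shared information can be present without driving predictions.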