cs.CL

Do language models think alike when their answers match?

Muhammad Usama, Dong Eui Chang

May 22, 2026

Across 16 language models (1.5B–72B parameters) solving 800 reasoning problems, representations align most on questions that every model gets wrong, not on those the models get right. Post-decision activations diverge sharply even when early layers match, and ablating the shared information rarely changes predictions (1.5–5.5% flip rate), so that information has little causal influence on the output. The convergence reflects common input processing, not shared reasoning logic.
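To make the two quantities in this summary concrete, here is a minimal sketch in Python with NumPy. It is an illustration under assumptions, not the paper's method: the summary does not say which alignment metric the authors use, so linear CKA (a common choice for comparing representations across models) stands in for it, and the function names `linear_cka` and `flip_rate` are hypothetical. The flip rate is taken to mean the fraction of problems whose predicted answer changes after ablation.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_examples, dim).

    Assumption: linear CKA is a stand-in metric; the summary does not
    name the alignment measure used in the paper.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def flip_rate(preds_before, preds_after):
    """Fraction of examples whose predicted answer changes after ablation."""
    before = np.asarray(preds_before)
    after = np.asarray(preds_after)
    if before.shape != after.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(before != after))


if __name__ == "__main__":
    # Synthetic data for illustration only; these are not results from the paper.
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(100, 16))   # hypothetical activations, model A
    acts_b = rng.normal(size=(100, 32))   # hypothetical activations, model B
    print("CKA(A, B):", linear_cka(acts_a, acts_b))
    print("flip rate:", flip_rate([0, 1, 2, 3], [0, 1, 2, 0]))
```

Under this reading, a high alignment score paired with a low flip rate is the pattern the summary describes: the models share information that can be removed without changing most of their answers.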
Published as "Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning", arXiv:2605.23315