cs.CL

How low-precision transformers achieve real computational power

Moritz Brösamle, Stephan Eckstein

May 18, 2026

Prior expressivity results for transformers relied on unrealistic assumptions: hardmax attention, high-precision arithmetic, or expensive architectural modifications. This work shows that standard transformer decoders with softmax attention and rounded activations and weights can compute anything a Turing machine can, provided depth and width scale logarithmically with context length. The authors first construct hardmax transformers that use Chain-of-Thought (CoT) to simulate Turing machines, then convert them to softmax equivalents without requiring extreme precision. They also analyze a recently proposed summarized CoT approach and show that its required model size scales logarithmically in space rather than time. In experiments on Sudoku reasoning tasks, these results predict learnability better than prior high-precision results. Code is released.
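To give a feel for the gap between hardmax and softmax attention mentioned above, the following is a minimal sketch and not the authors' construction. It assumes a toy setting (one best-scoring key and n-1 competitors that all score lower by a fixed `gap`) and the helper names `softmax` and `max_weight` are hypothetical. With a constant gap, the softmax weight on the best key decays as the context grows; with a gap that grows like the logarithm of the context length, it stays close to 1, as hardmax would give.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_weight(n, gap):
    # Toy assumption: one key scores 0, the other n-1 keys score -gap.
    # Returns the softmax attention weight placed on the best key.
    scores = [0.0] + [-gap] * (n - 1)
    return softmax(scores)[0]

if __name__ == "__main__":
    for n in (16, 256, 4096):
        fixed = max_weight(n, gap=4.0)                 # gap independent of n
        scaled = max_weight(n, gap=2.0 * math.log(n))  # gap grows like log n
        print(f"n={n:5d}  fixed gap: {fixed:.4f}  log-scaled gap: {scaled:.4f}")
```

In this toy setting the weight on the best key is 1 / (1 + (n - 1) * exp(-gap)), so a gap of 2 ln n leaves less than 1/n of the attention mass on the competitors. How the paper itself controls this error under rounding is not covered by this sketch.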
Published as "The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought", arXiv:2605.18079