What transformers actually compute vs. what they represent

Transformers trained on base-B digit extraction (finding the coefficient of B^D in the base-B representation of N, i.e. floor(N / B^D) mod B) reach 99.83% accuracy and appear to implement this closed-form algorithm. Linear probes successfully decode intermediate values that match the closed-form solution, but causal circuit analysis shows that the model does not actually use them; instead, it routes information through separate pathways that combine only late in the network. The work demonstrates that what a model represents internally and what it causally computes can diverge sharply, even when an explicit algorithmic ground truth is available.
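
For concreteness, the task the model is trained on can be written in a few lines. The sketch below is only an illustration of the closed-form ground truth, not the authors' code: the function names base_digit and base_digit_iterative are hypothetical, and it assumes N >= 0, B >= 2, and D >= 0 with D = 0 denoting the least significant digit.

```python
def base_digit(n: int, b: int, d: int) -> int:
    """Closed form: coefficient of b**d in the base-b expansion of n.

    Assumes n >= 0, b >= 2, d >= 0 (d = 0 is the least significant digit).
    """
    if n < 0 or b < 2 or d < 0:
        raise ValueError("expected n >= 0, b >= 2, d >= 0")
    # Shift away the d lowest digits, then keep the lowest remaining digit.
    return (n // b**d) % b


def base_digit_iterative(n: int, b: int, d: int) -> int:
    """Same result via repeated division, used here only as a cross-check."""
    if n < 0 or b < 2 or d < 0:
        raise ValueError("expected n >= 0, b >= 2, d >= 0")
    for _ in range(d):
        n //= b  # drop one base-b digit per step
    return n % b


if __name__ == "__main__":
    # 1234 in base 10: the coefficient of 10**2 is 2.
    print(base_digit(1234, 10, 2))
    # 255 in base 16 is "FF": the coefficient of 16**1 is 15.
    print(base_digit(255, 16, 1))
```

The intermediate quantity n // b**d is the kind of value a linear probe might be trained to decode from activations. The point of the summary above is that decoding such a value successfully does not, on its own, show that the model computes its answer from it.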