Why how you tokenize graphs matters more than you think

Maya Bechler-Speicher, Gilad Yehudai, Gil Harari, Clayton Sanford, Amir Globerson, Joan Bruna

Transformers need graphs converted into tokens before processing, but the paper proves this choice isn't neutral: different tokenization schemes (spectral, random-walk, adjacency) create distinct computational regimes where the same task may be trivial for one approach yet impossible for another at shallow depths. Random-walk tokenization provably loses information irretrievably; spectral tokenization preserves everything but struggles with local patterns. A shallow transformer cannot efficiently convert between tokenization families, meaning a poor initial choice can't be fixed by the model itself. Experiments on synthetic and real tasks confirm the theory and show that mixing complementary tokenizations helps.