Why do language models choose certain words? A probability-based answer

Shilpika Shilpika, Carlo Graziani, Bethany Lusch, Venkatram Vishwanath, Michael E. Papka

Large language models generate text by sampling from probability distributions over tokens. This work inverts those probabilities using Bayes' rule to create an attribution score that shows which input tokens pushed the model toward each output word, independent of the model's architecture. The measure reveals where models are uncertain or unstable during generation, offering a tool to understand what LLMs actually learned and where they are unreliable.
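As a rough illustration of what inverting probabilities with Bayes' rule means, the toy sketch below turns a table of P(output word | input token) into P(input token | output word). It is not the authors' attribution score: the function name `bayes_invert`, the two-token vocabulary, and every number are hypothetical, and a real model would supply the forward probabilities.

```python
def bayes_invert(likelihood, prior):
    """Invert likelihood[x][y] = P(y | x) into posterior[y][x] = P(x | y).

    Bayes' rule: P(x | y) = P(y | x) * P(x) / sum_x' P(y | x') * P(x').
    """
    outputs = {y for row in likelihood.values() for y in row}
    posterior = {}
    for y in outputs:
        # Joint probability P(x, y) for each candidate input token x.
        joint = {x: likelihood[x].get(y, 0.0) * prior[x] for x in likelihood}
        evidence = sum(joint.values())  # P(y), the normalizing constant
        posterior[y] = {
            x: (p / evidence if evidence > 0 else 0.0) for x, p in joint.items()
        }
    return posterior


# Hypothetical forward probabilities P(output word | input token).
likelihood = {
    "bank": {"river": 0.5, "money": 0.5},
    "loan": {"river": 0.1, "money": 0.9},
}
# Hypothetical prior over input tokens.
prior = {"bank": 0.5, "loan": 0.5}

posterior = bayes_invert(likelihood, prior)
for word, scores in sorted(posterior.items()):
    print(word, {tok: round(p, 3) for tok, p in sorted(scores.items())})
```

In this toy setup the posterior for "money" puts more weight on "loan" than on "bank", which is the kind of input-to-output attribution the abstract describes; a posterior spread nearly evenly across inputs would correspond to an output with no clear source.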