Five lines of code expose what your language model really learned
Hisashi Miyashita
May 21, 2026
Singular value decomposition (SVD) of an LLM's output-layer weights exposes what the model learned, directly from its parameters and without running inference. An analysis of GPT-OSS-120B, Gemma, and Qwen revealed systematic differences: GPT-OSS-120B shows hierarchical semantic organization, Gemma is dominated by 19th-century English, and Qwen contains ethically problematic subspaces that survive alignment training. The technique also detects glitch tokens, such as the notorious shokubutsu-hyakka-tsu token, without running the model, which suggests that SVD analysis should become a standard safety audit before release.
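To make the idea concrete, here is a minimal sketch of this kind of analysis. It is not the paper's code. It assumes the output-layer weights are available as a NumPy array `W` of shape `(vocab_size, d_model)`, and it uses a small synthetic matrix with a planted token cluster in place of real model weights. The helper name `top_tokens_per_direction` and the toy vocabulary are hypothetical.

```python
import numpy as np

def top_tokens_per_direction(W, vocab, n_directions=3, k=5):
    """List the k tokens loading most strongly on each leading singular direction.

    W     : (vocab_size, d_model) output-layer weight matrix (assumed layout)
    vocab : list of token strings, one per row of W
    """
    # Thin SVD: W = U @ diag(S) @ Vt. Column i of U holds every token's
    # loading on the i-th singular direction; S is sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    report = []
    for i in range(n_directions):
        # Rank tokens by absolute loading, since the sign of a singular vector is arbitrary.
        order = np.argsort(-np.abs(U[:, i]))[:k]
        report.append((float(S[i]), [vocab[j] for j in order]))
    return report

# Synthetic stand-in for real weights (assumption: no actual model is loaded here).
rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(50)]
W = 0.01 * rng.standard_normal((50, 8))
W[:5, 0] += 3.0  # plant a cluster: tokens 0..4 share one strong direction

report = top_tokens_per_direction(W, vocab)
for sigma, tokens in report:
    print(f"sigma={sigma:.3f}  tokens={tokens}")
```

On a real model, `W` would be read from the checkpoint's output projection and `vocab` from its tokenizer. Reading the top tokens for each direction is what would surface groupings like those described above.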