cs.CL

Five lines of code expose what your language model really learned

Hisashi Miyashita

May 21, 2026

Applying singular value decomposition (SVD) to an LLM's output-layer weights reveals, directly from the parameters, what the model has learned. An analysis of GPT-OSS-120B, Gemma, and Qwen found systematic differences: GPT-OSS-120B shows a hierarchical semantic organization, Gemma is dominated by 19th-century English, and Qwen contains ethically problematic subspaces that survive alignment training. The technique also detects glitch tokens, such as the notorious shokubutsu-hyakka-tsu token, without running the model, which suggests that SVD analysis should become a standard safety audit before release.
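To make the idea concrete, here is a minimal sketch of this kind of analysis, assuming the output-layer weights are available as a NumPy array of shape (vocab_size, d_model). It is not the paper's code: the matrix W, the toy vocabulary, and the planted direction are synthetic, and the function name top_tokens_per_direction is hypothetical. Ranking tokens by the absolute size of their entries in each left singular vector is one plausible way to read off which tokens a direction groups together.

```python
import numpy as np


def top_tokens_per_direction(W, vocab, num_directions=3, top_n=5):
    """List the tokens that load most heavily on each leading singular direction.

    W     : (vocab_size, d_model) output-layer weight matrix (assumed layout).
    vocab : list of token strings, one per row of W.
    """
    # Thin SVD: columns of U live in token space, rows of Vt in hidden space.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    report = []
    for i in range(min(num_directions, len(S))):
        # Rank tokens by the absolute size of their entry in direction i.
        order = np.argsort(-np.abs(U[:, i]))[:top_n]
        report.append((float(S[i]), [vocab[j] for j in order]))
    return report


# Synthetic stand-in for real weights: small noise plus one planted direction
# shared by three tokens, so there is a known answer to recover.
rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(50)]
W = 0.01 * rng.standard_normal((50, 8))
planted = np.zeros(8)
planted[0] = 1.0
for j in (3, 7, 11):
    W[j] += 5.0 * planted

report = top_tokens_per_direction(W, vocab, num_directions=2, top_n=3)
for sigma, tokens in report:
    print(f"sigma={sigma:.3f}  tokens={tokens}")
```

With real weights, W would come from the model's checkpoint rather than a random generator, and the token lists for the leading directions are what one would inspect by hand. No forward pass is needed, which is what makes the approach usable as a check on the parameters alone.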
Published as "Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)", arXiv:2605.22005