Hiding malware in AI models that only activates after compression
Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev
May 14, 2026
LLM quantization reduces memory requirements but creates a hidden attack surface: a model can be released appearing harmless at full precision, then behave maliciously once end users compress it. Previous attacks worked only against simple quantization schemes and failed against widely used methods like AWQ, GPTQ, and GGUF. This work exploits a property common to modern quantization: injecting large outlier values into specific weight blocks causes the surrounding weights to collapse to zero during quantization, producing predictable, targeted behavioral changes. Evaluated across three attack scenarios, the method achieves high success rates where all prior attacks failed, showing that quantization security risks extend to the most popular deployment pipelines.
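The collapse-to-zero effect can be illustrated with a toy example. The sketch below is not the authors' attack and does not reproduce AWQ, GPTQ, or GGUF; it assumes a simplified symmetric absmax round-to-nearest quantizer over a single block, and the block size (8), bit width (4), weight values, and outlier magnitude (8.0) are all hypothetical choices made for illustration. Because the block's scale is set by its largest magnitude, one large outlier stretches the quantization grid so far that every small neighbor rounds to zero.

```python
def quantize_block(block, bits=4):
    """Toy symmetric absmax round-to-nearest quantization of one weight block.

    Assumption: a simplified stand-in for block-wise quantization,
    not the actual AWQ, GPTQ, or GGUF algorithms.
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for signed 4-bit
    scale = max(abs(w) for w in block) / qmax   # step size set by the largest weight
    if scale == 0.0:
        return [0.0 for _ in block]
    # Round each weight to the nearest grid point, then map back to float.
    return [round(w / scale) * scale for w in block]


# Hypothetical block of small, benign-looking weights.
block = [0.02, -0.03, 0.01, 0.04, -0.02, 0.03, -0.01, 0.02]
clean = quantize_block(block)

# Hypothetical injected outlier: one large value in the same block.
poisoned = list(block)
poisoned[0] = 8.0
attacked = quantize_block(poisoned)

print("zeros without outlier:", sum(q == 0 for q in clean))
print("zeros with outlier:   ", sum(q == 0 for q in attacked))
```

Without the outlier, every weight in this block survives quantization as a non-zero value. With it, the step size grows from roughly 0.006 to roughly 1.14, so the seven remaining weights all fall below half a step and are rounded to zero, while the full-precision weights are left unchanged.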