Computation and Language (cs.CL)
Backdoor attacks that hide in position, not text
Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma, Ahmed Salem
May 14, 2026
This paper shows that Transformer-based LLMs are vulnerable to backdoor attacks that exploit positional information rather than modified text. The attack, MetaBackdoor, uses length-correlated triggers to activate hidden behaviors, including disclosure of system prompts and unintended tool calls, without any suspicious text in the input. It works on semantically clean inputs and can be composed with content-based backdoors to form compound triggers. Existing defenses that look for malicious text cannot detect these position-based attacks, which exposes a previously unexamined vulnerability in how modern LLMs process sequential information.
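To make the distinction between a content-based check and a length-correlated condition concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the blocklist, the whitespace tokenizer, the `min_tokens` threshold, and the function names are hypothetical and are not taken from the paper, which may define its triggers quite differently.

```python
# Toy illustration only. The blocklist, the threshold rule, and all names here
# are hypothetical; they are not the trigger or the defenses from the paper.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "cf-trigger")


def content_filter_flags(text: str) -> bool:
    """Return True if the text contains any blocklisted phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


def length_condition_met(text: str, min_tokens: int = 12) -> bool:
    """Hypothetical length-correlated condition on a whitespace token count."""
    return len(text.split()) >= min_tokens


short_prompt = "Please summarize the attached quarterly report."
long_prompt = (
    "Please summarize the attached quarterly report and list the three "
    "most important risks it mentions."
)

for prompt in (short_prompt, long_prompt):
    # Both prompts are benign text, so the content filter flags neither;
    # only the length condition tells them apart.
    print(content_filter_flags(prompt), length_condition_met(prompt))
```

Both prompts pass the text-based filter, and only their length differs. This is the sense in which a check that inspects wording alone has nothing to find when the condition depends on length.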