← Back to Computation and Language cs.CL
Can random text trick language models into wrong answers?
Pawel Batorski, Abtin Pourhadi, Jerzy Sarosiek, Przemyslaw Spurek, Paul Swoboda
May 28, 2026
LLMs respond not just to task instructions but to semantically irrelevant text—"spurious prompts." Researchers discovered these random additions can improve performance, sometimes outperforming carefully tuned prompts, while also reliably steering models to produce biased answers (always picking option A, returning even numbers) without explicit instruction. Tested on models from 0.8B to 27B parameters, this vulnerability suggests LLMs exploit shallow statistical patterns rather than understanding task semantics.
Read the original paper →