Why RAG systems say they see risks but ignore them anyway
Zhe Yu, Wenpeng Xing, Chen Ye, Xuyang Teng, Bo Yang, Changting Lin, Meng Han
May 26, 2026
Retrieval-augmented LLMs show a dangerous gap: they detect conflicting evidence but do not use it to change their recommendations. Multi-turn testing across four model families (1.5B–32B parameters) shows that single-turn safety evaluations overestimate real-world robustness by 3–4×. Models internally represent and attend to danger signals yet fail to act on them, a monitoring-control gap that no prompt engineering fixes.