cs.AI

Why RAG systems say they see risks but ignore them anyway

Zhe Yu, Wenpeng Xing, Chen Ye, Xuyang Teng, Bo Yang, Changting Lin, Meng Han

May 26, 2026

Retrieval-augmented LLMs show a dangerous gap: they detect conflicting evidence but do not use it to change their recommendations. Multi-turn tests across four model families (1.5B–32B parameters) show that single-turn safety evaluations overestimate real-world robustness by 3–4×. The models internally represent and attend to danger signals yet fail to act on them, a monitoring-control gap that no prompt engineering fixes.
Published as "Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs" (arXiv:2605.27157).