Why RAG systems say they see risks but ignore them anyway

Retrieval-augmented LLMs show a dangerous gap: they detect conflicting evidence but do not use it to change their recommendations. Multi-turn testing across four model families (1.5B–32B parameters) reveals that single-turn safety evaluations overestimate real-world robustness by 3–4×. Models internally represent and attend to danger signals yet fail to act on them. This is a monitoring-control gap that no prompt engineering fixes.