← Back to Artificial Intelligence cs.AI
How to detect poisoned memories in AI agents after the damage is done?
Zhewen Tan, Yilun Yao, Huiyan Jin, Wenhan Yu, Guoan Wang, Mengyuan Fan, liang lu, Feng Liu, Xiangzheng Zhang, Duohe Ma, Tong Yang, Lin Sun
May 22, 2026
LLM agents that store and retrieve past interactions are vulnerable to adversaries who sneak malicious records into memory. MemAudit detects which stored memories caused harmful behavior after the fact by measuring each memory's causal influence on outputs and flagging structurally odd entries. Tested against realistic memory-poisoning attacks, it eliminated success rates that previously reached 83% in reasoning tasks.
Read the original paper →