cs.AI

How to detect poisoned memories in AI agents after the damage is done?

Zhewen Tan, Yilun Yao, Huiyan Jin, Wenhan Yu, Guoan Wang, Mengyuan Fan, liang lu, Feng Liu, Xiangzheng Zhang, Duohe Ma, Tong Yang, Lin Sun

May 22, 2026

LLM agents that store and retrieve past interactions are vulnerable to adversaries who sneak malicious records into memory. MemAudit detects which stored memories caused harmful behavior after the fact by measuring each memory's causal influence on outputs and flagging structurally odd entries. Tested against realistic memory-poisoning attacks, it eliminated success rates that previously reached 83% in reasoning tasks.

Published as MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection arXiv:2605.23723

Read the original paper →