How toxic content hides in AI agent memory

Memory-augmented LLM agents store persistent context that shapes future behavior, but this introduces a safety vulnerability: toxic content can be summarized into seemingly benign memory that still propagates hostile framing downstream. The authors document this "memory laundering" effect using paired multi-agent rollouts and introduce the sub-threshold propagation gap (SPG) to measure hidden toxicity influence in outputs that would pass safety checks. Experiments show raw transcripts drive overt toxicity, while compressed memory carries covert influence, and crucially, sanitizing content before summarization substantially reduces propagation—whereas cleaning only the final summary leaves the laundered influence intact. The work reframes safety in memory-augmented agents as a state-control problem requiring upstream intervention.