Preventing AI agents from leaking secrets through shared memory

Sadia Asif, Mohammad Mohammadi Amiri, Momin Abbas, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy

When AI agents coordinate through shared transformer key-value caches, they inadvertently expose sensitive information—not through text, but through the hidden representations themselves. LCGuard learns to transform these cache artifacts before sharing, using adversarial training where one model tries to reconstruct leaked secrets while another blocks it. The approach cuts information leakage substantially while maintaining task performance across multiple model families and benchmarks.