cs.AI

Separating forget and remember in fast attention mechanisms

Ali Hatamizadeh, Yejin Choi, Jan Kautz

May 21, 2026

Linear attention speeds up transformers by replacing the key-value cache, which grows with sequence length, with a fixed-size memory that updates in constant time per token. The challenge is editing this compressed memory without corrupting what is already stored. Gated DeltaNet-2 decouples two operations, erasing old content and writing new content, giving each its own learnable per-channel gate; this generalizes prior approaches (Gated DeltaNet and Kimi Delta Attention). In experiments on 100B tokens, it outperforms Mamba-2 and Mamba-3 on language modeling, reasoning, and especially long-context retrieval benchmarks. The code has been released.
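To make the erase/write distinction concrete, here is a minimal sketch of a fixed-size memory update with separate gates. It is an illustration under stated assumptions, not the paper's formulation: the function names `read` and `decoupled_update`, the choice to apply the gates over value channels, and the omission of any decay gate are all hypothetical. When the two gates are equal, the update reduces to the classic delta rule, where one strength controls both erasing and writing.

```python
# Illustrative sketch only: names, shapes, and gate placement are assumptions,
# not the formulation used in Gated DeltaNet-2.

def read(S, k):
    """Return S @ k: what the memory currently stores under key k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def decoupled_update(S, k, v, erase, write):
    """One constant-time update of a fixed-size memory S (d_v x d_k).

    erase[i] in [0, 1] controls how much of the old content in value
    channel i is removed; write[i] in [0, 1] controls how much of the
    new value v[i] is stored. Setting erase == write recovers the delta
    rule S + beta * (v - S k) k^T.
    """
    old = read(S, k)  # content currently associated with k
    return [
        [s_ij - e_i * o_i * k_j + w_i * v_i * k_j for s_ij, k_j in zip(row, k)]
        for row, o_i, v_i, e_i, w_i in zip(S, old, v, erase, write)
    ]


# Usage: store a value, then erase one channel without writing anything new.
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]  # unit-norm key
S = decoupled_update(S, k, [2.0, 3.0], erase=[1.0, 1.0], write=[1.0, 1.0])
stored = read(S, k)   # [2.0, 3.0]
S = decoupled_update(S, k, [5.0, 7.0], erase=[1.0, 0.0], write=[0.0, 0.0])
after = read(S, k)    # [0.0, 3.0]: channel 0 erased, channel 1 untouched
```

The memory stays the same size no matter how many tokens are processed, and each step costs a fixed amount of work proportional to the size of `S`. With a single shared gate, the second update above would be impossible: clearing channel 0 would also force a write of the new value.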
Published as "Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention" (arXiv:2605.22791)