Separating forget and remember in fast attention mechanisms
Ali Hatamizadeh, Yejin Choi, Jan Kautz
May 21, 2026
Linear attention speeds up transformers by replacing the attention key-value cache, which grows with sequence length, with a fixed-size memory that updates in constant time per token. The challenge is editing this compressed memory without corrupting what is already stored. Gated DeltaNet-2 decouples two operations, erasing old content and writing new content, and gives each its own learnable per-channel gate, generalizing prior approaches (Gated DeltaNet and Kimi Delta Attention). In experiments on 100B tokens, it outperforms Mamba-2 and Mamba-3 on language modeling, reasoning, and especially long-context retrieval benchmarks. The code has been released.
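To make the erase/write distinction concrete, here is a minimal sketch of a fixed-size memory updated with two separate per-channel gates. It is an illustration under stated assumptions, not the update rule from the paper: the function names (`matvec`, `decoupled_update`, `delta_rule_update`), the gate vectors `erase` and `write`, and the exact form of the update are all hypothetical. The only property it is meant to show is that when both gates equal the same scalar, the update reduces to the classic delta rule, and when they differ, erasing and writing can be controlled independently.

```python
# Illustrative sketch only: NOT the paper's exact update rule.
# The memory S has a fixed shape (d_v, d_k) and is stored as a list of rows.

def matvec(S, k):
    """Read from memory with key k: returns S @ k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def decoupled_update(S, k, v, erase, write):
    """One memory update with separate per-channel gates (hypothetical form).

    erase[j] and write[j] are assumed to lie in [0, 1], one per key channel.
    S_new = S - (S k)(erase * k)^T + v (write * k)^T
    """
    old = matvec(S, k)  # what the memory currently returns for key k
    return [
        [S[i][j] - old[i] * erase[j] * k[j] + v[i] * write[j] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def delta_rule_update(S, k, v, beta):
    """Classic delta rule with a single scalar gate: S + beta (v - S k) k^T."""
    old = matvec(S, k)
    return [
        [S[i][j] + beta * (v[i] - old[i]) * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    k = [1.0, 0.0]
    ones, zeros = [1.0, 1.0], [0.0, 0.0]

    S = decoupled_update(S, k, [3.0, 4.0], ones, ones)
    print(matvec(S, k))            # [3.0, 4.0]

    # Full erase + full write: the old value is replaced.
    replaced = decoupled_update(S, k, [5.0, 6.0], ones, ones)
    print(matvec(replaced, k))     # [5.0, 6.0]

    # No erase + full write: the new value is added on top of the old one.
    stacked = decoupled_update(S, k, [5.0, 6.0], zeros, ones)
    print(matvec(stacked, k))      # [8.0, 10.0]
```

The cost of each update depends only on the memory size `d_v × d_k`, not on how many tokens have been seen, which is the constant-time property described above.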