cs.CL

How to make language models remember long documents faster?

Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu, Xiaojie Li, Jungang Lou, Zechao Li, Jing Liu

May 30, 2026

Diffusion language models are slow at processing long documents because they cache every token with equal priority, which wastes compute on irrelevant context. WaveFilter borrows from how humans read: it decomposes a long sequence with wavelet transforms to identify the tokens that actually matter, then builds a sparse KV cache from only those. Evaluated on complex long-context tasks, it acts as a plug-and-play layer that speeds up existing caching methods.
Published as "WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering" (arXiv:2606.00724).
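
To make the idea of wavelet-guided cache filtering concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm. It assumes a hypothetical one-dimensional per-token importance signal (`signal`, for example the attention mass each cached token receives), decomposes it with a plain Haar wavelet, and keeps the `keep` highest-scoring tokens. The function names, the scoring rule (summed absolute detail coefficients), and the `levels` and `keep` parameters are all illustrative assumptions.

```python
import numpy as np


def haar_step(x):
    """One level of the orthonormal Haar transform on an even-length 1-D array."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-frequency part (local average)
    detail = (even - odd) / np.sqrt(2.0)  # high-frequency part (local change)
    return approx, detail


def wavelet_token_scores(signal, levels=2):
    """Assumed scoring rule (not from the paper): sum |detail| coefficients
    over all levels, upsampled back to one score per token."""
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.shape[0]
    if n % (2 ** levels) != 0:
        raise ValueError("sequence length must be divisible by 2**levels")
    scores = np.zeros(n)
    approx = signal
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        # Each coefficient at this level covers 2**level consecutive tokens.
        scores += np.repeat(np.abs(detail), 2 ** level)
    return scores


def filter_kv_cache(keys, values, signal, keep, levels=2):
    """Keep only the `keep` highest-scoring tokens of a (tokens, dim) KV cache."""
    scores = wavelet_token_scores(signal, levels)
    if not 1 <= keep <= scores.shape[0]:
        raise ValueError("keep must be between 1 and the sequence length")
    top = np.argsort(scores)[-keep:]
    kept = np.sort(top)  # restore original token order
    return keys[kept], values[kept], kept


if __name__ == "__main__":
    # Toy cache: 8 tokens, 4-dimensional keys and values.
    keys = np.arange(32, dtype=np.float64).reshape(8, 4)
    values = keys * 10.0
    # Hypothetical importance signal with a single spike at token 5.
    signal = np.zeros(8)
    signal[5] = 1.0
    sparse_k, sparse_v, kept = filter_kv_cache(keys, values, signal, keep=2)
    print(kept, sparse_k.shape)
```

Because Haar coefficients describe pairs and larger blocks of tokens, this scoring rule ranks neighbourhoods rather than isolated positions, so a spike also raises the score of its adjacent token. A real system would need a finer selection rule and an importance signal taken from the model itself.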