How to make language models remember long documents faster?
Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu, Xiaojie Li, Jungang Lou, Zechao Li, Jing Liu
May 30, 2026
Diffusion language models are slow at processing long documents because they cache all tokens equally, which wastes compute on irrelevant context. WaveFilter borrows from human reading: it decomposes long sequences with wavelet transforms to identify which tokens actually matter, then builds a sparse cache from only those tokens. Evaluated on complex long-context tasks, it works as a plug-and-play layer that speeds up existing caching methods.
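To make the idea concrete, here is a minimal sketch of wavelet-based token selection. It is not the paper's algorithm: the summary does not say which wavelet, which per-token signal, or which selection rule WaveFilter uses. The sketch assumes a Haar wavelet, a hypothetical scalar salience value per token (for example, attention mass), and a simple top-k rule. The function names `haar_step`, `wavelet_token_scores`, and `select_sparse_cache` are illustrative only.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_token_scores(signal, levels=2):
    """Score each token by the summed |detail| of the Haar blocks covering it.

    `signal` is an assumed per-token salience value, not something the
    summary specifies.
    """
    n = len(signal)
    if n % (1 << levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    scores = [0.0] * n
    approx = list(signal)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        width = 1 << level  # tokens covered by one coefficient at this level
        for j, d in enumerate(detail):
            for t in range(j * width, (j + 1) * width):
                scores[t] += abs(d)
    return scores


def select_sparse_cache(signal, keep, levels=2):
    """Return the sorted indices of the `keep` highest-scoring tokens."""
    scores = wavelet_token_scores(signal, levels)
    # Break ties by position so the result is deterministic.
    ranked = sorted(range(len(signal)), key=lambda t: (-scores[t], t))
    return sorted(ranked[:keep])


if __name__ == "__main__":
    salience = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1]
    print(select_sparse_cache(salience, keep=2))
```

In this toy input the only sharp change sits at token 5, so the detail coefficients concentrate on the block containing tokens 4 and 5, and those are the entries a sparse cache would retain. Flat regions produce zero detail and are dropped first, which is the intuition behind caching only the tokens that matter.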