How can language models remember long documents faster?

Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu, Xiaojie Li, Jungang Lou, Zechao Li, Jing Liu

Diffusion language models are slow at processing long documents because they cache all tokens equally, which wastes compute on irrelevant context. WaveFilter borrows from human reading: it decomposes a long sequence with a wavelet transform to identify the tokens that actually matter, then builds a sparse cache from only those tokens. It works as a plug-and-play layer that speeds up existing caching methods, and it was tested on complex long-context tasks.
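
The summary above does not spell out WaveFilter's actual algorithm, so the following is only a rough sketch of the general idea: run a wavelet decomposition over a per-token signal, treat large detail coefficients as a sign that the tokens they cover matter, and keep only those positions in the cache. Everything here is an assumption made for illustration: the Haar wavelet, the per-token `scores` input (a hypothetical relevance signal such as attention mass), and the names `haar_step`, `token_saliency`, `select_sparse_cache`, `keep_ratio`, and `levels`.

```python
import math


def haar_step(signal):
    """One level of the Haar wavelet transform: pairwise sums and differences."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last value (an assumption).
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def token_saliency(scores, levels=2):
    """Add each level's |detail| coefficient to every token that coefficient covers."""
    n = len(scores)
    saliency = [0.0] * n
    current = list(scores)
    span = 1  # number of original tokens covered by one coefficient
    for _ in range(levels):
        if len(current) < 2:
            break
        current, detail = haar_step(current)
        span *= 2
        for j, d in enumerate(detail):
            for t in range(j * span, min((j + 1) * span, n)):
                saliency[t] += abs(d)
    return saliency


def select_sparse_cache(scores, keep_ratio=0.25, levels=2):
    """Return the sorted token positions that would be kept in the sparse cache."""
    saliency = token_saliency(scores, levels)
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda t: saliency[t], reverse=True)
    return sorted(ranked[:k])


# Hypothetical relevance signal: flat everywhere except one token that stands out.
scores = [0.1] * 16
scores[5] = 5.0
kept = select_sparse_cache(scores, keep_ratio=0.25, levels=2)
print(kept)
```

In this toy input the kept positions cluster around the one token that stands out, because the wavelet detail coefficients are nonzero only where the signal changes. A real system would presumably operate on model-internal quantities rather than a hand-written list, and would hand the kept positions to whatever caching method it is plugged into.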