How to stop video generation from drifting off-track?
Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen
June 1, 2026
Long video generation with autoregressive diffusion models suffers from error accumulation: mistakes in early frames degrade everything that follows. LongLive-RAG treats previously generated frames as a searchable memory bank. Instead of conditioning only on the recent sliding window, a lightweight retrieval step finds relevant historical frames to ground generation. A new training loss encourages the retrieval embeddings to capture meaningful temporal changes rather than local redundancy. Across multiple models, the method improves video quality and ranking on VBench-Long. The code has been released.
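To make the retrieval idea concrete, here is a minimal sketch of how a memory bank of frame embeddings might be queried alongside a sliding window. It is not the LongLive-RAG implementation: the cosine-similarity ranking, the function names `cosine` and `retrieve_context`, and the `window` and `top_k` parameters are all assumptions made for illustration, and the summary above does not specify how the paper scores or selects frames.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def retrieve_context(
    memory: List[Sequence[float]],  # one embedding per previously generated frame
    query: Sequence[float],         # embedding for the current generation step
    window: int = 4,                # hypothetical sliding-window size
    top_k: int = 2,                 # hypothetical number of retrieved frames
) -> List[int]:
    """Return frame indices to condition on: retrieved history, then the recent window."""
    n = len(memory)
    recent_start = max(0, n - window)
    recent = list(range(recent_start, n))

    # Only frames older than the sliding window are retrieval candidates,
    # since the window is already part of the conditioning.
    candidates = range(recent_start)
    ranked = sorted(candidates, key=lambda i: cosine(memory[i], query), reverse=True)

    # Put the retrieved frames back in temporal order before the recent window.
    retrieved = sorted(ranked[:top_k])
    return retrieved + recent


if __name__ == "__main__":
    # Toy 2-D embeddings, invented for this example only.
    bank = [[1, 0], [0, 1], [1, 0.1], [0, 1], [0.5, 0.5], [0, 1], [1, 1]]
    print(retrieve_context(bank, query=[1, 0], window=2, top_k=2))
```

In this toy setup the conditioning set combines the two most similar older frames with the last two frames. A real system would use learned embeddings, which is where the training loss described above would come in.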