cs.CV

How to stop video generation from drifting off-track?

Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen

June 1, 2026

Long video generation with autoregressive diffusion models suffers from error accumulation: mistakes in early frames degrade every frame that follows. LongLive-RAG treats previously generated frames as a searchable memory bank. Instead of conditioning only on the recent sliding window, a lightweight retrieval step finds relevant historical frames to ground generation. A new training loss encourages the retrieval embeddings to capture meaningful temporal changes rather than local redundancy. Experiments across multiple models show improved video quality and a higher ranking on VBench-Long. The code has been released.
Published as "LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation" (arXiv:2606.02553).
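
To make the retrieval idea concrete, here is a minimal sketch of a frame memory bank that combines a fixed sliding window with top-k retrieval over older frames. It is not the authors' implementation. The summary does not say how embeddings are computed, which similarity measure is used, how many frames are retrieved, or how they are fed to the generator, so this sketch assumes one embedding vector per frame, cosine similarity, and a plain top-k search. `FrameMemory`, `retrieve`, `context`, `window`, and `k` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


@dataclass
class FrameMemory:
    """Hypothetical memory bank of per-frame embeddings (assumption, not the paper's API)."""
    window: int = 4  # most recent frames, always used as context (sliding window)
    embeddings: list[list[float]] = field(default_factory=list)

    def add(self, emb: list[float]) -> None:
        # Store the embedding of a newly generated frame.
        self.embeddings.append(list(emb))

    def retrieve(self, query: list[float], k: int = 2) -> list[int]:
        # Search only frames older than the sliding window, since the window is already in context.
        history_end = max(0, len(self.embeddings) - self.window)
        scored = [(cosine(query, self.embeddings[i]), i) for i in range(history_end)]
        # Highest similarity first; ties broken by earlier frame index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        # Return the chosen frame indices in temporal order.
        return sorted(i for _, i in scored[:k])

    def context(self, query: list[float], k: int = 2) -> list[int]:
        # Conditioning set = retrieved historical frames + the recent sliding window.
        n = len(self.embeddings)
        recent = list(range(max(0, n - self.window), n))
        return self.retrieve(query, k) + recent


if __name__ == "__main__":
    mem = FrameMemory(window=2)
    for e in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.1], [0.0, 1.0], [0.5, 0.5]]:
        mem.add(e)
    print(mem.context([1.0, 0.0], k=2))  # [0, 3, 4, 5]
```

In the usage example, frames 4 and 5 form the sliding window, and frames 0 and 3 are retrieved from older history because their embeddings are closest to the query. The paper's loss, which shapes these embeddings to reflect temporal change, is not modeled here.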