Generating long videos without retraining the model
Jangho Park, Geon Yeong Park, Gihyun Kwon, Jong Chul Ye
May 20, 2026
Video diffusion models struggle to generate videos longer than the clips they were trained on. This work proposes FlowLong, an inference-time method that stitches together sliding windows of video by matching the predictions made in overlapping regions (Tweedie matching) and strategically re-injecting noise to stay on the learned manifold. The approach works with any existing model, requires no retraining, and produces temporally coherent videos several times longer than the model's native window length, outperforming autoregressive baselines, which accumulate drift errors.
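To make the sliding-window idea concrete, here is a minimal NumPy sketch of one generic way to combine overlapping windows at inference time. It is not FlowLong's Tweedie matching: as a stated simplification, it merely averages the per-window clean-frame estimates where windows overlap, then re-injects noise at the next noise level. Every name here (`window_starts`, `fuse_clean_estimates`, `renoise`, `toy_denoise`, `sample_long`) is hypothetical, the noise model `x_t = x_0 + sigma * eps` is an assumption, and `toy_denoise` is a stand-in for a pretrained video model.

```python
import numpy as np

def window_starts(num_frames, window, stride):
    """Start indices of sliding windows that together cover every frame."""
    if num_frames <= window:
        return [0]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] != num_frames - window:
        starts.append(num_frames - window)  # last window sits flush with the end
    return starts

def fuse_clean_estimates(x_t, sigma, denoise, window, stride):
    """Average per-window clean-frame estimates wherever windows overlap.

    Assumption: plain averaging replaces the paper's matching step.
    """
    total = np.zeros_like(x_t)
    # Per-frame window count, shaped to broadcast over the frame dimensions.
    count = np.zeros((x_t.shape[0],) + (1,) * (x_t.ndim - 1))
    for s in window_starts(x_t.shape[0], window, stride):
        total[s:s + window] += denoise(x_t[s:s + window], sigma)
        count[s:s + window] += 1
    return total / count

def renoise(x0_hat, sigma_next, rng):
    """Re-inject fresh Gaussian noise at the next (lower) noise level."""
    return x0_hat + sigma_next * rng.standard_normal(x0_hat.shape)

def toy_denoise(x, sigma):
    """Hypothetical stand-in for a pretrained model.

    This is the posterior mean E[x_0 | x_t] when x_0 ~ N(0, I)
    and x_t = x_0 + sigma * eps.
    """
    return x / (1.0 + sigma ** 2)

def sample_long(num_frames, frame_shape, sigmas, window, stride, seed=0):
    """Run the fuse-then-renoise loop over a decreasing list of noise levels."""
    rng = np.random.default_rng(seed)
    x = sigmas[0] * rng.standard_normal((num_frames,) + tuple(frame_shape))
    next_sigmas = list(sigmas[1:]) + [0.0]  # final step adds no noise
    for sigma, sigma_next in zip(sigmas, next_sigmas):
        x0_hat = fuse_clean_estimates(x, sigma, toy_denoise, window, stride)
        x = renoise(x0_hat, sigma_next, rng)
    return x

# Usage: 12 frames from a model whose window is only 4 frames long.
video = sample_long(12, (2, 2), [1.0, 0.5], window=4, stride=2)
```

Because every window is denoised from the same shared noisy sequence and the overlaps are reconciled at each step, no window is conditioned on a previously finished one. This is the structural difference from autoregressive extension, where each new chunk inherits the errors of the chunk before it.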