cs.CV

Generating long videos without retraining the model

Jangho Park, Geon Yeong Park, Gihyun Kwon, Jong Chul Ye

May 20, 2026

Video diffusion models struggle to generate videos longer than the clips they were trained on. This work proposes FlowLong, an inference-time method that stitches together sliding windows of video by matching the predictions made in overlapping regions (Tweedie matching) and by re-injecting noise to keep samples on the learned manifold. The approach works with any existing model, requires no retraining, and produces temporally coherent videos several times longer than the model's native window length, while outperforming autoregressive baselines, which accumulate drift errors.
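
To make the sliding-window idea concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it only shows how overlapping windows can be laid over a long frame sequence and how per-window clean-frame estimates can be reconciled where windows overlap. Plain averaging stands in for Tweedie matching, whose exact form is not given in this summary, and noise re-injection is not shown. The names window_starts, fuse_overlaps, and predict_clean are hypothetical, and predict_clean is a placeholder for whatever denoiser a real model would provide.

```python
import numpy as np

def window_starts(total_frames, window, stride):
    """Start indices of sliding windows that together cover every frame."""
    if window > total_frames:
        raise ValueError("window is longer than the video")
    starts = list(range(0, total_frames - window + 1, stride))
    if starts[-1] + window < total_frames:
        # Add a final window flush with the end so no frame is left out.
        starts.append(total_frames - window)
    return starts

def fuse_overlaps(predict_clean, noisy, window, stride):
    """Average per-window clean estimates wherever windows overlap.

    Assumption: simple averaging is an illustrative stand-in, not the
    matching rule used by FlowLong.
    """
    total = noisy.shape[0]
    acc = np.zeros_like(noisy, dtype=np.float64)
    # One counter per frame, shaped to broadcast over the remaining axes.
    count = np.zeros((total,) + (1,) * (noisy.ndim - 1))
    for s in window_starts(total, window, stride):
        acc[s:s + window] += predict_clean(noisy[s:s + window])
        count[s:s + window] += 1
    return acc / count
```

With a window of 4 frames and a stride of 3, an 11-frame sequence is covered by windows starting at frames 0, 3, 6, and 7; frames that fall in more than one window receive the mean of the estimates from each.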
Published as "FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching", arXiv:2605.20910.