Editing videos in seconds without retraining the model

Guanlong Jiao, Chenyangguang Zhang, Jia Jun Cheng Xian, Zewei Zhang, Renjie Liao

Video editing typically requires many costly iterations to produce good results. StreamGVE flips the approach: instead of iteratively refining the original video, it generates the edited video from noise while anchoring it to the source footage, the same from-noise generation that modern image generators use. It relies on dual-branch sampling and attention mechanisms to blend source conditions into the generation process, and delivers high-quality edits in only a few sampling steps without retraining. The method works across different pre-trained models and handles diverse editing tasks.
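The summary does not spell out how dual-branch sampling and attention blending work, so the following is only a toy sketch of what the general idea could look like, not the authors' implementation. It assumes a source branch that re-noises the source tokens to the current noise level, and an edit branch that starts from pure noise and attends over both its own and the source branch's keys and values. Key/value concatenation is a common training-free technique swapped in here for illustration. The names `source_anchored_attention`, `dual_branch_sample`, and `toy_denoiser` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def source_anchored_attention(q_edit, k_edit, v_edit, k_src, v_src):
    """Edit-branch queries attend over edit AND source keys/values.

    Assumption: blending is done by concatenating keys/values along the
    token axis; the actual StreamGVE mechanism may differ.
    """
    k = np.concatenate([k_edit, k_src], axis=0)
    v = np.concatenate([v_edit, v_src], axis=0)
    weights = softmax(q_edit @ k.T / np.sqrt(q_edit.shape[-1]))
    return weights @ v, weights

def toy_denoiser(x_edit, x_src, t):
    # Hypothetical stand-in for a pre-trained video model: a single
    # attention layer with identity projections. `t` is unused here.
    out, _ = source_anchored_attention(x_edit, x_edit, x_edit, x_src, x_src)
    return out

def dual_branch_sample(source_tokens, denoiser, num_steps=4, seed=0):
    rng = np.random.default_rng(seed)
    # Edit branch starts from pure noise rather than from the source video.
    x_edit = rng.standard_normal(source_tokens.shape)
    for i in range(num_steps):
        t = 1.0 - i / num_steps  # noise level, from 1 down to 1/num_steps
        # Source branch: source tokens re-noised to the current level.
        noise = rng.standard_normal(source_tokens.shape)
        x_src = (1.0 - t) * source_tokens + t * noise
        x_edit = denoiser(x_edit, x_src, t)
    return x_edit

if __name__ == "__main__":
    source = np.ones((3, 4))  # 3 tokens, 4 channels (toy sizes)
    edited = dual_branch_sample(source, toy_denoiser, num_steps=2)
    print(edited.shape)
```

The point of the sketch is the data flow: no weights are updated, the edit branch never starts from the source video, and the source enters only through the extra keys and values at each of the few sampling steps.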