Faster video generation with frame-by-frame diffusion distillation
Min Zhao, Hongzhou Zhu, Kaiwen Zheng, Zihan Zhou, Bokai Yan, Xinyuan Li, Xiao Yang, Chongxuan Li, Jun Zhu
May 14, 2026
Real-time interactive video generation demands low latency and streaming capabilities. This work tackles frame-wise autoregressive generation with minimal sampling steps (1–2), identifying student model initialization as the critical bottleneck. Causal Forcing++ uses causal consistency distillation to learn from single online teacher steps rather than precomputed trajectories, reducing initialization cost and training time by ~4×. On VBench benchmarks, the 2-step method surpasses prior 4-step approaches (0.3 higher quality, 0.335 higher reward) while halving first-frame latency. Code and project materials are available.