Why AI video gets stuck at the first frame—and how to fix it
Yusuf Dalva, Pinar Yanardag
May 28, 2026
Video diffusion models generate frame by frame while anchoring to the first frame's representation, which dominates attention and locks the scene in place. That anchoring dampens motion, camera movement, and scene evolution. The researchers replace the static anchor with an adaptive latent state that evolves at each generation step, treating time as relative rather than absolute. The model attends to both the previous state and the current content to build its own reference dynamically, enabling substantially more natural video progression without external modules.
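To make the idea of an evolving anchor concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the names `attend`, `update_state`, and `rollout` are hypothetical, the vectors are tiny hand-written lists rather than real video latents, and the update uses plain scaled dot-product attention with no learned projections. It only illustrates the general pattern of a reference state that is rebuilt at each step from the previous state plus the current content, instead of staying fixed at the first frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) * scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


def update_state(prev_state, content_tokens):
    """Hypothetical anchor update (an assumption, not the paper's rule).

    The previous state acts as the query and attends over itself plus the
    current step's content tokens, so the new reference mixes both.
    """
    memory = [prev_state] + content_tokens
    return attend(prev_state, memory, memory)


def rollout(initial_state, steps):
    """Return the anchor state before the first step and after each step."""
    states = [initial_state]
    for content_tokens in steps:
        new_state, _ = update_state(states[-1], content_tokens)
        states.append(new_state)
    return states


if __name__ == "__main__":
    first_frame_state = [1.0, 0.0]           # toy stand-in for a first-frame anchor
    steps = [
        [[0.0, 1.0], [0.5, 0.5]],            # toy content tokens at step 1
        [[0.0, 2.0]],                        # toy content tokens at step 2
    ]
    for index, state in enumerate(rollout(first_frame_state, steps)):
        print(index, [round(x, 3) for x in state])
```

A static anchor would correspond to reusing `first_frame_state` at every step. In this sketch the state drifts toward whatever the recent content contains, which is the qualitative behavior the summary describes; how the real model parameterizes and trains that update is not specified here.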