Why do video generators forget physics after two steps?

Woojung Han, Seil Kang, Youngjun Jun, Min-Hung Chen, Fu-En Yang, Seong Jae Hwang

Image-to-video generation models often violate physics as denoising progresses, even though the early denoising steps produce more physically plausible motion. The culprit is that the model's internal phase information degrades by about 18% over the 50-step denoising process, even as visual details sharpen. PhaseLock locks in the motion established in just the first 2 steps and enforces it throughout generation with a training-free guidance method, improving physical consistency by 6.2 points across models with minimal overhead and without expensive external constraints.
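
To make the idea of "phase information" concrete, here is a minimal sketch. It is not the PhaseLock implementation. The summary above does not say how phase is defined or how the guidance is applied, so this sketch assumes that phase means the Fourier phase of a video-shaped array, and it swaps in a simple phase-replacement step in place of the actual guidance. The names `phase_agreement` and `lock_phase` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero in empty frequency bins


def phase_agreement(ref, cur):
    """Magnitude-weighted mean cosine of the Fourier phase difference.

    Returns a value in [-1, 1]; 1.0 means the two arrays have identical phase.
    Assumption: "phase" here is plain FFT phase, which the source does not confirm.
    """
    f_ref = np.fft.fftn(ref)
    f_cur = np.fft.fftn(cur)
    cross = f_ref * np.conj(f_cur)  # |f_ref||f_cur| * exp(i * phase difference)
    return float(np.sum(cross.real) / (np.sum(np.abs(cross)) + EPS))


def lock_phase(cur, ref, strength=1.0):
    """Keep the magnitude spectrum of `cur` and pull its phase toward `ref`.

    strength=0 leaves the phase untouched; strength=1 replaces it with ref's phase.
    This is an illustrative stand-in, not the paper's guidance rule.
    """
    f_ref = np.fft.fftn(ref)
    f_cur = np.fft.fftn(cur)
    # Unit phasors carry only the phase of each frequency bin.
    u_ref = f_ref / (np.abs(f_ref) + EPS)
    u_cur = f_cur / (np.abs(f_cur) + EPS)
    # Blend the phasors, then renormalise back onto the unit circle.
    u = (1.0 - strength) * u_cur + strength * u_ref
    u = u / (np.abs(u) + EPS)
    # Real inputs give Hermitian-symmetric spectra, so the inverse is real
    # up to floating-point error; .real drops that residue.
    return np.fft.ifftn(np.abs(f_cur) * u).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    early = rng.standard_normal((8, 16, 16))  # (frames, height, width), synthetic
    # Simulate drift: shift the content and add noise.
    late = np.roll(early, shift=3, axis=2) + 0.5 * rng.standard_normal(early.shape)
    print("agreement before:", phase_agreement(early, late))
    print("agreement after: ", phase_agreement(early, lock_phase(late, early)))
```

In this toy setting the reference phase comes from a synthetic "early" array. In an actual sampler it would presumably come from the model's own early-step prediction, but that wiring is an assumption and is not shown here.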