Can video generators learn physics from their own training data?
Bo Jiang, Depu Meng, Yihan Hu, Yichen Xie, Tianshuo Xu, Wei Zhan
May 22, 2026
Video diffusion models generate visually smooth clips but often fail at physics: objects move implausibly, violate momentum conservation, or collide incorrectly. LaMo learns latent motion patterns from the same unlabeled videos used to train the generator and uses them as a self-supervised signal. It adds two plug-and-play components: a motion drift loss applied during training and motion guidance applied during sampling. On physics benchmarks (VideoPhy, VideoPhy2), it outperforms physics-aware baselines that need external supervision or teacher models, while maintaining visual quality on general benchmarks.
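To make the idea of guidance during sampling concrete, here is a minimal toy sketch of gradient-based guidance in general. It is not LaMo's actual formulation, which this summary does not specify. It assumes a hypothetical quadratic "motion energy" that penalizes the gap between frame-to-frame latent differences and a target motion pattern, and it nudges the denoised latents down the gradient of that energy. The names `motion_energy`, `motion_energy_grad`, `guided_step`, `target_motion`, and `scale` are all illustrative assumptions.

```python
import numpy as np


def motion_energy(latents, target_motion):
    """Hypothetical energy: squared gap between frame differences and a target motion."""
    residual = np.diff(latents, axis=0) - target_motion
    return 0.5 * float(np.sum(residual ** 2))


def motion_energy_grad(latents, target_motion):
    """Analytic gradient of motion_energy with respect to the latents."""
    residual = np.diff(latents, axis=0) - target_motion
    grad = np.zeros_like(latents)
    grad[1:] += residual   # each residual depends positively on the later frame
    grad[:-1] -= residual  # and negatively on the earlier frame
    return grad


def guided_step(denoised, target_motion, scale=0.1):
    """Nudge a denoised latent sequence toward the target motion (assumed guidance rule)."""
    return denoised - scale * motion_energy_grad(denoised, target_motion)


# Toy data: 8 frames of 4-dimensional latents and a constant target motion.
rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 4))
target_motion = np.full((7, 4), 0.5)

guided = guided_step(latents, target_motion)
```

In a real sampler a step like this would be interleaved with the model's own denoising updates, and the motion signal would come from a learned latent motion model rather than a fixed array. The guidance scale trades off adherence to the motion prior against fidelity to the base generator.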