cs.CV

Can video generators learn physics from their own training data?

Bo Jiang, Depu Meng, Yihan Hu, Yichen Xie, Tianshuo Xu, Wei Zhan

May 22, 2026

Video diffusion models generate visually smooth clips but fail at physics: objects move implausibly, violate momentum conservation, or collide incorrectly. LaMo learns latent motion patterns from the same unlabeled videos used to train the generator and uses them as a self-supervised signal. It adds two plug-and-play components: a motion drift loss applied during training and motion guidance applied during sampling. On the physics benchmarks VideoPhy and VideoPhy2, it outperforms physics-aware baselines that rely on external supervision or teacher models, while maintaining visual quality on general benchmarks.
Published as "LaMo: Self-Supervised Latent Motion Priors for Physical Realism in Video Generation", arXiv:2605.23878