cs.CV

Do video models understand physics or just memorize patterns?

León Begiristain, Olaf Dünkel, Adam Kortylewski

May 22, 2026

Video models are often pitched as a path to general world understanding, but CRONOS, a new intervention-based benchmark, shows that they do not reliably grasp physics. Built in Unreal Engine with photorealistic rendering, it tests whether a model predicts the same physical event (a collision, an occlusion, or a fall) correctly when the viewpoint, scene, object appearance, or object category changes. Recent open-source video generators consistently fail this test: prediction quality drops when the viewpoint shifts or the objects look different, even though the underlying physics is identical. The dataset and code have been released.
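To make the idea of an intervention-based comparison concrete, here is a minimal Python sketch. It is not the CRONOS metric or code: the function name `intervention_gaps`, the condition labels, the score scale, and the numbers are all hypothetical. It assumes each physical event has some prediction-quality score for a base clip and for clips where one factor was changed, and it reports the mean drop per intervention.

```python
from statistics import mean

# Hypothetical condition labels, named after the factors the summary lists.
INTERVENTIONS = ("viewpoint", "scene", "appearance", "category")


def intervention_gaps(scores, baseline="base"):
    """Return the mean drop in score from the baseline to each intervention.

    `scores` maps an event id to a dict of {condition: quality score}.
    Events missing the baseline or a given condition are skipped for
    that condition. A positive gap means quality fell after the change.
    """
    gaps = {}
    for name in INTERVENTIONS:
        drops = [
            per_event[baseline] - per_event[name]
            for per_event in scores.values()
            if baseline in per_event and name in per_event
        ]
        if drops:
            gaps[name] = mean(drops)
    return gaps


# Made-up scores for illustration only; these are not results from the paper.
toy_scores = {
    "collision_01": {"base": 0.80, "viewpoint": 0.60, "appearance": 0.70},
    "occlusion_01": {"base": 0.90, "viewpoint": 0.50, "appearance": 0.70},
}

gaps = intervention_gaps(toy_scores)
for name, gap in gaps.items():
    print(f"{name}: mean drop {gap:.2f}")
```

A model that captured the physics rather than surface patterns would be expected to show gaps near zero under this kind of comparison, since the underlying event is unchanged.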
Published as "CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models" (arXiv:2605.23699).