Do video models understand physics or just memorize patterns?
León Begiristain, Olaf Dünkel, Adam Kortylewski
May 22, 2026
Video models are pitched as a path to general world understanding, but CRONOS, a new intervention-based benchmark, indicates that recent models do not robustly grasp physics. Built in photorealistic Unreal Engine, it tests whether a model predicts the same physical event (collision, occlusion, fall) correctly when the viewpoint, scene, object appearance, or object category changes. Recent open-source generators consistently fail: prediction quality drops when the viewpoint shifts or objects look different, even though the underlying physics is identical. The dataset and code have been released.
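To make the idea of an intervention-based test concrete, here is a minimal sketch of how a drop in prediction quality under each intervention could be tabulated. It is not the paper's metric or code: the function `invariance_gaps`, the `"base"` condition label, and the toy scores are all hypothetical, and the sketch assumes each video already has a scalar quality score where higher is better.

```python
from collections import defaultdict
from statistics import mean

BASE = "base"  # hypothetical label for the unmodified rendering of an event


def invariance_gaps(records):
    """Average quality drop per intervention, relative to the base condition.

    records: iterable of (event, condition, score) tuples, higher score = better.
    Returns {condition: mean over events of (base score - intervened score)}.
    A model that is invariant to an intervention would have a gap near 0.
    """
    # Group scores by event, then by condition.
    by_event = defaultdict(dict)
    for event, condition, score in records:
        by_event[event].setdefault(condition, []).append(score)

    gaps = defaultdict(list)
    for conds in by_event.values():
        if BASE not in conds:
            continue  # no reference rendering for this event, so skip it
        base_score = mean(conds[BASE])
        for condition, scores in conds.items():
            if condition != BASE:
                gaps[condition].append(base_score - mean(scores))

    return {condition: mean(values) for condition, values in gaps.items()}


# Toy scores for illustration only; these are not results from the benchmark.
records = [
    ("collision", "base", 0.8),
    ("collision", "viewpoint", 0.6),
    ("collision", "appearance", 0.8),
    ("fall", "base", 0.9),
    ("fall", "viewpoint", 0.5),
]
gaps = invariance_gaps(records)
```

In this toy input the `viewpoint` gap is positive (quality falls when the camera moves) while the `appearance` gap is zero. The gap is computed per event before averaging so that each intervened clip is compared only against the base rendering of the same underlying physics.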