Do video models understand physics or just memorize patterns?

Video models are often pitched as a path to general world understanding, but CRONOS, a new intervention-based benchmark, suggests that current models do not reliably grasp physics. Built in Unreal Engine with photorealistic rendering, it tests whether a model predicts the same physical event (a collision, occlusion, or fall) correctly when the viewpoint, scene, object appearance, or category changes. Recent open-source generators consistently fail this test: prediction quality drops when the viewpoint shifts or objects look different, even though the underlying physics is identical. The dataset and code have been released.
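
To make the intervention idea concrete, here is a minimal sketch of how one might summarize robustness across interventions. It is not the CRONOS metric or code: the function `robustness_gaps`, the score layout, and the numbers in `example_scores` are all hypothetical, made up purely for illustration. The only assumption is that each physical event gets a quality score (higher is better) in a baseline condition and again after each intervention.

```python
from statistics import mean

# Intervention axes named in the summary above; the key names are hypothetical.
INTERVENTIONS = ("viewpoint", "scene", "appearance", "category")


def robustness_gaps(scores):
    """Return the mean quality drop per intervention, relative to baseline.

    `scores` maps an event id to a dict holding a "baseline" score and one
    score per intervention. A gap near zero means predictions for the same
    physical event are stable under that intervention.
    """
    gaps = {}
    for name in INTERVENTIONS:
        # Only compare events that were actually scored under this intervention.
        drops = [s["baseline"] - s[name] for s in scores.values() if name in s]
        if drops:
            gaps[name] = mean(drops)
    return gaps


# Toy, made-up scores (NOT results from the benchmark).
example_scores = {
    "collision_01": {"baseline": 0.75, "viewpoint": 0.5, "scene": 0.75,
                     "appearance": 0.5, "category": 0.25},
    "fall_01": {"baseline": 0.5, "viewpoint": 0.25, "scene": 0.5,
                "appearance": 0.5, "category": 0.25},
}

if __name__ == "__main__":
    for name, gap in robustness_gaps(example_scores).items():
        print(f"{name}: mean drop {gap:.3f}")
```

A model that had learned the underlying physics would be expected to show gaps close to zero on every axis. Large gaps for viewpoint or appearance would instead point to reliance on surface patterns.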