Measuring whether AI videos obey physics

Evaluating whether generative video models produce physically coherent 3D worlds has relied on subjective human judgment or learned metrics. PDI-Bench introduces an objective framework that reconstructs 3D coordinates from generated videos using segmentation and point tracking, then measures geometric failures along three criteria: scale-depth alignment, motion consistency, and structural rigidity. Tests on state-of-the-art video generators reveal consistent, geometry-specific failure modes that standard perceptual metrics such as LPIPS or FVD do not detect. The authors release both the code and PDI-Dataset, a diverse benchmark designed to stress physical constraints.
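To make two of the criteria named above concrete, here is a minimal sketch of what a rigidity check and a scale-depth check could look like once 3D point tracks are available. It is an illustration only, not PDI-Bench's actual metrics, which this summary does not specify. The function names (rigidity_error, scale_depth_error), the input layout, and the choice of error formulas are all assumptions. The rigidity check relies on the fact that a rigid body preserves distances between its points; the scale-depth check relies on the pinhole camera model, under which the apparent size of a fixed-size object is inversely proportional to its depth.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances between every pair of 3D points in one frame."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def rigidity_error(tracks):
    """Hypothetical rigidity score: mean relative change in pairwise
    distances compared with the first frame.

    tracks: list of frames; each frame lists (x, y, z) tuples for the same
    tracked points in the same order. A rigid motion gives 0.0.
    """
    reference = pairwise_distances(tracks[0])
    errors = []
    for frame in tracks[1:]:
        current = pairwise_distances(frame)
        errors.extend(abs(c - r) / r for c, r in zip(current, reference))
    return sum(errors) / len(errors)


def scale_depth_error(apparent_sizes, depths):
    """Hypothetical scale-depth score: coefficient of variation of
    size * depth across frames.

    Under a pinhole camera the product is constant for a fixed-size object,
    so any spread indicates that size and depth disagree.
    """
    products = [s * z for s, z in zip(apparent_sizes, depths)]
    mean = sum(products) / len(products)
    variance = sum((p - mean) ** 2 for p in products) / len(products)
    return math.sqrt(variance) / mean


# Toy data (made up for illustration): a unit square tracked over two frames.
square = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (1.0, 1.0, 2.0), (0.0, 1.0, 2.0)]
translated = [(x + 0.5, y, z + 1.0) for x, y, z in square]  # rigid motion
stretched = [(2.0 * x, y, z) for x, y, z in square]         # non-rigid

print(rigidity_error([square, translated]))   # distances preserved
print(rigidity_error([square, stretched]))    # distances changed
print(scale_depth_error([100.0, 50.0, 25.0], [1.0, 2.0, 4.0]))    # size shrinks with depth
print(scale_depth_error([100.0, 100.0, 100.0], [1.0, 2.0, 4.0]))  # size ignores depth
```

Both scores are relative and unitless, so they do not depend on the absolute scale of the reconstruction, which is typically unknown when 3D structure is recovered from a single video. A real pipeline would also need to handle occluded or lost tracks, which this sketch ignores.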