Do video generators actually understand cause and effect?

You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang

Video diffusion models are improving rapidly, but it remains unclear whether they grasp causality or merely memorize temporal patterns. YoCausal is a benchmark that uses reversed real-world videos as zero-cost counterfactuals to measure two things: whether a model perceives the arrow of time (Reverse Surprise Index) and whether it genuinely reasons about cause and effect (Causality Cognition Index). Tests on 13 leading models reveal a crucial disconnect: perceiving the direction of time does not imply understanding causality, and every model lags far behind human reasoning.
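
To make the "reversed video as counterfactual" idea concrete, here is a minimal sketch of how a forward clip and its time-reversed twin might be paired. It is an illustration only, not the benchmark's actual pipeline: the function name `reverse_clip` and the toy frame labels are hypothetical, and the sketch does not compute the Reverse Surprise Index or the Causality Cognition Index, whose definitions are not given in this summary.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def reverse_clip(frames: Sequence[Frame]) -> List[Frame]:
    """Return a new list holding the same frames in reversed temporal order.

    Hypothetical helper: a frame can be any object (an image array, a file
    path, or, as in the toy example below, a text label).
    """
    return list(reversed(frames))


# Toy clip: text labels stand in for real video frames (assumed data).
clip = ["glass on table", "glass tipping", "glass falling", "glass shattered"]

# The reversed clip shows an implausible event (shards reassembling into a
# glass) and costs nothing extra to collect, since it reuses the same frames.
counterfactual = reverse_clip(clip)

for forward_frame, reversed_frame in zip(clip, counterfactual):
    print(f"{forward_frame:>16} | {reversed_frame}")
```

Because reversal only reorders existing frames, each real-world video yields a matched plausible/implausible pair without new recording or annotation, which is what makes the counterfactual "zero-cost".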