cs.CV

Do video generators actually understand cause and effect?

You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang, Yu-Lun Liu, Zhixiang Wang

May 28, 2026

Video diffusion models are improving rapidly, but it remains unclear whether they grasp causality or merely memorize temporal patterns. YoCausal is a benchmark that uses reversed real-world videos as zero-cost counterfactuals to measure two things: whether a model perceives the arrow of time (Reverse Surprise Index) and whether it genuinely reasons about cause and effect (Causality Cognition Index). Tests on 13 leading models reveal a crucial disconnect: perceiving the direction of time does not imply understanding causality, and every model lags far behind human reasoning.
Published as "YoCausal: How Far is Video Generation from World Model? A Causality Perspective" (arXiv:2605.30346)