cs.CV

Can AI understand complex movie plots from video?

Zhengqian Wu, Zhixian Liu, Aodong Chen, Jingyang Zhang, Ruizhe Li, Hanlin Ge, Zhongyuan Wang, Chunxia Xiao, Chao Liang

June 4, 2026

Existing video question-answering systems handle simple factual queries but stumble when asked to track characters, motivations, and plot arcs over hours of content. The researchers built StoryVideoQA, a dataset of 363K automatically generated QA pairs spanning TV shows and feature films up to 2 hours long, and benchmarked 20 leading models on it, showing that the models lose coherence when tracking storylines. They also propose PlotTree, which reorganizes a video into hierarchical plot summaries so that complex narratives can be reasoned about more reliably.
Published as "StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre and Auto-Generated Dataset" (arXiv:2606.06338).