Why video AI can't tell left from right—and how to fix it
Jongseo Lee, Hyuntak Lee, Sunghun Kim, Sooa Kim, Jihoon Chung, Jinwoo Choi
May 21, 2026
Video-LLMs struggle with directional motion despite their sophisticated architectures: on questions such as "did it move left or right?", most do no better than random guessing. The researchers traced where this breaks down and found that motion information is present in the model's hidden states but fails to bind to the correct answer. They introduce MoDirect, a diagnostic dataset, and DeltaDirect, a method that predicts 2D motion vectors from frame differences. The fix improves real-world motion direction accuracy by 22 points without requiring real-world training data, while maintaining performance on standard benchmarks.
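To make the idea of reading a 2D motion direction from frame differences concrete, here is a minimal NumPy sketch. It is not DeltaDirect itself, whose details this summary does not give; it is a hand-written centroid heuristic on synthetic frames, and the names `make_frame`, `motion_vector`, and `direction_label` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def make_frame(cx, cy, size=32, half=2):
    """Synthetic grayscale frame: a bright square centered at (cx, cy)."""
    frame = np.zeros((size, size), dtype=np.float32)
    frame[cy - half:cy + half + 1, cx - half:cx + half + 1] = 1.0
    return frame


def motion_vector(prev, curr):
    """Estimate a unit direction (dx, dy) from the frame difference.

    In curr - prev, positive pixels mark where the object arrived and
    negative pixels mark where it left. The offset between those two
    centroids points along the motion. Its length is not a reliable
    displacement when the regions overlap, so only the direction is kept.
    """
    diff = curr.astype(np.float64) - prev.astype(np.float64)
    ys, xs = np.mgrid[0:diff.shape[0], 0:diff.shape[1]]
    arrived = np.clip(diff, 0.0, None)
    vacated = np.clip(-diff, 0.0, None)
    if arrived.sum() == 0 or vacated.sum() == 0:
        return 0.0, 0.0  # no detectable change between frames
    dx = (xs * arrived).sum() / arrived.sum() - (xs * vacated).sum() / vacated.sum()
    dy = (ys * arrived).sum() / arrived.sum() - (ys * vacated).sum() / vacated.sum()
    norm = float(np.hypot(dx, dy))
    if norm == 0.0:
        return 0.0, 0.0
    return float(dx / norm), float(dy / norm)


def direction_label(dx, dy):
    """Map a vector to a coarse label (image coordinates: y grows downward)."""
    if dx == 0.0 and dy == 0.0:
        return "static"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


# Assumed toy scenario: a square moves 3 pixels to the right.
prev_frame = make_frame(cx=10, cy=10)
curr_frame = make_frame(cx=13, cy=10)
label = direction_label(*motion_vector(prev_frame, curr_frame))
```

The sketch assumes a single bright object on a static dark background. Real video has camera motion, multiple objects, and lighting changes, which is where a learned predictor would be needed instead of a fixed heuristic.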