cs.AI

A benchmark for keeping characters consistent across long AI-generated videos

Ruozhen He, Meng Wei, Ziyan Yang, Vicente Ordonez

May 14, 2026

Multi-shot video generation struggles to keep the same character, object, or location visually consistent as sequences grow longer. EntityBench provides 140 episodes (2,491 shots) drawn from real narrative media, with explicit per-shot entity tracking across easy, medium, and hard tiers; episodes reach up to 50 shots, with recurrence gaps of up to 48 shots. Evaluation separates intra-shot quality, prompt alignment, and cross-shot consistency, and applies a fidelity gate so that consistency scores count only entity appearances that are rendered accurately. The authors also release EntityMem, a memory-augmented generation system that pre-stores verified visual references per entity; code and data are publicly available on GitHub.
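To make the two evaluation ideas concrete, here is a minimal Python sketch of a recurrence gap and a fidelity-gated consistency score. It is an illustration under stated assumptions, not the EntityBench implementation: the function names, the gap definition (difference between the shot indices of consecutive appearances), the 0.5 threshold, and the use of mean pairwise cosine similarity over embeddings are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def recurrence_gaps(shots):
    """Return each entity's largest gap between consecutive appearances.

    `shots` is a list of sets of entity names, one set per shot.
    Assumption: the gap is the difference between shot indices.
    Entities that appear only once are absent from the result.
    """
    last_seen = {}
    max_gap = {}
    for idx, entities in enumerate(shots):
        for name in entities:
            if name in last_seen:
                gap = idx - last_seen[name]
                max_gap[name] = max(max_gap.get(name, 0), gap)
            last_seen[name] = idx
    return max_gap


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gated_consistency(appearances, fidelity_threshold=0.5):
    """Mean pairwise similarity over appearances that pass the fidelity gate.

    `appearances` is a list of (fidelity_score, embedding) pairs for one
    entity. Appearances below the (hypothetical) threshold are excluded,
    so an inaccurate rendering cannot raise or lower the score.
    Returns None when fewer than two appearances pass the gate.
    """
    kept = [emb for fid, emb in appearances if fid >= fidelity_threshold]
    if len(kept) < 2:
        return None
    sims = [cosine(a, b) for a, b in combinations(kept, 2)]
    return sum(sims) / len(sims)


# Toy data: "anna" appears in shots 0, 2, and 4; "bob" in shots 1 and 2.
shots = [{"anna"}, {"bob"}, {"anna", "bob"}, set(), {"anna"}]
gaps = recurrence_gaps(shots)

# Two faithful appearances that match, plus one low-fidelity outlier.
appearances = [(0.9, [1.0, 0.0]), (0.8, [1.0, 0.0]), (0.1, [0.0, 1.0])]
gated = gated_consistency(appearances)
ungated = gated_consistency(appearances, fidelity_threshold=0.0)
```

In this toy case the gate drops the low-fidelity appearance, so the gated score reflects only the two faithful renderings, while the ungated score is pulled down by the outlier.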
Published as "EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation", arXiv:2605.15199.