Keeping video characters consistent across long narratives
Jinzhuo Liu, Jiangning Zhang, Wencan Jiang, Yabiao Wang, Dingkang Liang, Zhucun Xue, Ran Yi, Yong Liu
May 18, 2026
Long autoregressive video generation struggles to keep character identities consistent when prompts change, which leads to identity drift and attribute loss. IAMFlow addresses this with an LLM that extracts entities and assigns them persistent global IDs, paired with a VLM that verifies character attributes directly from rendered frames instead of relying on implicit similarity matching. An inference acceleration pipeline with asynchronous verification and quantization keeps the added computation practical. The authors also introduce NarraStream-Bench, a benchmark of 324 multi-prompt scripts with multimodal evaluation metrics. They report that IAMFlow achieves the best overall performance while requiring no training.
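To make the idea of persistent global IDs concrete, here is a minimal sketch of the bookkeeping involved. It is not the authors' implementation: `EntityRegistry`, `resolve`, and `missing_attributes` are hypothetical names, the LLM entity extraction and VLM frame inspection are replaced by hand-written inputs, and matching entities by normalized name is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRegistry:
    """Hypothetical registry mapping character names to persistent global IDs."""
    ids: dict = field(default_factory=dict)         # normalized name -> global ID
    attributes: dict = field(default_factory=dict)  # global ID -> set of attributes

    def resolve(self, name, attrs=()):
        """Return the global ID for a name, creating one on first sight."""
        key = name.strip().lower()  # assumption: entities are matched by name
        if key not in self.ids:
            self.ids[key] = len(self.ids)
            self.attributes[self.ids[key]] = set()
        gid = self.ids[key]
        # Attributes accumulate across prompts instead of being overwritten.
        self.attributes[gid].update(attrs)
        return gid


def missing_attributes(registry, gid, observed):
    """Recorded attributes that were not observed in a rendered frame."""
    return registry.attributes[gid] - set(observed)


registry = EntityRegistry()
# Stand-ins for entities an LLM might extract from three successive prompts.
mia_first = registry.resolve("Mia", ["red scarf"])
leo = registry.resolve("Leo")
mia_again = registry.resolve("mia ", ["short hair"])
# Stand-in for attributes a VLM might report from a rendered frame.
lost = missing_attributes(registry, mia_first, ["red scarf"])
```

In this sketch the same character keeps one ID across prompts, and `lost` holds the attributes an explicit check would flag as dropped, which is the kind of signal an explicit verifier can act on where similarity matching cannot.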