cs.CL

Why AI misreads ancient artifacts with modern eyes

Mukul Ranjan, Prince Jha, Khushboo Kumari, Zhiqiang Shen

May 14, 2026

Vision-language models (VLMs) struggle with temporal reasoning when interpreting cultural heritage: they apply modern frameworks to historical artifacts in ways that distort meaning. The authors introduce TAB-VLM, a benchmark of 600 questions about Indian cultural artifacts spanning prehistoric to modern periods, and report that even GPT-4V and other leading models fail consistently. These failures persist regardless of model size or architecture, pointing to a fundamental blind spot in how VLMs are trained, likely due to underrepresentation of non-Western visual cultures in training data. The benchmark and code are released to help future work improve temporal cognition in multimodal systems.
Published as "On the Cultural Anachronism and Temporal Reasoning in Vision Language Models", arXiv:2605.15071