Testing whether AI agents actually remember visual details

Minghao Guo, Qingyue Jiao, Zeru Shi, Yihao Quan, Boxuan Zhang, Danrui Li, Liwei Che, Wujiang Xu, Shilong Liu, Zirui Liu, Mubbasir Kapadia, Vladimir Pavlovic, Jiang Liu, Mengdi Wang, Yiyu Shi, Dimitris N. Metaxas, Ruixiang Tang

MemEye is an evaluation framework that measures whether AI agents with memory actually retain and use visual information when reasoning, rather than relying on captions alone. The framework tests memory along two dimensions: the granularity of the visual evidence (from scene-level to pixel-level) and the complexity of the reasoning required (from a single piece of evidence to multi-step synthesis). The authors built a benchmark spanning 8 real-world scenario tasks, with validation gates that block shortcut answers and confirm that each question genuinely requires visual evidence. Testing 13 memory methods across 4 vision-language models shows that current approaches struggle to preserve fine-grained details and to reason about how states change over time.
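To make the two-axis design and the "visual necessity" idea concrete, here is a minimal sketch of how benchmark results could be tabulated. It is an illustration under stated assumptions, not MemEye's actual implementation. The `Item` fields, the label strings ("scene", "pixel", "single", "multi-step"), and the gate rule (keep a question only if it is answerable with the image but not from captions alone) are all hypothetical stand-ins for whatever the benchmark really uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    question_id: str
    granularity: str  # hypothetical label on the scene-level ... pixel-level axis
    reasoning: str    # hypothetical label on the single-evidence ... multi-step axis
    caption_only_correct: bool  # hypothetical: a reference model answered from captions alone
    with_image_correct: bool    # hypothetical: a reference model answered with the raw image


def passes_visual_necessity_gate(item: Item) -> bool:
    # Assumed gate: keep items that the image makes solvable but captions do not.
    return item.with_image_correct and not item.caption_only_correct


def accuracy_grid(items, predictions):
    """Accuracy per (granularity, reasoning) cell over gated items.

    predictions maps question_id -> whether the evaluated agent answered correctly;
    a missing prediction is counted as incorrect.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for item in items:
        if not passes_visual_necessity_gate(item):
            continue  # shortcut-answerable item: excluded from scoring
        cell = (item.granularity, item.reasoning)
        totals[cell] += 1
        hits[cell] += int(predictions.get(item.question_id, False))
    return {cell: hits[cell] / totals[cell] for cell in totals}


# Usage with toy data; q3 is dropped because captions alone suffice.
items = [
    Item("q1", "scene", "single", False, True),
    Item("q2", "pixel", "single", False, True),
    Item("q3", "pixel", "multi-step", True, True),
    Item("q4", "pixel", "multi-step", False, True),
]
preds = {"q1": True, "q2": False, "q4": False}
grid = accuracy_grid(items, preds)
print(grid)
```

Reporting one score per cell, rather than a single average, is what lets a table like this expose the pattern the summary describes: an agent can look fine on scene-level, single-evidence questions while failing the pixel-level or multi-step cells.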