Teaching AI to reason like a detective about photo locations
Yong Li, Furong Jia, Dacheng Yin, Kang Rong, Fengyun Rao, Jing Lyu, Fan Zhang
May 26, 2026
Image geo-localization, the task of determining where a photo was taken, is something humans approach iteratively: they inspect clues, form hypotheses, search for evidence, and revise. REVERSE trains agents to replicate this multi-turn reasoning by learning three decisions: where to look, what to query, and which evidence to trust. With process rewards and an offline search cache, a 4B-parameter model outperforms retrieval-augmented baselines and rivals much larger models on the Im2GPS3k and YFCC4k benchmarks. The code has been released.
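To make the idea of an offline search cache concrete, here is a minimal sketch, assuming the cache maps previously issued queries to stored results so that no live search is needed. This summary does not describe the paper's actual cache design, so the `OfflineSearchCache` class, the `normalize` helper, and the behavior on a cache miss are all hypothetical.

```python
from dataclasses import dataclass, field


def normalize(query: str) -> str:
    """Canonicalize a query so trivially different strings share one cache key."""
    return " ".join(query.lower().split())


@dataclass
class OfflineSearchCache:
    """Hypothetical stand-in for a search backend: answers come only from stored results."""

    store: dict[str, list[str]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def add(self, query: str, results: list[str]) -> None:
        """Record the results that were obtained for a query."""
        self.store[normalize(query)] = list(results)

    def search(self, query: str) -> list[str]:
        """Return cached results; an unseen query yields an empty list (an assumption)."""
        key = normalize(query)
        if key in self.store:
            self.hits += 1
            return list(self.store[key])
        self.misses += 1
        return []


cache = OfflineSearchCache()
cache.add("Eiffel  Tower", ["Paris, France"])
found = cache.search("eiffel tower")   # same key after normalization
missing = cache.search("unknown pier")  # never cached, so no results
```

A cache of this kind makes repeated queries deterministic and free of network calls, which is one plausible reason to use one when an agent issues many searches.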