cs.CV

Teaching AI to reason like a detective about photo locations

Yong Li, Furong Jia, Dacheng Yin, Kang Rong, Fengyun Rao, Jing Lyu, Fan Zhang

May 26, 2026

Image geo-localization, the task of figuring out where a photo was taken, is something humans solve by inspecting clues, forming hypotheses, searching for evidence, and revising their guesses. REVERSE trains agents to replicate this multi-turn reasoning by learning three decisions: where to look, what to query, and which evidence to trust. Using process rewards and an offline search cache, a 4B-parameter model outperforms retrieval-augmented baselines and rivals much larger models on the Im2GPS3k and YFCC4k benchmarks. The code has been released.
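To make the idea of an offline search cache concrete, here is a minimal Python sketch. It is not taken from the paper: the class name `OfflineSearchCache`, the `normalize` helper, and the example query and snippet are all hypothetical, and the actual REVERSE cache may be keyed and populated quite differently. The sketch only assumes that search results are fetched once ahead of time, so that an agent's queries during training are answered from local storage rather than by a live search engine.

```python
from dataclasses import dataclass, field


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries share a key."""
    return " ".join(query.lower().split())


@dataclass
class OfflineSearchCache:
    """Hypothetical cache mapping a normalized query to pre-fetched snippets."""
    store: dict[str, list[str]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def add(self, query: str, results: list[str]) -> None:
        # Populate the cache ahead of time (for example, from a one-off crawl).
        self.store[normalize(query)] = list(results)

    def search(self, query: str) -> list[str]:
        # Answer from local storage only; a miss returns no evidence
        # instead of triggering a live network call.
        key = normalize(query)
        if key in self.store:
            self.hits += 1
            return list(self.store[key])
        self.misses += 1
        return []


# Illustrative data only; not from the paper.
cache = OfflineSearchCache()
cache.add("Red double-decker bus  route 24",
          ["Route 24 runs through central London."])
```

Under these assumptions, repeated rollouts that issue the same query see the same evidence, which keeps training runs reproducible and avoids paying for a web search on every step.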
Published as "REVERSE: Reinforcing Evidence Verification and Search for Agentic Image geo-localization", arXiv:2605.26861