cs.RO

Finding hidden objects by reasoning about invisible spaces

Posheng Chen, Powen Cheng, Gueter Josmy Faure, Hung-Ting Su, Winston H. Hsu

May 14, 2026

SceneFunRI is a benchmark for inferring the locations of invisible objects in 3D scenes from task instructions and commonsense knowledge. Built on SceneFun3D and comprising 855 instances, it frames the problem as 2D spatial reasoning: a model must predict where an object should be even though it is out of view. Gemini 3 Flash, the strongest baseline tested, achieves only 15.20% coordinate accuracy. The authors analyze three prompting strategies: instruction-based, reasoning-based, and spatial elimination. The results show that invisible-region reasoning remains a weak point of current vision-language models and point to the need for better integration of task intent, spatial grounding, and uncertainty estimation.
Published as "SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization", arXiv:2605.14704