cs.RO

Teaching robots to find things they can't see yet

Pengteng Li, Weiyu Guo, He Zhang, Tiefu Cai, Xiao He, Yandong Guo, Hui Xiong

May 21, 2026

Most robotic vision-language models fail when task-relevant objects leave the camera's field of view, which limits them to reactive behavior. SOMA adds a persistent spatial memory, built from multi-view scans taken with a movable head camera, that lets a robot reason about objects it cannot currently see. On five real-world manipulation tasks, including dual-arm scenarios, SOMA achieves faster target localization and one-shot grasping under partial visibility.
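To make the idea of a persistent spatial memory concrete, here is a minimal Python sketch. It is not SOMA's implementation: the `SpatialMemory` class, the `in_view` and `scan` helpers, the 2D world frame, the 30-degree half field of view, and the toy scene are all assumptions made for illustration. It only shows the general pattern of sweeping a camera across several viewpoints, storing what was seen, and recalling an object that is no longer visible.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SpatialMemory:
    """Toy memory (hypothetical): object label -> last known (x, y) in a fixed world frame."""
    objects: dict = field(default_factory=dict)

    def update(self, label, xy):
        # Overwrite the stored position each time the object is observed.
        self.objects[label] = xy

    def recall(self, label):
        # Last known position, or None if the object was never observed.
        return self.objects.get(label)


def in_view(cam_xy, cam_yaw, target_xy, half_fov=math.radians(30)):
    """True if the target lies inside the camera's horizontal field of view."""
    bearing = math.atan2(target_xy[1] - cam_xy[1], target_xy[0] - cam_xy[0])
    # Wrap the angular difference into [-pi, pi] before comparing.
    diff = math.atan2(math.sin(bearing - cam_yaw), math.cos(bearing - cam_yaw))
    return abs(diff) <= half_fov


def scan(memory, cam_xy, yaws, scene):
    """Sweep the head camera through several yaw angles and record what is visible."""
    for yaw in yaws:
        for label, xy in scene.items():
            if in_view(cam_xy, yaw, xy):
                memory.update(label, xy)


# Assumed toy scene: three objects around a camera at the origin.
scene = {"mug": (1.0, 0.0), "drawer": (0.0, 1.0), "sponge": (-1.0, 0.0)}
memory = SpatialMemory()
scan(memory, (0.0, 0.0), [0.0, math.pi / 2, math.pi], scene)

# With the camera facing forward again, the sponge is out of view
# but its position can still be recalled from memory.
sponge_visible = in_view((0.0, 0.0), 0.0, scene["sponge"])
sponge_position = memory.recall("sponge")
```

In this sketch the sponge sits directly behind the camera, so a purely reactive policy looking forward would have no information about it, while the memory still returns its stored position. A real system would store richer 3D state and handle objects that move after the scan.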
Published as "Spatial Memory for Out-of-Vision Manipulation in Vision-Language-Action" (arXiv:2605.22283).