Teaching robots to find things they can't see yet
Pengteng Li, Weiyu Guo, He Zhang, Tiefu Cai, Xiao He, Yandong Guo, Hui Xiong
May 21, 2026
Most robotic vision-language models fail when task objects leave the camera's field of view, leaving the robot to behave reactively. SOMA adds a persistent spatial memory, built from multi-view scans taken with a movable head camera, that lets robots reason about objects they cannot currently see. On five real-world manipulation tasks, including dual-arm scenarios, SOMA achieves faster target localization and one-shot grasping under partial visibility.