cs.CV

Can multiple robots reason about shared space together?

Kunyu Peng, Zhikun Zhou, Kailun Yang, Di Wen, Ruiping Liu, Yufan Chen, Junwei Zheng, Hao Shi, Yi Zhou, M. Saquib Sarfraz, Danda Pani Paudel, Luc Van Gool

May 18, 2026

Multimodal large language models (MLLMs) excel at understanding a single video stream, but coordinating knowledge across multiple robots that observe the same scene simultaneously remains unexplored. The authors introduce CoopSR, a benchmark with 114k QA pairs covering simulated and real robot teams, and SP-CoR, a framework that fuses the robots' egocentric views using physics-guided reasoning. The model learns from robot position data during training but needs only camera feeds at test time, and it reaches 7% higher accuracy than baselines on real quadruped robots.
Published as "Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models", arXiv:2605.18431