Can multiple robots reason about shared space together?

Kunyu Peng, Zhikun Zhou, Kailun Yang, Di Wen, Ruiping Liu, Yufan Chen, Junwei Zheng, Hao Shi, Yi Zhou, M. Saquib Sarfraz, Danda Pani Paudel, Luc Van Gool

Multimodal large language models (MLLMs) excel at understanding single video streams, but how several robots observing the same scene at the same time can pool what they see remains largely unexplored. The authors introduce CoopSR, a benchmark of 114k QA pairs drawn from simulated and real robot teams, and SP-CoR, a framework that fuses the robots' egocentric views using physics-guided reasoning. The model learns from robot position data during training but needs only camera feeds at test time, and it outperforms baselines by 7% in accuracy on real quadruped robots.
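To see why position data is useful during training, consider what it makes computable. When each robot's pose in a shared world frame is known, an object seen in one robot's egocentric view can be placed in the same frame as an object seen by another robot. That yields geometric ground truth, for example whether two robots are looking at the same object. The sketch below is illustrative only and is not SP-CoR's method. It assumes planar poses `(x, y, yaw)`, and every name in it (`Pose2D`, `ego_to_world`, `same_object_label`, the `tol` threshold) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Hypothetical planar robot pose in a shared world frame.

    Assumed to be available only at training time, mirroring the
    train-time-only position data described above.
    """
    x: float
    y: float
    yaw: float  # heading in radians, counter-clockwise from the world +x axis


def ego_to_world(pose: Pose2D, px: float, py: float) -> tuple:
    """Map an egocentric point (px, py) into the world frame.

    Rotate by the robot's yaw, then translate by its position.
    """
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return (pose.x + c * px - s * py, pose.y + s * px + c * py)


def pairwise_distance(pose_a: Pose2D, obs_a: tuple,
                      pose_b: Pose2D, obs_b: tuple) -> float:
    """World-frame distance between a point seen by robot A and one seen by robot B."""
    ax, ay = ego_to_world(pose_a, *obs_a)
    bx, by = ego_to_world(pose_b, *obs_b)
    return math.hypot(ax - bx, ay - by)


def same_object_label(pose_a: Pose2D, obs_a: tuple,
                      pose_b: Pose2D, obs_b: tuple,
                      tol: float = 0.5) -> bool:
    """Hypothetical cross-view supervision target: do the two observations coincide?

    Computable only when poses are known, i.e. at training time.
    """
    return pairwise_distance(pose_a, obs_a, pose_b, obs_b) < tol


if __name__ == "__main__":
    # Robot A at the origin faces +x. Robot B at (4, 0) faces -x.
    a = Pose2D(0.0, 0.0, 0.0)
    b = Pose2D(4.0, 0.0, math.pi)
    # Each robot sees something 2 m straight ahead, which is the same world point.
    print(same_object_label(a, (2.0, 0.0), b, (2.0, 0.0)))
```

At test time the poses are unavailable, so labels like this can only shape the model during training. The model itself must then infer cross-view correspondences from the camera feeds alone.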