cs.CV

Do spatial foundation models actually work everywhere?

Haosong Peng, Hao Li, Jiaqi Chen, Yuhao Pan, Runmao Yao, Yalun Dai, Fushuo Huo, Fangzhou Hong, Zhaoxi Chen, Haozhao Wang, Dingwen Zhang, Ziwei Liu, Wenchao Xu

May 26, 2026

Most spatial foundation models shine on their own test sets but stumble when facing different viewpoints, scene types, or data densities. SpatialBench, a 19-dataset benchmark spanning 5 domains, evaluates 41 models across 546 scenes to measure how well they generalize. Key finding: models need full-context attention for accuracy and bounded memory for long sequences, but dataset quality and domain alignment matter far more than pure scale. The authors release the DA-Next-5M dataset and the DA-Next baseline model.
Published as "SpatialBench: Is Your Spatial Foundation Model an All-Round Player?" (arXiv:2605.27367)