Do spatial foundation models actually work everywhere?
Haosong Peng, Hao Li, Jiaqi Chen, Yuhao Pan, Runmao Yao, Yalun Dai, Fushuo Huo, Fangzhou Hong, Zhaoxi Chen, Haozhao Wang, Dingwen Zhang, Ziwei Liu, Wenchao Xu
May 26, 2026
Most spatial foundation models shine on their own test sets but stumble when faced with different viewpoints, scene types, or data densities. SpatialBench, a 19-dataset benchmark spanning 5 domains, evaluates 41 models across 546 scenes to measure how well they actually generalize. Key finding: models need full-context attention for accuracy and bounded memory for long sequences, but dataset quality and domain alignment matter far more than pure scale. The authors release the DA-Next-5M dataset and the DA-Next baseline model.