Does scaling work for self-driving perception?

Yingwei Li, Xin Huang, Yang Liu, Yang Fu, Alex Zihao Zhu, Chen Song, Junwen Yao, Anant Subramanian, Hao Xiang, Weijing Shi, Yuliang Zou, Tom Hoddes, Zhaoqi Leng, Govind Thattai, Dragomir Anguelov, Mingxing Tan

Autonomous driving perception systems face a distinctive challenge: they must fuse multiple sensor types (cameras, LiDAR, radar) while reasoning about 3D space. STELLAR tests whether the scaling approach that worked for large language models also applies here. The team trained a Sparse Window Transformer with up to 500M parameters on 50 million driving examples and found clear scaling laws: performance improves predictably as model size and training data grow. The result is a new state of the art on the Waymo Open Dataset, with substantial margins over prior systems.
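"Improves predictably" usually means that an error metric follows a power law in model size (or data size), which appears as a straight line on log-log axes. The sketch below illustrates that idea only. It assumes a pure power law, error ≈ a · N^(−alpha), with no irreducible-error offset. The functional form, the function names `fit_power_law` and `predict_error`, and the numbers in the usage example are all hypothetical and are not taken from the STELLAR work.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * size**(-alpha) by least squares on log-log axes.

    Assumption (illustrative only): a pure power law with no constant
    irreducible-error term. Returns the tuple (a, alpha).
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two matching (size, error) pairs")
    # Taking logs turns the power law into a line: log e = log a - alpha * log N
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the fitted line is -alpha; the intercept is log a
    return math.exp(intercept), -slope


def predict_error(a, alpha, size):
    """Extrapolate the fitted power law to a new model or data size."""
    return a * size ** (-alpha)


# Synthetic points generated from a known law (a=2.0, alpha=0.1), NOT paper data
sizes = [1e7, 3e7, 1e8, 3e8]
errors = [2.0 * s ** -0.1 for s in sizes]
a, alpha = fit_power_law(sizes, errors)
```

Because the synthetic points lie exactly on a power law, the fit recovers a ≈ 2.0 and alpha ≈ 0.1. On real measurements the points would scatter around the line, and the fitted exponent would summarize how quickly error falls as scale increases.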