Why autonomous cars don't need to think out loud
Weicheng Zheng, Yixin Huang, Qiao Sun, Derun Li, Hang Zhao
May 20, 2026
Driving vision-language-action models (VLAs) typically use natural-language reasoning as an intermediate step, but generating and parsing long chains of thought is slow, and training on them requires expensive annotations. DriveMA instead uses concise one-step meta-actions (like "accelerate" or "prepare_turn") derived automatically from expert driving data. Combined with reinforcement learning that jointly optimizes action correctness and trajectory quality, the approach reaches state-of-the-art results on Waymo End-to-End Driving with a 2B-parameter model. The payoff: simpler intermediate instructions that are faster to infer, easier for compact models to learn, and more reliable than reasoning chains, without sacrificing driving performance.
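To make the two ideas concrete, here is a minimal Python sketch of how a meta-action could be derived from a logged expert segment and how a joint reward could combine action correctness with trajectory quality. It is an illustration under stated assumptions, not the paper's method. The function names, the thresholds, the weights, the extra labels "decelerate" and "keep_speed", and the exponential displacement-error term are all hypothetical. Only "accelerate" and "prepare_turn" appear in the summary above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def derive_meta_action(
    speeds: List[float],      # expert speeds over a short segment, in m/s
    headings: List[float],    # expert headings over the same segment, in radians
    accel_thresh: float = 0.5,   # hypothetical threshold on speed change (m/s)
    turn_thresh: float = 0.15,   # hypothetical threshold on heading change (rad)
) -> str:
    """Label one expert segment with a single meta-action (rule-based sketch)."""
    dv = speeds[-1] - speeds[0]
    dh = headings[-1] - headings[0]
    # Wrap the heading change into [-pi, pi] so crossing +/-pi is not a "turn".
    dh = math.atan2(math.sin(dh), math.cos(dh))
    if abs(dh) > turn_thresh:
        return "prepare_turn"
    if dv > accel_thresh:
        return "accelerate"
    if dv < -accel_thresh:
        return "decelerate"   # hypothetical label, not named in the summary
    return "keep_speed"       # hypothetical label, not named in the summary


def joint_reward(
    pred_action: str,
    expert_action: str,
    pred_traj: List[Point],
    expert_traj: List[Point],
    w_action: float = 0.5,    # hypothetical weights
    w_traj: float = 0.5,
) -> float:
    """Weighted sum of an action-match term and a trajectory-quality term."""
    action_term = 1.0 if pred_action == expert_action else 0.0
    # Average displacement error between matched waypoints, in meters.
    ade = sum(math.dist(p, e) for p, e in zip(pred_traj, expert_traj)) / len(expert_traj)
    traj_term = math.exp(-ade)  # maps error in [0, inf) to a score in (0, 1]
    return w_action * action_term + w_traj * traj_term
```

In this sketch the labels come from thresholding logged speed and heading, so no human annotation is involved, and the reward penalizes a policy that picks the right meta-action but then drives a poor trajectory (or the reverse). The actual labeling rules and reward terms used by DriveMA may differ.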