Why autonomous cars don't need to think out loud

Driving vision-language-action models (VLAs) typically use natural-language reasoning as an intermediate step, but generating and parsing long chains of thought is slow, and training on them requires expensive annotations. DriveMA instead uses concise one-step meta-actions (like "accelerate" or "prepare_turn") derived automatically from expert driving data. Combined with reinforcement learning that jointly optimizes action correctness and trajectory quality, the approach reaches state-of-the-art results on Waymo End-to-End Driving with a 2B-parameter model. The payoff: simpler instructions that are faster to infer, easier for compact models to learn, and more reliable than reasoning chains, without sacrificing driving performance.
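
To make the two ingredients concrete, here is a minimal Python sketch of how a meta-action label might be derived from an expert trajectory and how a joint reward might combine action correctness with trajectory quality. It is an illustration under stated assumptions, not DriveMA's actual implementation: the function names, the thresholds, the extra labels "decelerate" and "keep_speed", and the reward weighting are all hypothetical. Only "accelerate" and "prepare_turn" come from the description above.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres; the frame is an assumption


def label_meta_action(
    speeds: List[float],        # expert speeds (m/s) over a short future horizon
    headings: List[float],      # expert headings (rad) over the same horizon
    accel_thresh: float = 0.5,  # hypothetical threshold on speed change (m/s)
    turn_thresh: float = 0.15,  # hypothetical threshold on heading change (rad)
) -> str:
    """Rule-based label from expert data; no human annotation involved."""
    dv = speeds[-1] - speeds[0]
    raw = headings[-1] - headings[0]
    # Wrap the heading change into [-pi, pi] so crossing +/-pi is not a "turn".
    dpsi = math.atan2(math.sin(raw), math.cos(raw))
    if abs(dpsi) > turn_thresh:
        return "prepare_turn"
    if dv > accel_thresh:
        return "accelerate"
    if dv < -accel_thresh:
        return "decelerate"     # hypothetical label, not named in the text
    return "keep_speed"         # hypothetical label, not named in the text


def joint_reward(
    pred_action: str,
    expert_action: str,
    pred_traj: List[Waypoint],
    expert_traj: List[Waypoint],
    w_action: float = 0.5,      # hypothetical weighting between the two terms
    scale: float = 2.0,         # hypothetical error scale in metres
) -> float:
    """Combine action correctness and trajectory quality into one scalar."""
    action_term = 1.0 if pred_action == expert_action else 0.0
    # Average displacement error between predicted and expert waypoints.
    ade = sum(math.dist(p, e) for p, e in zip(pred_traj, expert_traj)) / len(expert_traj)
    traj_term = math.exp(-ade / scale)  # 1.0 for a perfect match, decays with error
    return w_action * action_term + (1.0 - w_action) * traj_term
```

In this sketch, a policy that names the right meta-action but drives a poor trajectory (or the reverse) earns only part of the reward, which is the intuition behind optimizing both terms jointly.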