Teaching self-driving cars to think ahead and follow instructions
Yang Wu, Qiang Meng, Zhaojiang Liu, Youquan Liu, Jian Yang, Jin Xie
May 20, 2026
Current end-to-end autonomous driving models hit a wall: they can only imitate human behavior. CoPhy adds two missing pieces. The first is a cognitive layer that understands traffic semantics and intent, distilled from a vision-language model so that it adds no inference cost. The second is a forward-looking world model that predicts how the car's actions will unfold. The driving policy is trained with dual rewards: a physical-safety reward computed from simulated rollouts and a cognitive-alignment reward derived from language. Results on NAVSIM benchmarks show state-of-the-art performance, along with explicit safety guarantees and the ability to follow natural-language instructions.
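To make the dual-reward idea concrete, here is a minimal sketch of how a physical-safety score from a rollout and a cognitive-alignment score from language might be combined. It is not the paper's implementation: the summary does not say how either reward is computed or weighted. The distance-based safety score, the cosine-similarity alignment score, the weighted sum, and every name in the code (`integrate_step`, `rollout`, `physical_reward`, `cognitive_reward`, `dual_reward`, `safe_dist`, the weights) are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]   # hypothetical ego position (x, y) in meters
Action = Tuple[float, float]  # hypothetical displacement (dx, dy) per step


def integrate_step(state: State, action: Action) -> State:
    """Toy stand-in for a learned world model: add the displacement."""
    return (state[0] + action[0], state[1] + action[1])


def rollout(state: State, actions: Sequence[Action],
            step: Callable[[State, Action], State]) -> List[State]:
    """Unroll the (stand-in) world model over a candidate action sequence."""
    states = []
    for action in actions:
        state = step(state, action)
        states.append(state)
    return states


def physical_reward(states: Sequence[State], obstacles: Sequence[State],
                    safe_dist: float = 2.0) -> float:
    """Assumed safety score in [0, 1]: 1.0 when every predicted state stays
    at least safe_dist from every obstacle, shrinking linearly to 0.0 as the
    closest approach goes to zero."""
    if not states or not obstacles:
        return 1.0
    closest = min(math.dist(s, o) for s in states for o in obstacles)
    return min(1.0, closest / safe_dist)


def cognitive_reward(plan_emb: Sequence[float],
                     instr_emb: Sequence[float]) -> float:
    """Assumed alignment score in [0, 1]: cosine similarity between a plan
    embedding and an instruction embedding, rescaled from [-1, 1]."""
    dot = sum(p * q for p, q in zip(plan_emb, instr_emb))
    norm = (math.sqrt(sum(p * p for p in plan_emb))
            * math.sqrt(sum(q * q for q in instr_emb)))
    if norm == 0.0:
        return 0.5  # no signal: treat as neutral
    return 0.5 * (1.0 + dot / norm)


def dual_reward(r_phys: float, r_cog: float,
                w_phys: float = 0.5, w_cog: float = 0.5) -> float:
    """Assumed combination rule: a weighted sum of the two rewards."""
    return w_phys * r_phys + w_cog * r_cog


# Usage: score one candidate plan that passes 1 m from an obstacle.
states = rollout((0.0, 0.0), [(1.0, 0.0), (1.0, 0.0)], integrate_step)
reward = dual_reward(physical_reward(states, [(2.0, 1.0)]),
                     cognitive_reward([1.0, 0.0], [1.0, 0.0]))
```

In this toy setup the plan matches the instruction embedding exactly but comes within half the assumed safe distance of the obstacle, so the safety term pulls the combined reward down. A real system would replace `integrate_step` with a learned world model and the hand-written vectors with learned embeddings.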