cs.CV

Teaching self-driving cars to think ahead and follow instructions

Yang Wu, Qiang Meng, Zhaojiang Liu, Youquan Liu, Jian Yang, Jin Xie

May 20, 2026

Current end-to-end autonomous driving models are limited by imitation learning: they can only imitate human driving behavior. CoPhy adds two missing pieces. The first is a cognitive layer that understands traffic semantics and intent; it is distilled from a vision-language model, so it adds no cost at inference time. The second is a forward-looking world model that predicts how the car's actions will unfold. The driving policy is trained with dual rewards: a physical safety reward computed from simulated rollouts and a cognitive alignment reward derived from language. Results on the NAVSIM benchmarks show state-of-the-art performance, along with explicit safety guarantees and the ability to follow natural-language instructions.
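As a rough illustration of the dual-reward idea, the sketch below scores a candidate plan by rolling it through a world model and combining a physical safety term with a cognitive alignment term. Everything here is an assumption for illustration, not CoPhy's actual method. The world model is a trivial kinematic step instead of a learned network. The safety reward is a simple obstacle-proximity penalty. The cognitive reward is a cosine similarity against a hand-written intent vector that stands in for a language-derived signal. The function names (`imagine_rollout`, `physical_reward`, `cognitive_reward`, `dual_reward`), the weights, and `safe_dist` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical 2-D types: a state is the ego (x, y) position in metres and an
# action is a per-step (dx, dy) displacement. Real systems use far richer states.
State = Tuple[float, float]
Action = Tuple[float, float]
WorldModel = Callable[[State, Action], State]


def kinematic_model(state: State, action: Action) -> State:
    """Toy stand-in for a learned world model: apply the displacement directly."""
    return (state[0] + action[0], state[1] + action[1])


def imagine_rollout(model: WorldModel, start: State, actions: Sequence[Action]) -> List[State]:
    """Roll the world model forward to predict the states a plan leads to."""
    states = [start]
    for action in actions:
        states.append(model(states[-1], action))
    return states


def physical_reward(states: Sequence[State], obstacles: Sequence[State],
                    safe_dist: float = 2.0) -> float:
    """Negative penalty for each predicted state closer than safe_dist to an obstacle."""
    penalty = 0.0
    for x, y in states:
        for ox, oy in obstacles:
            dist = math.hypot(x - ox, y - oy)
            if dist < safe_dist:
                penalty += safe_dist - dist
    return -penalty


def cognitive_reward(actions: Sequence[Action], intent: Action) -> float:
    """Cosine similarity between the plan's mean displacement and an intent vector.

    Hypothetical proxy: `intent` is a hand-written direction standing in for a
    signal that would be derived from a language instruction.
    """
    mean_dx = sum(a[0] for a in actions) / len(actions)
    mean_dy = sum(a[1] for a in actions) / len(actions)
    norm = math.hypot(mean_dx, mean_dy) * math.hypot(intent[0], intent[1])
    if norm == 0.0:
        return 0.0  # undefined direction: treat as no alignment
    return (mean_dx * intent[0] + mean_dy * intent[1]) / norm


def dual_reward(model: WorldModel, start: State, actions: Sequence[Action],
                obstacles: Sequence[State], intent: Action,
                w_phys: float = 1.0, w_cog: float = 0.5) -> float:
    """Weighted sum of physical and cognitive rewards (weights are assumptions)."""
    states = imagine_rollout(model, start, actions)
    return (w_phys * physical_reward(states, obstacles)
            + w_cog * cognitive_reward(actions, intent))


if __name__ == "__main__":
    obstacles = [(3.0, 0.0)]  # an obstacle 3 m straight ahead
    intent = (1.0, 0.0)       # e.g. a "keep going forward" instruction
    straight = [(1.0, 0.0)] * 3
    swerve = [(1.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for name, plan in [("straight", straight), ("swerve", swerve)]:
        score = dual_reward(kinematic_model, (0.0, 0.0), plan, obstacles, intent)
        print(name, round(score, 3))
```

Running the script prints -2.5 for the straight plan, which drives into the obstacle, and 0.416 for the swerving plan, which stays clear at a small cost in alignment with the forward intent. In a reinforcement-learning setup, a score like this would act as the reward signal for policy updates; the summary does not specify CoPhy's actual reward definitions or weights.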
Published as "Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving" (arXiv:2605.21139).