Teaching self-driving cars to rank their own motion plans

End-to-end driving planners typically learn from a single logged trajectory, yet they are evaluated with multi-objective metrics covering safety, feasibility, progress, and comfort. CLOVER addresses this mismatch with a generator–scorer architecture: a generator produces diverse candidate trajectories under set-level coverage supervision, and a scorer predicts the planning metric's sub-scores so the candidates can be ranked at inference time. Training uses conservative closed-loop self-distillation: the scorer learns from the true evaluator scores, while the generator is refined toward top-performing and Pareto-optimal targets. On NAVSIM, CLOVER reaches state-of-the-art results of 94.5 PDMS and 90.4 EPDMS. Code is to be released.
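To make "rank by sub-scores" and "Pareto-optimal targets" concrete, here is a minimal sketch of those two selection steps, not CLOVER's implementation. It assumes each candidate already has five sub-scores in [0, 1] (in CLOVER these would come from the learned scorer or the evaluator), and it assumes NAVSIM's PDMS-style aggregation: no-collision and drivable-area terms act as multiplicative penalties on a weighted average of progress, time-to-collision, and comfort (weights 5, 5, 2). The names `SubScores`, `pdms`, `rank_candidates`, and `pareto_front` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubScores:
    # Hypothetical per-candidate sub-scores in [0, 1], named after PDMS terms.
    nc: float       # no at-fault collision
    dac: float      # drivable area compliance
    ep: float       # ego progress
    ttc: float      # time-to-collision
    comfort: float  # comfort

    def as_tuple(self) -> Tuple[float, ...]:
        return (self.nc, self.dac, self.ep, self.ttc, self.comfort)


def pdms(s: SubScores) -> float:
    """Assumed PDMS-style aggregate: penalties times a weighted average."""
    weighted = (5.0 * s.ep + 5.0 * s.ttc + 2.0 * s.comfort) / 12.0
    return s.nc * s.dac * weighted


def rank_candidates(scores: List[SubScores]) -> List[int]:
    """Candidate indices sorted best-first by the aggregate score."""
    return sorted(range(len(scores)), key=lambda i: pdms(scores[i]), reverse=True)


def dominates(a: SubScores, b: SubScores) -> bool:
    """True if a is >= b on every sub-score and strictly > on at least one."""
    va, vb = a.as_tuple(), b.as_tuple()
    return all(x >= y for x, y in zip(va, vb)) and any(x > y for x, y in zip(va, vb))


def pareto_front(scores: List[SubScores]) -> List[int]:
    """Indices of candidates that no other candidate Pareto-dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Usage with three hand-made candidates (illustrative numbers, not NAVSIM data).
candidates = [
    SubScores(nc=1.0, dac=1.0, ep=0.9, ttc=1.0, comfort=1.0),  # safe, good progress
    SubScores(nc=1.0, dac=1.0, ep=0.6, ttc=1.0, comfort=1.0),  # safe, slower
    SubScores(nc=0.0, dac=1.0, ep=1.0, ttc=0.0, comfort=1.0),  # fastest, but collides
]
print(rank_candidates(candidates))  # [0, 1, 2]
print(pareto_front(candidates))     # [0, 2]
```

The Pareto front keeps candidate 2 because no other candidate beats it on progress, even though its aggregate score is zero. The two selections therefore answer different questions; how CLOVER combines top-performing and Pareto-optimal targets is not specified in this summary.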