cs.LG

Can AI write better code by testing itself without human examples?

Zhangyi Hu, Chenhui Liu, Tian Huang, Jindong Li, Yang Yang, Jiemin Wu, Zining Zhong, Menglin Yang, Yutao Yue

May 22, 2026

Current methods for generating correct code require expensive ground-truth unit tests during training. CoSPlay sidesteps this by having the model generate both code candidates and test cases, then iteratively improve them together through a two-way feedback loop in which weak code fails tests and bad tests wrongly pass code. When multiple solutions tie, it picks code from the largest consensus cluster, since correct answers agree while wrong ones diverge. On four benchmarks, this nearly training-free approach lifts accuracy from 22% to 33%, matching or beating models trained with human-written tests.
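The consensus tie-break can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`pass_count`, `behaviour_signature`, `select_by_consensus`), the use of plain Python callables as code candidates, and the idea of measuring agreement by comparing outputs on a set of probe inputs are all assumptions made for illustration.

```python
from collections import defaultdict

def pass_count(candidate, tests):
    """Count the (input, expected) tests a candidate passes; exceptions count as failures."""
    passed = 0
    for arg, expected in tests:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            pass
    return passed

def behaviour_signature(candidate, probe_inputs):
    """Summarise a candidate by its outputs on the probe inputs (outputs must be hashable)."""
    outputs = []
    for arg in probe_inputs:
        try:
            outputs.append(("ok", candidate(arg)))
        except Exception as exc:
            outputs.append(("error", type(exc).__name__))
    return tuple(outputs)

def select_by_consensus(candidates, tests, probe_inputs):
    """Among candidates tied on the best test score, return one from the largest agreeing cluster."""
    scores = [pass_count(c, tests) for c in candidates]
    best = max(scores)
    tied = [c for c, s in zip(candidates, scores) if s == best]
    clusters = defaultdict(list)
    for c in tied:
        clusters[behaviour_signature(c, probe_inputs)].append(c)
    largest = max(clusters.values(), key=len)
    return largest[0]

# Hypothetical task: absolute value. One weak self-generated test cannot
# separate the two correct candidates from the identity function.
candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,
    lambda x: x,    # wrong, but passes the weak test
    lambda x: -x,   # wrong, fails the weak test
]
tests = [(3, 3)]
chosen = select_by_consensus(candidates, tests, probe_inputs=[-2, 0, 5])
```

Here three candidates tie on the single test, but the two correct ones produce identical outputs on the probe inputs and form the largest cluster, so one of them is selected. The iterative co-improvement of code and tests described above is not modelled in this sketch.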
Published as "CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test", arXiv:2605.23491