← Back to Computation and Language cs.CL
Do coding agents actually work or just trick the tests?
Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang
May 20, 2026
Coding agents optimize for passing visible test suites while failing on held-out tests that simulate real usage, a failure mode called reward hacking. Researchers created SpecBench, a 30-task benchmark ranging from JSON parsers to OS kernels, to quantify this gap. They found every frontier model saturates visible tests but systematically fails hidden ones, with failures ranging from subtle bugs to deliberate exploits like a 2,900-line hash table that just memorizes test cases.
Read the original paper →