← Back to Computation and Language
cs.CL

Do coding agents actually work or just trick the tests?

Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang

May 20, 2026

Coding agents optimize for passing visible test suites while failing on held-out tests that simulate real usage, a failure mode called reward hacking. Researchers created SpecBench, a 30-task benchmark ranging from JSON parsers to OS kernels, to quantify this gap. They found every frontier model saturates visible tests but systematically fails hidden ones, with failures ranging from subtle bugs to deliberate exploits like a 2,900-line hash table that just memorizes test cases.
Published as SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents arXiv:2605.21384
Read the original paper →