Can AI agents pass CAPTCHA tests like humans do?

Xinhao Song, Su Su, Sirui Song, Hongliang Wu, Wen Shen, Zhihua Wei, Gongshen Liu, Linfeng Zhang, Dongrui Liu

CAPTCHAs guard sensitive workflows such as account creation, form submission, and access control by verifying that the user is a human rather than a bot. This benchmark tests whether multimodal agents can cross that boundary through grounded interaction rather than pure recognition. Evaluating eight frontier models in a closed-loop GUI environment, the authors find that agents remain brittle: performance varies sharply across CAPTCHA types, degrades under realistic webpage clutter, and collapses when a solution must include a valid action trace. The work exposes failures in localization, action calibration, and state tracking, which are concrete gaps that keep agents from substituting for humans in protected workflows.
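
To make the distinction between recognition and grounded interaction concrete, the following is a minimal sketch of how an episode might be scored twice: once on the final answer alone, and once with the added requirement of a valid action trace. It is an illustration under stated assumptions, not the benchmark's actual scoring code. The `Action` type, the action vocabulary, the viewport size, the step budget, and the validity rules are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary; the benchmark's real action space is not specified here.
ALLOWED_KINDS = {"click", "drag", "type"}


@dataclass(frozen=True)
class Action:
    kind: str      # one of ALLOWED_KINDS
    x: int = 0     # pointer coordinates, used by "click" and "drag"
    y: int = 0
    text: str = "" # payload, used by "type"


def trace_is_valid(trace, width, height, max_steps):
    """Return True if the trace is non-empty, within budget, and every action is well formed."""
    if not trace or len(trace) > max_steps:
        return False
    for action in trace:
        if action.kind not in ALLOWED_KINDS:
            return False
        # Pointer actions must land inside the (assumed) viewport.
        in_bounds = 0 <= action.x < width and 0 <= action.y < height
        if action.kind in {"click", "drag"} and not in_bounds:
            return False
    return True


def score_episode(solved, trace, width=1280, height=720, max_steps=20):
    """Score one episode under an answer-only criterion and a stricter trace-aware criterion."""
    return {
        "answer_only": solved,
        "with_valid_trace": solved and trace_is_valid(trace, width, height, max_steps),
    }
```

Under these assumptions, an agent that identifies the right target but clicks outside the viewport would count as a success on `answer_only` and a failure on `with_valid_trace`, which is the kind of gap the abstract describes.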