How AI detectors can mislead us about who is actually using AI

Researchers examined how AI-detection benchmarks estimate the use of AI writing tools in journal abstracts across countries and academic fields. A pooled method, which measures every abstract against a single shared baseline, systematically mistook natural stylistic differences between groups for AI-generated text and overestimated adoption in some regions by large margins. Measuring each group against its own field-specific and country-specific baseline produced far more accurate estimates. The findings suggest that crude, one-size-fits-all AI detection distorts the picture of who is using these tools and creates unfair comparisons between nations and disciplines.
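To see why a pooled baseline can mislead, here is a minimal toy sketch. Purely for illustration, it assumes that a detector estimates adoption from the rate of certain "marker" words and treats each corpus as a mix of human-written and AI-generated text. The rates, the group names `A` and `B`, and the functions `observed_rate` and `estimate_adoption` are all hypothetical. This is not the study's method or data.

```python
# Toy illustration with HYPOTHETICAL numbers, not data from the study.
# Assumption: a detector estimates AI adoption from how often "marker" words appear.

AI_RATE = 0.10  # assumed marker-word rate in AI-generated text

# Assumed pre-AI (human) marker-word rates: group A's writing style
# naturally uses the markers more often than group B's.
HUMAN_RATE = {"A": 0.04, "B": 0.01}
TRUE_ADOPTION = {"A": 0.10, "B": 0.10}  # both groups truly at 10%


def observed_rate(human_rate: float, adoption: float) -> float:
    """Marker-word rate in a corpus mixing human and AI-written text."""
    return (1 - adoption) * human_rate + adoption * AI_RATE


def estimate_adoption(observed: float, baseline: float) -> float:
    """Invert the mixture: the AI share needed to explain the rate above baseline."""
    share = (observed - baseline) / (AI_RATE - baseline)
    return min(max(share, 0.0), 1.0)  # clamp to a valid proportion


observed = {g: observed_rate(HUMAN_RATE[g], TRUE_ADOPTION[g]) for g in HUMAN_RATE}

# Pooled: one baseline for everyone (here the mean of the groups' human rates,
# assuming equal-sized groups).
pooled_baseline = sum(HUMAN_RATE.values()) / len(HUMAN_RATE)
pooled = {g: estimate_adoption(observed[g], pooled_baseline) for g in observed}

# Group-specific: each group is measured against its own pre-AI baseline.
specific = {g: estimate_adoption(observed[g], HUMAN_RATE[g]) for g in observed}

for g in observed:
    print(f"group {g}: pooled {pooled[g]:.2f}, group-specific {specific[g]:.2f}")
```

In this toy setup both groups truly adopt AI at 10%, yet the pooled estimate reports about 28% for group A, whose style already favours the marker words, and 0% for group B. Measuring each group against its own baseline recovers 10% for both.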