cs.CL

Do AI chatbots actually know the news they report?

Mirac Suzgun, Emily Shen, Federico Bianchi, Alexander Spangher, Thomas Icard, Daniel E. Ho, Dan Jurafsky, James Zou

May 21, 2026

Researchers tested six commercial AI chatbots (including GPT-5, Claude, Gemini, and Grok) on 2,100 real news questions drawn from BBC coverage across six languages and regions. The systems scored 90% or higher on multiple-choice questions but did markedly worse on free-response answers, and accuracy fell to 19–70% when questions contained false premises. The core finding: retrieval failures, not reasoning gaps, cause most errors. Every model performs worst on Hindi, and the underlying sources show a strong English-language bias.
Published as "Evaluating Commercial AI Chatbots as News Intermediaries" (arXiv:2605.22785)