Do AI chatbots actually know the news they report?

Mirac Suzgun, Emily Shen, Federico Bianchi, Alexander Spangher, Thomas Icard, Daniel E. Ho, Dan Jurafsky, James Zou

Researchers tested six commercial AI chatbots, including GPT-5, Claude, Gemini, and Grok, on 2,100 real news questions drawn from BBC coverage across six languages and regions. The systems scored above 90% on multiple-choice questions but stumbled badly on free-response answers, and their accuracy fell to 19–70% when questions contained false premises. The core finding is that retrieval failures, not reasoning gaps, cause most errors. Every model also performs worst on Hindi, and the sources the models rely on show a strong English-language bias.