When AI writes research papers: what works, what fails

Lingdong Kong, Xian Sun, Wei Chow, Linfeng Li, Kevin Qinghong Lin, Xuan Billy Zhang, Song Wang, Rong Li, Qing Wu, Wei Gao, Yingshuo Wang, Shaoyuan Xie, Jiachen Liu, Leigang Qu, Shijie Li, Lai Xing Ng, Benoit R. Cottereau, Ziwei Liu, Tat-Seng Chua, Wei Tsang Ooi

Fully automated systems can now generate research papers for $15, yet frontier LLMs still fabricate results, miss errors, and fail at scientific judgment. This roadmap analyzes AI across four phases, from idea generation through dissemination, identifying where AI assists reliably and where it fails dangerously. AI works well for retrieval-grounded tasks such as literature review and structured writing, but remains fragile for genuinely novel ideas, research-level experiments, and peer review. End-to-end autonomous systems have not consistently reached major-venue acceptance. The paper provides a taxonomy, benchmarks, design principles, and a practitioner playbook, and argues that human-governed collaboration is the most credible deployment model.