cs.CL

Can AI catch research flaws that human reviewers miss?

Yashwardhan Chaudhuri, Sanyam Jain, Paridhi Mundra

May 26, 2026

E3 is an automated review assistant that flags decision-relevant technical problems in research papers (unsupported claims, missing ablations, weak baselines, validity threats) and explains what evidence would resolve each issue. Tested on 100 ICLR 2026 papers using a clean backtesting protocol that avoids data contamination, E3 catches 90.2% of issues (partial-inclusive) versus 60.8% for human reviewers, and surfaces 406 additional concerns that the ICLR panel missed entirely. Code, corpus, and evaluation templates are openly available.
Published as "E3: Issue-Level Backtesting for Automated Research Critique", arXiv:2605.27072