A transformer trained on millions of patient records predicts disease progression
Yunying Zhu, Andrew R Weckstein, Kueiyu Joshua Lin, Jie Yang
May 14, 2026
DT-Transformer addresses the gap between research cohorts and real-world clinical deployment by training on longitudinal electronic health records from an entire health system rather than on curated datasets or single hospitals. The model predicts disease trajectories, that is, which conditions a patient will develop next, from structured EHR data covering 1.7M patients from Mass General Brigham. In prospective validation, next-event prediction achieved a median AUC of 0.871 across 896 disease categories, and every category exceeded chance-level performance. The results indicate that health system-scale training produces models that reflect real-world clinical complexity better than models trained on smaller, curated datasets.
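To make the headline metric concrete, here is a minimal sketch of how a median per-category AUC can be computed. It is not the authors' evaluation code: the summary does not say how AUC was calculated or aggregated beyond reporting the median, so the function names `roc_auc` and `median_auc`, the pairwise-ranking formulation, and the three toy categories are all assumptions made for illustration.

```python
from statistics import median


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Hypothetical helper, not from the paper.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(per_category):
    """Return (median AUC, per-category AUCs) for {name: (labels, scores)}."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in per_category.items()}
    return median(aucs.values()), aucs


# Invented toy data: three categories instead of the 896 in the study.
toy = {
    "category_a": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "category_b": ([1, 0, 0, 1], [0.6, 0.7, 0.1, 0.8]),
    "category_c": ([0, 1, 0, 1], [0.5, 0.5, 0.3, 0.2]),
}

med, aucs = median_auc(toy)
print(med, aucs)
```

An AUC of 0.5 corresponds to random ranking, which is the baseline the per-category comparison refers to. The pairwise loop is quadratic in the number of examples and is only suitable for small illustrations; a rank-based computation would be the usual choice at scale.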