cs.CL

A transformer trained on millions of patient records predicts disease progression

Yunying Zhu, Andrew R Weckstein, Kueiyu Joshua Lin, Jie Yang

May 14, 2026

DT-Transformer addresses the gap between research cohorts and real-world clinical deployment by training on longitudinal electronic health records from an entire health system rather than on curated datasets or single hospitals. The model predicts disease trajectories (which conditions a patient will develop next) from structured EHR data covering 1.7M patients at Mass General Brigham. In prospective validation, next-event prediction achieved a median AUC of 0.871 across 896 disease categories, and every category exceeded chance-level performance (AUC 0.5). The results suggest that training at health-system scale yields models that reflect real-world clinical complexity better than models trained on smaller, curated datasets.
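To make the headline metric concrete, here is a minimal sketch of how a median per-category AUC can be computed. It is illustrative only: the function names, the toy categories, and the scores are assumptions, and the paper's actual evaluation pipeline is not described in this summary.

```python
from statistics import median


def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_auc(per_category):
    """per_category maps a category name to a (labels, scores) pair.
    Returns the median AUC and the per-category AUCs."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in per_category.items()}
    return median(aucs.values()), aucs


# Hypothetical toy data: three categories, four patients each.
toy = {
    "category_a": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "category_b": ([1, 0, 0, 1], [0.6, 0.7, 0.1, 0.8]),
    "category_c": ([0, 1, 0, 1], [0.5, 0.5, 0.3, 0.2]),
}

med, aucs = median_auc(toy)
# Categories that beat chance-level ranking (AUC 0.5).
above_chance = sorted(name for name, auc in aucs.items() if auc > 0.5)
```

In this toy data the per-category AUCs are 1.0, 0.75, and 0.375, so the median is 0.75 and only two of the three categories beat chance. The pairwise comparison is quadratic in the number of patients; at the scale reported in the paper a rank-based implementation would be the practical choice.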
Published as "DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System", arXiv:2605.14227.