Teaching machines to recognize protein shapes at scale
Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt
May 18, 2026
Protein fold classification, the task of sorting proteins by their 3D shape, matters for understanding biology, but researchers have lacked large, clean benchmarks to train on. This work introduces TEDBench, a non-redundant benchmark built from AlphaFold predictions and experimental structures, and proposes Masked Invariant Autoencoders (MiAE), a self-supervised approach that learns protein structure representations by masking up to 90% of the input and reconstructing coordinates. MiAE outperforms supervised baselines on TEDBench and transfers better to experimental protein structures, suggesting that self-supervision is a more efficient way to learn protein topology than traditional supervised training.
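To make the masking idea concrete, here is a minimal sketch of a mask-and-reconstruct objective on a toy structure. It is an illustration under stated assumptions, not the MiAE implementation: the function names `random_mask` and `masked_mse`, the per-residue random masking, the plain mean squared error, and the centroid "decoder" are all hypothetical choices made for this example. In particular, a raw coordinate error is not invariant to rotation or translation, so the invariance aspect of the method is not modeled here.

```python
import random


def random_mask(num_residues, mask_ratio=0.9, seed=0):
    """Pick residue indices to hide, always leaving at least one visible.

    Hypothetical helper: the summary only says up to 90% of the input is masked.
    """
    num_masked = min(round(num_residues * mask_ratio), num_residues - 1)
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_residues), num_masked))


def masked_mse(pred, target, masked_idx):
    """Mean squared coordinate error, computed over masked residues only."""
    if not masked_idx:
        return 0.0
    total = 0.0
    for i in masked_idx:
        total += sum((p - t) ** 2 for p, t in zip(pred[i], target[i]))
    return total / (3 * len(masked_idx))


# Toy "protein": 20 residues placed along a straight line (made-up data).
coords = [(float(i), 0.0, 0.0) for i in range(20)]

masked = random_mask(len(coords), mask_ratio=0.9)
masked_set = set(masked)
visible = [i for i in range(len(coords)) if i not in masked_set]

# Stand-in for a learned decoder: predict the centroid of the visible
# residues at every position. A real model would be trained to do better.
centroid = tuple(
    sum(coords[i][axis] for i in visible) / len(visible) for axis in range(3)
)
pred = [centroid for _ in coords]

loss = masked_mse(pred, coords, masked)
print(f"masked {len(masked)} of {len(coords)} residues, loss = {loss:.3f}")
```

The loss is computed only on hidden residues, so the model gets no credit for copying what it was shown; with 90% of the input masked, it has to infer most of the structure from a small visible fraction.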