cs.LG

Teaching machines to recognize protein shapes at scale

Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt

May 18, 2026

Protein fold classification, the task of sorting proteins by their 3D shape, matters for understanding biology, but researchers have lacked large, clean benchmarks for training and evaluating fold classifiers. This work introduces TEDBench, a non-redundant benchmark built from AlphaFold predictions and experimental structures, and proposes Masked Invariant Autoencoders (MiAE), which learn protein structure representations by masking up to 90% of the input and reconstructing the coordinates. MiAE outperforms supervised baselines on TEDBench and transfers better to experimentally determined protein structures, suggesting that self-supervision learns protein topology more efficiently than traditional supervised training.
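The masking-and-reconstruction idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each residue is represented by a single 3D coordinate (for example its Cα atom), hides a random 90% of residues, and uses a distance-matrix loss as one simple way to make coordinate reconstruction invariant to rotation and translation. The paper's actual encoder, decoder, and loss may differ, and all function names (`split_mask`, `pairwise_distances`, `invariant_reconstruction_loss`, `rigid_move`) are hypothetical.

```python
import math
import random

# Assumption: one 3D point per residue (e.g. C-alpha); not taken from the paper.
Coord = tuple[float, float, float]


def split_mask(num_residues: int, mask_ratio: float = 0.9, seed: int = 0) -> tuple[list[int], list[int]]:
    """Randomly pick residues to hide from the encoder; returns (visible, masked) indices."""
    rng = random.Random(seed)
    num_masked = round(num_residues * mask_ratio)
    masked = sorted(rng.sample(range(num_residues), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_residues) if i not in masked_set]
    return visible, masked


def pairwise_distances(coords: list[Coord]) -> list[list[float]]:
    """All-pairs Euclidean distances; unchanged by rotating or translating the structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def invariant_reconstruction_loss(pred: list[Coord], true: list[Coord], masked: list[int]) -> float:
    """Mean squared error between distance matrices, over rows of masked residues only.

    Hypothetical stand-in loss: comparing distances rather than raw coordinates
    means the decoder does not have to recover the original reference frame.
    """
    d_pred = pairwise_distances(pred)
    d_true = pairwise_distances(true)
    errors = [
        (d_pred[i][j] - d_true[i][j]) ** 2
        for i in masked
        for j in range(len(true))
        if j != i
    ]
    return sum(errors) / len(errors) if errors else 0.0


def rigid_move(coords: list[Coord], angle: float, shift: Coord) -> list[Coord]:
    """Rotate about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in coords]


if __name__ == "__main__":
    rng = random.Random(1)
    true = [tuple(rng.uniform(-10.0, 10.0) for _ in range(3)) for _ in range(50)]
    visible, masked = split_mask(len(true), mask_ratio=0.9)
    print(len(visible), len(masked))  # 5 45
    # A rotated and shifted copy of the true structure is a "perfect" reconstruction.
    moved = rigid_move(true, angle=math.pi / 3, shift=(5.0, -2.0, 1.0))
    print(invariant_reconstruction_loss(moved, true, masked) < 1e-12)  # True
```

Because the loss only compares inter-residue distances, a rotated and translated copy of the true structure scores zero up to floating-point error, which is one natural reading of "invariant" here. One caveat of this particular choice is that distance matrices are also blind to mirror images, so a real model may need extra terms if chirality matters.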
Published as "Protein Fold Classification at Scale: Benchmarking and Pretraining" (arXiv:2605.18552).