Why combining the best tabular models barely helps

Aditya Tanna, Yash Desai, Pratinav Seth, Mohamed Bouadi, Nassim Bouarour, Vinay Kumar Sankarapu

Tabular foundation models now rival gradient boosting, but no single one wins across all datasets—suggesting ensembling should help. Testing six TFMs on 153 datasets reveals they're near-redundant: their predictions correlate so tightly that most ensemble strategies fail to improve over the best individual model. Stacking's meta-learner actually harms calibration by over-sharpening boundaries. The practical takeaway: greedy selection beats expensive ensemble methods.