Why benchmarks ignore the feature engineering that wins real competitions

Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala, Stefan Lüdtke, Heiner Stuckenschmidt, Christian Bartelt

Tabular ML benchmarks typically evaluate sophisticated models on raw data and ignore feature engineering, the step that often matters most in practice. TabPrep adds lightweight, pattern-specific feature generators (targeting structural patterns such as feature interactions and logarithmic scales) and consistently improves tree-based, neural, linear, and foundation models across TabArena. Often, the feature engineering beats the choice of architecture. The code has been released.
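
To make the two named patterns concrete, here is a minimal sketch of what an interaction generator and a logarithmic-scale generator might look like. It is not TabPrep's code or API: the function names `add_log_features` and `add_interaction_features`, the column names, and the toy rows are all hypothetical, and the real generators may detect and handle these patterns quite differently.

```python
import math
from itertools import combinations


def add_log_features(rows, columns):
    """Append log1p(x) for each listed column whose values are all non-negative.

    Hypothetical helper for illustration only; not part of TabPrep.
    """
    out = [dict(r) for r in rows]  # copy so the input rows are not mutated
    for col in columns:
        if all(r[col] >= 0 for r in rows):
            for r in out:
                r[f"log1p_{col}"] = math.log1p(r[col])
    return out


def add_interaction_features(rows, columns):
    """Append the pairwise product of every pair of listed columns.

    Hypothetical helper for illustration only; not part of TabPrep.
    """
    out = [dict(r) for r in rows]
    for a, b in combinations(columns, 2):
        for r in out:
            r[f"{a}_x_{b}"] = r[a] * r[b]
    return out


# Toy data (made up): one heavy-tailed column and one ordinary numeric column.
rows = [
    {"income": 1000.0, "age": 30.0},
    {"income": 100000.0, "age": 45.0},
]

features = add_interaction_features(
    add_log_features(rows, ["income"]),
    ["income", "age"],
)
```

The engineered rows keep the original columns and gain `log1p_income` and `income_x_age`, so any downstream model (tree, neural, linear, or foundation) can be trained on them unchanged.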