Why foundation models fail at spotting building cracks

Nicola Farronato, Niccolo Avogaro, Thomas Frick, Mattia Rigotti, Rizwan Ullah Khan, Michele Magno, Konrad Schindler, Cristiano Malossi, Florian Scheidegger

Automated inspection of building cracks and structural damage remains unsolved despite recent advances in foundation models and vision-language models. This paper introduces Cracks in the Foundation (CiF), the largest infrastructure segmentation dataset to date, with ~150,000 meticulously annotated high-resolution images collected over five years in collaboration with civil engineering experts. Evaluations show that zero-shot foundation models, and even specialized segmentation models, perform poorly on real-world infrastructure, with performance capping at around 25% mAP. The work identifies specific algorithmic challenges (center bias and a reliance on shape over texture) that explain why models trained on internet images underperform on nearly textureless building materials. The dataset and benchmark establish civil infrastructure inspection as an open problem.