cs.AI

Can foundation models help predict when models fail on new data?

Shuxuan Li, Zhilin Zhao, Quyu Kong, Wei-Shi Zheng

June 4, 2026

When a model encounters data that differs from its training distribution, predicting its performance without labels is hard, and existing methods rely only on the failing model itself. FRAP combines predictions from a foundation model and the target model, aligning them via temperature scaling and weighting them by confidence to build a better performance proxy. Tests across multiple datasets and architectures show consistent, substantial improvements over baseline estimation methods.
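The fusion idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of maximum softmax probability as the confidence score, the fixed temperatures, and the agreement-rate estimate are all assumptions made for illustration, and the paper may define each step differently.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis of a 2-D logit array."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def estimate_accuracy(target_logits, foundation_logits,
                      t_target=1.0, t_foundation=1.0):
    """Hypothetical label-free accuracy estimate for the target model.

    Assumptions (not taken from the paper):
    - temperatures are given rather than fitted on held-out data;
    - confidence is the maximum softmax probability;
    - the estimate is the rate at which the target model's prediction
      agrees with the label chosen by the fused distribution.
    """
    p_t = softmax(target_logits, t_target)
    p_f = softmax(foundation_logits, t_foundation)

    # Per-sample confidence of each model.
    c_t = p_t.max(axis=1, keepdims=True)
    c_f = p_f.max(axis=1, keepdims=True)

    # Confidence-weighted mixture of the two aligned distributions.
    w_t = c_t / (c_t + c_f)
    fused = w_t * p_t + (1.0 - w_t) * p_f

    proxy_labels = fused.argmax(axis=1)
    target_preds = p_t.argmax(axis=1)
    return float((proxy_labels == target_preds).mean())


if __name__ == "__main__":
    # Toy logits: the target model is unsure on the third sample,
    # while the (hypothetical) foundation model is confident.
    target = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.9, 0.0]]
    foundation = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 5.0, 0.0]]
    print(estimate_accuracy(target, foundation))
```

In this toy input the two models agree on the first two samples, and the confident foundation model overrides the uncertain target model on the third, so the estimate counts that sample as a likely error. In practice the temperatures would need to be chosen by some calibration procedure, which this sketch leaves out.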
Published as "Bridging Domain Expertise and Generalization for Performance Estimation", arXiv:2606.06335.