cs.CL

Why judging demos beats searching for perfect ones

Haochun Wang, Chaofen Yang, Jiatong Liu, Jingbo Wang, Zewen Qiang, Sendong Zhao, Bing Qin, Ting Liu

May 18, 2026

In-context learning depends heavily on which examples you show the model, but searching for the optimal ones is computationally expensive. This work flips the problem: rather than search through combinations of demonstrations, train lightweight classifiers to judge whether a given query-demo pair will succeed. DiSP stratifies queries by difficulty, runs random trials to calibrate success rates, then uses a router and level-specific judges to accept suitable demonstrations under a budget constraint. On five classification tasks with Llama and Qwen models, it beats learned selection baselines by up to 3.4% accuracy while achieving a 23× wall-clock speedup.
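The accept-under-budget step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not specify how the router or judges are trained, so `select_demo`, `toy_router`, and `overlap_judge` are hypothetical names, the word-overlap judge is a stand-in for a trained classifier, and the fallback to the best-scoring candidate when the budget runs out is an assumption.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical judge signature: (query, demo) -> predicted success score.
Judge = Callable[[str, str], float]


def select_demo(
    query: str,
    pool: List[str],
    router: Callable[[str], int],
    judges: Dict[int, Judge],
    threshold: float = 0.5,
    budget: int = 8,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float, int]:
    """Judge randomly drawn demos and return the first one that is accepted.

    Assumption: if no candidate clears the threshold within the budget,
    fall back to the best-scoring candidate seen.
    Returns (demo, score, number_of_judge_calls).
    """
    if not pool:
        raise ValueError("demo pool is empty")
    rng = rng or random.Random(0)
    level = router(query)   # difficulty level assigned to this query
    judge = judges[level]   # judge associated with that level
    candidates = rng.sample(pool, k=min(budget, len(pool)))
    best_demo, best_score = candidates[0], float("-inf")
    for calls, demo in enumerate(candidates, start=1):
        score = judge(query, demo)
        if score >= threshold:
            return demo, score, calls  # accept early, stop spending budget
        if score > best_score:
            best_demo, best_score = demo, score
    return best_demo, best_score, len(candidates)


def toy_router(query: str) -> int:
    """Stand-in router: short queries are level 0, longer ones level 1."""
    return 0 if len(query.split()) <= 4 else 1


def overlap_judge(query: str, demo: str) -> float:
    """Stand-in judge: Jaccard word overlap instead of a trained classifier."""
    q, d = set(query.lower().split()), set(demo.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


pool = [
    "great movie loved it -> positive",
    "terrible plot -> negative",
    "the food was cold -> negative",
]
judges = {0: overlap_judge, 1: overlap_judge}
demo, score, calls = select_demo("loved the movie", pool, toy_router, judges, threshold=0.2)
print(demo, round(score, 3), calls)
```

The point of the design is that each judge call is cheap compared with running the language model, so accepting the first demonstration that clears the threshold keeps cost bounded by `budget` rather than by the size of the pool.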
Published as "Easier to Judge than to Find: Predicting In-Context Learning Success for Demonstration Selection" (arXiv:2605.18512)