Why judging demos beats searching for perfect ones

Haochun Wang, Chaofen Yang, Jiatong Liu, Jingbo Wang, Zewen Qiang, Sendong Zhao, Bing Qin, Ting Liu

In-context learning depends heavily on which examples you show the model, but finding the optimal ones is computationally brutal. This work flips the problem: rather than searching through combinations of demos, it trains lightweight classifiers to judge whether a given query-demo pair will succeed. DiSP stratifies queries by difficulty, runs random trials to calibrate success rates, then uses a router and level-specific judges to accept suitable demonstrations under a budget constraint. On five classification tasks with Llama and Qwen models, it beats learned selection baselines by up to 3.4% in accuracy while achieving a 23× wall-clock speedup.
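To make the pipeline concrete, here is a minimal sketch of the three stages described above: calibrating success rates with random trials, mapping them to difficulty levels, and accepting a demo under a budget. It is not the authors' implementation. Every name (`calibrate`, `difficulty_level`, `select_demo`), the bin thresholds, the acceptance threshold, and the fallback to the best-scored candidate when the budget runs out are assumptions made for illustration; the router and judges are plain callables standing in for trained classifiers.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch: names, thresholds, and the fallback policy are
# illustrative assumptions, not details taken from the paper.


def calibrate(
    queries: Sequence[str],
    demos: Sequence[str],
    run_trial: Callable[[str, str], bool],
    trials: int = 8,
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Estimate each query's success rate by pairing it with random demos."""
    rng = rng or random.Random(0)
    rates = {}
    for q in queries:
        wins = sum(run_trial(q, rng.choice(demos)) for _ in range(trials))
        rates[q] = wins / trials
    return rates


def difficulty_level(rate: float, thresholds: Sequence[float] = (0.33, 0.66)) -> int:
    """Bin a success rate into a level: 0 is hardest, len(thresholds) is easiest."""
    return sum(rate >= t for t in thresholds)


def select_demo(
    query: str,
    demos: List[str],
    router: Callable[[str], int],
    judges: Dict[int, Callable[[str, str], float]],
    budget: int = 4,
    threshold: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], bool]:
    """Try at most `budget` random demos; return the first one the judge accepts.

    The router picks the query's difficulty level, and that level's judge
    scores each (query, demo) pair. If nothing is accepted within the budget,
    fall back to the best-scored candidate (an assumed policy).
    """
    rng = rng or random.Random(0)
    judge = judges[router(query)]
    best, best_score = None, float("-inf")
    for demo in rng.sample(demos, min(budget, len(demos))):
        score = judge(query, demo)
        if score >= threshold:
            return demo, True
        if score > best_score:
            best, best_score = demo, score
    return best, False
```

In a real setup, the calibrated rates would supply training labels for the router and judges, and `run_trial` would call the language model and check its answer. The point of the sketch is the cost profile: selection at inference time needs at most `budget` cheap judge calls per query instead of a search over demo combinations.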