Why judging demos beats searching for perfect ones
Haochun Wang, Chaofen Yang, Jiatong Liu, Jingbo Wang, Zewen Qiang, Sendong Zhao, Bing Qin, Ting Liu
May 18, 2026
In-context learning depends heavily on which examples you show the model, but finding the optimal ones is computationally brutal. This work flips the problem: rather than searching through combinations of demonstrations, it trains lightweight classifiers to judge whether a given query-demo pair will succeed. DiSP stratifies queries by difficulty, runs random trials to calibrate success rates, then uses a router and level-specific judges to accept suitable demonstrations under a budget constraint. On five classification tasks with Llama and Qwen models, it beats learned selection baselines by up to 3.4% in accuracy while achieving a 23× wall-clock speedup.
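To make the accept step concrete, here is a minimal sketch of routing a query to a difficulty level and letting that level's judge accept demonstrations until a budget is reached. It is an illustration under stated assumptions, not the paper's implementation: the names `select_demos`, `toy_router`, and `overlap_judge`, the 0.5 acceptance threshold, and the reading of "budget" as a cap on accepted demonstrations are all hypothetical, and the word-overlap judges are toy stand-ins for trained classifiers.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces; the summary does not specify DiSP's actual ones.
Router = Callable[[str], int]          # query -> difficulty level
Judge = Callable[[str, str], float]    # (query, demo) -> estimated success score


def select_demos(
    query: str,
    candidates: Sequence[str],
    router: Router,
    judges: Dict[int, Judge],
    budget: int,
    threshold: float = 0.5,  # assumed acceptance cutoff
) -> List[str]:
    """Accept candidates, in order, that the level-specific judge approves."""
    judge = judges[router(query)]  # pick the judge for this difficulty level
    accepted: List[str] = []
    for demo in candidates:
        if len(accepted) >= budget:  # assumed budget: max accepted demos
            break
        if judge(query, demo) >= threshold:
            accepted.append(demo)
    return accepted


def toy_router(query: str) -> int:
    """Toy stand-in: short queries are level 0, longer ones level 1."""
    return 0 if len(query.split()) <= 4 else 1


def overlap_judge(query: str, demo: str) -> float:
    """Toy stand-in for a trained judge: fraction of query words in the demo."""
    q_words = set(query.lower().split())
    d_words = set(demo.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# One judge per difficulty level; the level-1 judge is deliberately stricter.
toy_judges: Dict[int, Judge] = {
    0: overlap_judge,
    1: lambda q, d: 0.5 * overlap_judge(q, d),
}

if __name__ == "__main__":
    pool = [
        "great movie overall => positive",
        "terrible food => negative",
        "a great film => positive",
        "movie was great => positive",
    ]
    print(select_demos("great movie", pool, toy_router, toy_judges, budget=2))
```

The point of the design is that each candidate costs one cheap judge call rather than a search over demonstration combinations, so selection stops as soon as the budget is filled.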