cs.CV

Can a model learn MIL tasks from just a few examples?

Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller

June 4, 2026

Multiple instance learning (MIL), where labels are attached to groups (bags) of instances rather than to individual items, typically requires substantial labeled data. This work pretrains a Perceiver-based model on a mixture of synthetic bag-structured data generators, then tests whether it can solve new MIL tasks via in-context learning, using only a handful of labeled examples supplied at inference time. Without gradient updates or task-specific tuning, the pretrained model outperforms supervised baselines across 12 benchmarks, addressing the low-label bottleneck in pathology and satellite imagery.
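To make the setup concrete, here is a minimal sketch of what a bag-structured, few-shot MIL episode could look like. It is an illustration only, not the paper's synthetic generators: the function names (`make_bag`, `make_episode`), the Gaussian features, and the rule that a bag is positive when it contains one shifted "key" instance (the standard MIL assumption) are all assumptions made for this example.

```python
import random


def make_bag(rng, n_instances=10, dim=2, positive=False, shift=3.0):
    """Hypothetical toy bag generator (not from the paper).

    Every instance is a standard Gaussian vector. A positive bag has
    exactly one "key" instance shifted by `shift` in every dimension;
    only the bag-level label is returned, never instance labels.
    """
    instances = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(n_instances)]
    if positive:
        key = rng.randrange(n_instances)  # position of the key instance
        instances[key] = [x + shift for x in instances[key]]
    return instances, int(positive)


def make_episode(seed=0, n_context=4, n_query=2):
    """Build one few-shot episode: labeled context bags plus unlabeled queries."""
    rng = random.Random(seed)

    def sample(n):
        # Alternate positive and negative bags so both classes appear.
        return [make_bag(rng, positive=(i % 2 == 0)) for i in range(n)]

    context = sample(n_context)                  # (bag, label) pairs
    query = [bag for bag, _ in sample(n_query)]  # labels withheld
    return context, query


context, query = make_episode()
print(len(context), "labeled bags,", len(query), "query bags")
```

In an in-context setting, a pretrained model would receive the labeled `context` bags together with the `query` bags in a single forward pass and predict the query labels, with no weight updates.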
Published as "In-Context Multiple Instance Learning", arXiv:2606.06458