In-Context Multiple Instance Learning
Researchers propose an in-context learning approach for Multiple Instance Learning (MIL) using a Perceiver-style architecture pretrained on synthetic data, enabling the model to solve new tasks from only a few labeled examples. The method outperforms supervised baselines across twelve benchmarks while requiring no task-specific training at inference time.
This research addresses a fundamental challenge in machine learning: performing well with limited labeled data. In MIL, a label is attached to a bag (a set of instances) rather than to each instance. Such problems occur frequently in real-world applications like medical imaging and satellite analysis, where bag-level labels are easier to obtain than instance-level supervision. However, existing MIL algorithms struggle when training data is scarce: flexible models overfit, while rigid approaches fail to generalize.
The proposed solution leverages in-context learning, a paradigm popularized by large language models, and applies it to structured bag data through a Perceiver architecture. Because it is pretrained on data drawn from diverse synthetic generators rather than on real task-specific data, the model learns transferable patterns applicable to novel MIL problems. At inference, the labeled bags for a new task are supplied as context and the prediction comes from a single forward pass, eliminating the need for gradient-based adaptation.
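To make the "single forward pass" interface concrete, here is a minimal sketch in plain Python. It is not the paper's Perceiver model: the fixed attention-weighted vote below is a toy stand-in for a pretrained network, and the names pool_bag and predict_in_context, the mean pooling, and the distance-based scores are all assumptions made for illustration. What it shows is only the calling pattern: labeled context bags and a query bag go in, a prediction comes out, and nothing is trained.

```python
import math


def pool_bag(bag):
    """Mean-pool a bag (a list of instance vectors) into a single vector."""
    dim = len(bag[0])
    return [sum(inst[d] for inst in bag) / len(bag) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_in_context(support_bags, support_labels, query_bag, temperature=1.0):
    """Toy in-context prediction: one pass, no gradient updates.

    Assumption: the query attends to each labeled context bag with a score
    equal to the negative squared distance between pooled bag vectors. A
    pretrained model would learn this mapping instead of hard-coding it.
    """
    query = pool_bag(query_bag)
    keys = [pool_bag(bag) for bag in support_bags]
    scores = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    weights = softmax(scores)
    # Attention-weighted average of the context labels, a value in [0, 1].
    return sum(w * y for w, y in zip(weights, support_labels))


# Two labeled context bags (one positive, one negative) and one query bag.
context_bags = [
    [[1.0, 1.0], [0.9, 1.1]],
    [[0.0, 0.0], [0.1, -0.1]],
]
context_labels = [1.0, 0.0]
query_bag = [[1.0, 0.9], [0.8, 1.0]]

score = predict_in_context(context_bags, context_labels, query_bag)
print("positive" if score > 0.5 else "negative")
```

Switching to a different task means passing different context bags to the same function, which is the property the in-context formulation relies on.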
The key innovation lies in using multiple complementary synthetic data generators. Each generator encodes different inductive biases about bag structure, and combining them creates a model that inherits their collective strengths. Testing across twelve benchmarks demonstrates superior performance compared to supervised baselines that require task-specific fine-tuning.
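The idea of mixing generators can be sketched as follows. The summary does not say which generators the authors use, so both generators here are hypothetical: witness_generator plants a single out-of-distribution instance in positive bags, while shift_generator shifts every instance in a positive bag. Bag counts, bag sizes, and distribution parameters are likewise illustrative assumptions. Each pretraining task is drawn from a randomly chosen generator, so a model trained on the stream sees both labeling rules.

```python
import random


def witness_generator(rng, n_bags=8, bag_size=5, dim=2):
    """Hypothetical prior: a bag is positive iff it holds one 'witness' instance."""
    bags, labels = [], []
    for _ in range(n_bags):
        label = rng.randint(0, 1)
        bag = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bag_size)]
        if label == 1:
            # Replace one instance with a point far from the background.
            bag[rng.randrange(bag_size)] = [rng.gauss(4.0, 0.5) for _ in range(dim)]
        bags.append(bag)
        labels.append(label)
    return bags, labels


def shift_generator(rng, n_bags=8, bag_size=5, dim=2):
    """Hypothetical prior: all instances in a positive bag share a mean shift."""
    bags, labels = [], []
    for _ in range(n_bags):
        label = rng.randint(0, 1)
        offset = 1.0 if label == 1 else 0.0
        bag = [[rng.gauss(offset, 1.0) for _ in range(dim)] for _ in range(bag_size)]
        bags.append(bag)
        labels.append(label)
    return bags, labels


# The mixture: every pretraining task comes from one generator picked at random.
GENERATORS = [witness_generator, shift_generator]


def sample_pretraining_task(rng):
    """Draw one synthetic MIL task and report which generator produced it."""
    generator = rng.choice(GENERATORS)
    bags, labels = generator(rng)
    return generator.__name__, bags, labels


rng = random.Random(0)
tasks = [sample_pretraining_task(rng) for _ in range(50)]
print(sorted({name for name, _, _ in tasks}))
```

The two rules stress different abilities: the witness rule rewards attention to individual instances, and the shift rule rewards aggregate statistics over the bag. That is one plausible reading of "complementary inductive biases", not a description of the authors' actual generators.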
For the broader machine learning community, this work suggests in-context learning extends beyond language domains into structured, bag-based problems. The efficiency gains from single-pass inference without gradient updates have practical implications for deployment scenarios with computational constraints. The synthetic-pretraining strategy also offers potential cost savings by reducing reliance on expensive labeled data collection.
- In-context learning with a Perceiver architecture enables effective few-shot Multiple Instance Learning without task-specific training
- Pretraining on synthetic data from diverse generators yields a model that combines their complementary inductive biases and generalizes better
- Inference takes a single forward pass with no gradient updates, which improves computational efficiency
- Across twelve MIL benchmarks, the method outperforms supervised baselines despite requiring minimal labeled data
- The method addresses the low-label regime common in real-world applications like medical imaging and satellite analysis