🧠 AI | ⚪ Neutral | Importance 6/10

In-Context Multiple Instance Learning

arXiv – CS AI | Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller | June 5, 2026 at 04:00 AM

🤖 AI Summary

Researchers propose an in-context learning approach to Multiple Instance Learning (MIL) that uses a Perceiver-style architecture pretrained on synthetic data, enabling the model to solve new tasks from only a few labeled examples. The method outperforms supervised baselines across twelve benchmarks while requiring no task-specific training.

Analysis

This research addresses a fundamental challenge in machine learning: performing well with limited labeled data. Multiple Instance Learning problems, in which a label is attached to a bag of instances rather than to each instance, occur frequently in real-world applications like medical imaging and satellite analysis, where obtaining bag-level labels is easier than obtaining instance-level labels. However, existing MIL algorithms struggle when training data is scarce: flexible models overfit, while rigid approaches fail to generalize.
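To make the bag-level setting concrete, here is a minimal sketch of a MIL data point. It assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), which is one common formulation and not necessarily the one behind every benchmark in the paper. The `Bag` class and `bag_label_from_instances` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bag:
    """A bag of instances that carries a single bag-level label."""
    instances: List[List[float]]  # each instance is a feature vector
    label: int                    # 1 = positive bag, 0 = negative bag


def bag_label_from_instances(instance_labels: List[int]) -> int:
    """Standard MIL assumption: positive if any instance is positive."""
    return int(any(instance_labels))


# Instance labels are normally hidden; only the bag label is observed.
hidden_instance_labels = [0, 0, 1, 0]
bag = Bag(
    instances=[[0.1, 0.2], [0.0, 0.3], [2.1, 1.9], [0.2, 0.1]],
    label=bag_label_from_instances(hidden_instance_labels),
)
print(bag.label)  # 1, because one hidden instance is positive
```

A learner only ever sees `bag.instances` and `bag.label`, which is why labeling is cheaper here than in fully supervised instance-level settings.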

The proposed solution leverages in-context learning, a paradigm popularized by large language models, and applies it to structured bag data through a Perceiver architecture. By pretraining on data drawn from diverse synthetic generators rather than on real task-specific data, the model learns transferable patterns applicable to novel MIL problems. The approach requires only a single forward pass at inference, eliminating the need for gradient-based adaptation.
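The single-forward-pass idea can be sketched roughly as follows. This is an illustrative toy, not the authors' architecture: it uses untrained random latents, omits the learned projections a real Perceiver has, and substitutes a simple attention-weighted vote over context labels for the pretrained prediction head. All function names (`cross_attend`, `encode_bag`, `predict_in_context`) are assumptions made for this example.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(latents, tokens):
    """Perceiver-style cross-attention without learned projections.

    Each latent acts as a query over all tokens (keys = values), so the
    output always has len(latents) rows regardless of the bag size.
    """
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = []
    for q in latents:
        weights = softmax([dot(q, t) * scale for t in tokens])
        out.append([sum(w * t[j] for w, t in zip(weights, tokens))
                    for j in range(len(tokens[0]))])
    return out


def encode_bag(bag, latents):
    """Fixed-size bag summary: the flattened latent outputs."""
    return [v for row in cross_attend(latents, bag) for v in row]


def predict_in_context(context_bags, context_labels, query_bag, latents):
    """One forward pass, no gradient updates.

    The query summary attends over the context summaries and returns the
    attention-weighted mean of the context labels (a stand-in head).
    """
    keys = [encode_bag(b, latents) for b in context_bags]
    query = encode_bag(query_bag, latents)
    weights = softmax([dot(query, k) for k in keys])
    return sum(w * y for w, y in zip(weights, context_labels))


rng = random.Random(0)
dim, n_latents = 2, 4
latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_latents)]


def make_bag(size, shift):
    return [[rng.gauss(shift, 1) for _ in range(dim)] for _ in range(size)]


context_bags = [make_bag(5, 0.0), make_bag(12, 0.0),
                make_bag(7, 2.0), make_bag(30, 2.0)]
context_labels = [0, 0, 1, 1]
query_bag = make_bag(9, 2.0)

score = predict_in_context(context_bags, context_labels, query_bag, latents)
# Both summaries have n_latents * dim = 8 entries despite different bag sizes.
print(len(encode_bag(context_bags[0], latents)),
      len(encode_bag(context_bags[3], latents)))
print(round(score, 3))  # a score between 0 and 1 for the query bag
```

The point of the sketch is the interface: labeled context bags and an unlabeled query bag go in together, and a prediction comes out without any parameter being updated. In the paper's setting the weights would come from synthetic pretraining instead of random initialization.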

The key innovation lies in using multiple complementary synthetic data generators. Each generator encodes different inductive biases about bag structure, and combining them creates a model that inherits their collective strengths. Testing across twelve benchmarks shows that the model outperforms supervised baselines that require task-specific training.
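As a rough illustration of mixing generators, the sketch below draws each pretraining task from one of two toy priors. Both generators are hypothetical and chosen only to show contrasting inductive biases; the summary does not describe the generators actually used in the paper.

```python
import random


def witness_generator(rng, n_bags=8, dim=2):
    """Hypothetical prior A: positive bags hide one or two shifted instances."""
    task = []
    for _ in range(n_bags):
        label = rng.randint(0, 1)
        size = rng.randint(3, 10)
        bag = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(size)]
        if label == 1:
            for i in rng.sample(range(size), k=rng.randint(1, 2)):
                bag[i] = [x + 3.0 for x in bag[i]]
        task.append((bag, label))
    return task


def mean_shift_generator(rng, n_bags=8, dim=2):
    """Hypothetical prior B: every instance in a positive bag shifts slightly."""
    task = []
    for _ in range(n_bags):
        label = rng.randint(0, 1)
        size = rng.randint(3, 10)
        shift = 0.8 * label
        bag = [[rng.gauss(shift, 1) for _ in range(dim)] for _ in range(size)]
        task.append((bag, label))
    return task


GENERATORS = [witness_generator, mean_shift_generator]


def sample_pretraining_task(rng):
    """Pick a generator uniformly at random and draw one synthetic task."""
    generator = rng.choice(GENERATORS)
    return generator.__name__, generator(rng)


rng = random.Random(0)
for _ in range(3):
    name, task = sample_pretraining_task(rng)
    print(name, len(task), [label for _, label in task])
```

Prior A rewards a model that can spot a few decisive instances, while prior B rewards one that aggregates weak evidence across the whole bag. Training on a mixture exposes the model to both kinds of structure, which is the intuition behind combining complementary generators.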

For the broader machine learning community, this work suggests in-context learning extends beyond language domains into structured, bag-based problems. The efficiency gains from single-pass inference without gradient updates have practical implications for deployment scenarios with computational constraints. The synthetic-pretraining strategy also offers potential cost savings by reducing reliance on expensive labeled data collection.

Key Takeaways

→ In-context learning with a Perceiver architecture enables effective few-shot Multiple Instance Learning without task-specific training
→ Pretraining on synthetic data from diverse generators yields a model with complementary inductive biases and superior generalization
→ The approach achieves inference in a single forward pass without gradient updates, offering computational efficiency advantages
→ The method outperforms supervised baselines across twelve MIL benchmarks despite using minimal labeled data
→ The method addresses the low-label regime common in real-world applications like medical imaging and satellite analysis