
Can In-Context Learning Support Intrinsic Curiosity?

arXiv – CS AI | Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas, João Sacramento, Rif A. Saurous, Guillaume Lajoie | June 19, 2026 at 04:00 AM

🤖AI Summary

Researchers show that the in-context learning capabilities of large sequence models can efficiently support intrinsic-curiosity mechanisms for automated data collection, within important theoretical limits. The authors prove that the approach is sound for non-temporal settings such as active learning, but that it breaks down for general sequential decision problems, where in-context learning offers no comparable computational shortcut.

Analysis

This research addresses a fundamental challenge in machine learning: how to decide automatically which data to collect, rather than relying on passive datasets. Traditional curiosity-driven learning rewards agents for "learning progress" (how much a new observation improves a model's predictive ability), but computing these rewards requires running expensive gradient-descent updates inside the reward computation, which does not scale. The authors propose using in-context learning (ICL), the ability of large sequence models to adapt to new tasks within a single forward pass, as an update-free alternative that could remove this computational bottleneck.
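To make the cost argument concrete, here is a minimal sketch under stated assumptions, not the paper's estimator or models. It assumes a one-dimensional linear model y = w * x + noise with a Gaussian prior on w, defines learning progress as the drop in held-out squared error after adding one observation, and uses an exact Bayesian posterior-predictive as a stand-in for an in-context learner, since it maps a context of observations to predictions without updating any parameters. `gd_learning_progress` shows the conventional route, which must retrain the model by gradient descent for every reward. All names, the prior and noise values, and the evaluation set are hypothetical.

```python
from typing import Callable, List, Tuple

Obs = Tuple[float, float]  # one (x, y) observation

PRIOR_VAR = 1.0  # assumed prior variance of the weight w (hypothetical)
NOISE_VAR = 0.1  # assumed observation-noise variance (hypothetical)


def in_context_predict(context: List[Obs], x: float) -> float:
    """Hypothetical stand-in for an in-context learner: the exact Bayesian
    posterior-mean prediction for y = w * x + noise, computed directly from
    the context in one pass. No parameters are stored or updated."""
    precision = 1.0 / PRIOR_VAR + sum(xi * xi for xi, _ in context) / NOISE_VAR
    w_mean = sum(xi * yi for xi, yi in context) / NOISE_VAR / precision
    return w_mean * x


def eval_mse(predict: Callable[[float], float], eval_set: List[Obs]) -> float:
    """Mean squared prediction error on a held-out evaluation set."""
    return sum((predict(x) - y) ** 2 for x, y in eval_set) / len(eval_set)


def icl_learning_progress(context: List[Obs], new_obs: Obs, eval_set: List[Obs]) -> float:
    """Reward = drop in held-out error when new_obs is appended to the context.
    The 'updated model' is just the same predictor with a longer context."""
    before = eval_mse(lambda x: in_context_predict(context, x), eval_set)
    after = eval_mse(lambda x: in_context_predict(context + [new_obs], x), eval_set)
    return before - after


def gd_fit(data: List[Obs], steps: int = 200, lr: float = 0.1) -> float:
    """Inner optimization loop: full-batch gradient descent on squared error
    (stable at this learning rate for |x| <= 1)."""
    w = 0.0
    if not data:
        return w
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def gd_learning_progress(context: List[Obs], new_obs: Obs, eval_set: List[Obs]) -> float:
    """Same reward, but both 'before' and 'after' models must first be trained."""
    w_before = gd_fit(context)
    w_after = gd_fit(context + [new_obs])
    before = eval_mse(lambda x: w_before * x, eval_set)
    after = eval_mse(lambda x: w_after * x, eval_set)
    return before - after


eval_set = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]  # true w = 2
print(icl_learning_progress([], (1.0, 2.0), eval_set))  # informative point: positive reward
print(icl_learning_progress([], (0.0, 5.0), eval_set))  # 0.0: x = 0 says nothing about w
```

Both functions return a positive reward for the informative point, but only the gradient-based one pays for 400 optimization steps per reward; the in-context version just re-reads a longer context.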

The theoretical contribution includes both positive and negative results. For general Markov decision processes (the standard framing for sequential decision-making), ICL-derived rewards are fundamentally flawed: the estimated learning progress is biased by nuisance terms unrelated to true model improvement. For non-temporal settings, however, which include active learning and Bayesian experimental design, the ICL-based estimate bounds the true learning progress and converges to it. The distinction matters because it separates the settings where the approach offers genuine advantages from those where it only appears to work.
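In a non-temporal setting such as pool-based active learning, choosing a query does not move the agent into a new state; the value of each query depends only on the data gathered so far. The following is a minimal sketch of greedy Bayesian experimental design under stated assumptions: a 1-D Bayesian linear model y = w * x + noise with Gaussian prior and noise, and the closed-form expected information gain about w as the selection criterion. That criterion is a standard design objective swapped in here for illustration, not the paper's ICL-based learning-progress reward; the candidate pool and function names are hypothetical.

```python
import math
from typing import List, Tuple


def expected_info_gain(x: float, post_var: float, noise_var: float) -> float:
    """Mutual information I(w; y) in nats for y = w * x + noise,
    with w ~ N(m, post_var) and noise ~ N(0, noise_var)."""
    return 0.5 * math.log(1.0 + x * x * post_var / noise_var)


def greedy_design(pool: List[float], n_queries: int,
                  prior_var: float = 1.0,
                  noise_var: float = 0.1) -> Tuple[List[float], List[float]]:
    """Pick queries one at a time (without replacement), each time taking the
    candidate with the highest expected information gain about w."""
    remaining = list(pool)
    post_var = prior_var
    chosen: List[float] = []
    var_trace: List[float] = []
    for _ in range(min(n_queries, len(remaining))):
        best = max(remaining, key=lambda x: expected_info_gain(x, post_var, noise_var))
        remaining.remove(best)
        chosen.append(best)
        # Conjugate Gaussian update: the query adds x^2 / noise_var of precision,
        # so the posterior variance of w shrinks (or stays put when x = 0).
        post_var = 1.0 / (1.0 / post_var + best * best / noise_var)
        var_trace.append(post_var)
    return chosen, var_trace


chosen, var_trace = greedy_design([-0.2, 0.8, 0.0, -1.0, 0.3], n_queries=5)
print(chosen)  # [-1.0, 0.8, 0.3, -0.2, 0.0]
```

In this linear-Gaussian toy the posterior variance does not depend on the observed y values, so the whole query sequence can be planned before any outcome is seen; that is a special property of the toy, not of the general setting.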

The practical implications center on scalability. If ICL-based curiosity proves reliable in the domains where the theory supports it, it could enable large-scale automated data collection without prohibitive computational cost. Experiments in both continuous and symbolic environments suggest the approach is viable in practice. For the AI development community, the work bridges a gap between theoretical curiosity mechanisms and practical implementation constraints, while the negative result for temporal settings shows that no universal shortcut exists and tempers any hope that in-context learning solves every data-selection problem.

Key Takeaways

→In-context learning can efficiently compute curiosity rewards for non-temporal settings without expensive inner optimization loops.
→Theoretical analysis proves that ICL-based rewards fail for general sequential decision processes, because the estimated learning progress is biased by nuisance terms unrelated to true model improvement.
→The approach succeeds for active learning and Bayesian experimental design, where the ICL estimate provably bounds and converges to true learning progress.
→Experimental results across multiple environments confirm ICL-driven policies achieve optimal exploration in applicable domains.
→The work marks the boundary between settings where in-context learning shortcuts are viable and those where traditional methods remain necessary.