ICICLE: Expanding Retrieval with In-Context Documents
Researchers introduce ICICLE, a generative retrieval framework that expands a corpus by supplying new documents as in-context evidence instead of retraining the model. A copy-based routing mechanism decides whether a document identifier should come from parametric memory or from the documents provided in context, so the corpus can grow without the catastrophic forgetting that retraining risks.
ICICLE changes how generative retrieval systems handle a growing document corpus. Traditional generative retrieval encodes document-identifier associations directly into model parameters, which creates a scaling bottleneck: adding new documents demands expensive retraining and risks degrading performance on previously indexed documents through catastrophic forgetting. This limitation has constrained practical deployment in real-world systems where document collections expand continuously.
The framework reframes corpus expansion as an inference-time problem by supplying new documents as contextual evidence alongside the query. This mirrors how people look things up: they consult newly available references without restructuring what they already know. ICICLE's architecture combines three mechanisms: a copy-based routing system that decides whether to draw from parametric memory or from in-context documents, preference-based calibration to improve selection accuracy, and large-context adaptation to handle document sets of varying size.
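To make the routing idea concrete, here is a minimal sketch of one way a copy-style router could combine two sources of document-identifier scores. It assumes a pointer-generator-style mixture with a scalar gate; the summary does not specify ICICLE's actual formulation, and the names `mix_sources`, `copy_gate`, and the example identifiers are all hypothetical.

```python
import math


def softmax(scores):
    """Turn a {docid: raw_score} dict into a probability distribution."""
    top = max(scores.values())
    exps = {doc: math.exp(s - top) for doc, s in scores.items()}
    total = sum(exps.values())
    return {doc: e / total for doc, e in exps.items()}


def mix_sources(parametric_scores, context_scores, copy_gate):
    """Blend parametric-memory and in-context scores over document IDs.

    copy_gate is the (hypothetical) probability of copying an identifier
    from the in-context documents rather than generating it from
    parametric memory. This is an illustrative assumption, not the
    method described in the source.
    """
    if not parametric_scores and not context_scores:
        raise ValueError("at least one source must provide candidates")
    if not 0.0 <= copy_gate <= 1.0:
        raise ValueError("copy_gate must lie in [0, 1]")

    # With only one source available, all probability mass goes to it.
    if not context_scores:
        copy_gate = 0.0
    elif not parametric_scores:
        copy_gate = 1.0

    mixed = {}
    if parametric_scores:
        for doc, p in softmax(parametric_scores).items():
            mixed[doc] = mixed.get(doc, 0.0) + (1.0 - copy_gate) * p
    if context_scores:
        for doc, p in softmax(context_scores).items():
            mixed[doc] = mixed.get(doc, 0.0) + copy_gate * p
    return mixed


# Toy example: two previously indexed documents, two new in-context ones.
parametric = {"doc_old_1": 2.0, "doc_old_2": 0.5}
context = {"doc_new_1": 1.0, "doc_new_2": 3.0}

mixed = mix_sources(parametric, context, copy_gate=0.8)
best = max(mixed, key=mixed.get)
print(best)  # doc_new_2
```

In this sketch a poorly calibrated gate sends probability mass to the wrong source even when each source ranks its own candidates well, which is the kind of failure that source-selection calibration is meant to reduce.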
Experiments on the MS MARCO and NQ320K datasets show that ICICLE maintains retrieval performance on previously seen documents while improving retrieval of newly introduced ones, all without retraining. The analysis identifies routing failures as the primary cause of performance degradation at scale, which indicates that source-selection calibration becomes more critical as the context grows.
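One way to picture such an analysis is to split retrieval errors into those where the wrong source was chosen and those where the right source was chosen but the wrong document was picked. The sketch below is an assumption about how that attribution could be done, not the paper's procedure; the record layout and the function name `attribute_errors` are hypothetical.

```python
from collections import Counter


def attribute_errors(records):
    """Classify each prediction as correct, a routing error, or a ranking error.

    Each record is a tuple (gold_id, gold_source, pred_id, pred_source),
    where a source is either "memory" or "context". This layout is a
    hypothetical convention chosen for illustration.
    """
    counts = Counter(correct=0, routing_error=0, ranking_error=0)
    for gold_id, gold_source, pred_id, pred_source in records:
        if pred_id == gold_id:
            counts["correct"] += 1
        elif pred_source != gold_source:
            # The model drew from the wrong source entirely.
            counts["routing_error"] += 1
        else:
            # Right source, wrong document within it.
            counts["ranking_error"] += 1
    return dict(counts)


# Toy evaluation log with made-up identifiers.
records = [
    ("d1", "memory", "d1", "memory"),    # correct
    ("d2", "context", "d9", "memory"),   # routing error
    ("d3", "context", "d4", "context"),  # ranking error
    ("d5", "memory", "d6", "context"),   # routing error
]
print(attribute_errors(records))
```

If routing errors dominate as the number of in-context documents increases, that points to the source-selection step rather than the ranking within each source.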
For the broader AI/ML infrastructure landscape, this work addresses a real production challenge. Systems that manage large, evolving document collections (enterprise search platforms, knowledge bases, recommendation engines) currently have to trade freshness against stability. ICICLE's in-context approach points toward more dynamic and efficient systems that can incorporate new information without full model updates, which could accelerate adoption of generative retrieval in commercial applications.
- ICICLE eliminates costly retraining cycles by treating new documents as inference-time context rather than updating model parameters
- Copy-based routing mechanism distinguishes between parametric memory and in-context document associations to prevent performance degradation
- System maintains performance on previously indexed documents while improving retrieval of newly introduced materials
- Routing calibration emerges as the critical bottleneck for scaling in-context generative retrieval with larger document sets
- Framework enables more dynamic corpus expansion suitable for enterprise search and evolving knowledge base applications