Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
Researchers introduce GRIP, a unified framework that integrates retrieval decisions directly into language model generation through control tokens, eliminating the need for external retrieval controllers. The system enables models to autonomously decide when to retrieve information, reformulate queries, and terminate retrieval within a single autoregressive process, achieving competitive performance with GPT-4o while using substantially fewer parameters.
GRIP represents a significant architectural shift in how language models interact with external knowledge sources. Rather than treating retrieval as a separate pipeline component, the framework embeds retrieval control into the token-level decoding process itself. This unified approach allows models to jointly optimize retrieval and reasoning through what the authors call Self-Triggered Information Planning, enabling dynamic decision-making about when and what information to retrieve during generation.
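To make the idea concrete, the decode loop below sketches how control-token-driven retrieval could work: the model's own token stream signals when to start a query, when the query is complete, and when to stop. The token names (`<retrieve>`, `</retrieve>`, `<eos>`), the scripted token stream, and the toy retriever are illustrative assumptions, not GRIP's actual vocabulary or implementation.

```python
def run(model_tokens, retriever):
    """Decode loop where the model itself triggers retrieval via control tokens.

    `model_tokens` stands in for an autoregressive sampler; tokens emitted
    between <retrieve> and </retrieve> form the (possibly reformulated) query.
    """
    output, query_buf, retrieving = [], [], False
    for tok in model_tokens:
        if tok == "<retrieve>":          # model decides it needs evidence
            retrieving, query_buf = True, []
        elif tok == "</retrieve>":       # query complete: fetch and inject evidence
            retrieving = False
            evidence = retriever(" ".join(query_buf))
            output.append(f"[evidence: {evidence}]")
        elif tok == "<eos>":             # model decides generation is finished
            break
        elif retrieving:
            query_buf.append(tok)
        else:
            output.append(tok)
    return " ".join(output)


# Toy knowledge base keyed on the exact query string (an assumption for the demo).
kb = {"capital of France": "Paris is the capital of France."}
tokens = ["<retrieve>", "capital", "of", "France", "</retrieve>",
          "The", "answer", "is", "Paris", ".", "<eos>"]
print(run(tokens, lambda q: kb.get(q, "no hit")))
# → [evidence: Paris is the capital of France.] The answer is Paris .
```

Because retrieval, query formation, and termination are all expressed as ordinary tokens, no external controller needs to inspect the model's state between steps.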
This design addresses a fundamental limitation in current retrieval-augmented generation systems: the reliance on external classifiers or controllers to determine when retrieval occurs. Traditional RAG systems typically use fixed retrieval triggers or separate modules to manage information seeking, creating coordination overhead and potential bottlenecks. GRIP eliminates this architectural dependency by making retrieval control an integral part of the generation process itself.
For the AI development community, this work has practical implications for building more efficient question-answering systems. The framework demonstrates competitive results against GPT-4o across five QA benchmarks while maintaining parameter efficiency, suggesting that careful architectural design can match or exceed larger model performance. The structured training approach covering answerable, partially answerable, and multi-hop queries provides a reusable methodology for other researchers developing similar systems.
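The three training categories the article mentions might be represented as targets that demonstrate zero, one, or chained retrievals. The field names, control tokens, and example content below are hypothetical, chosen only to illustrate the structure; the paper's actual schema is not given here.

```python
# Illustrative training-example shapes for the three query categories
# (answerable, partially answerable, multi-hop). All names are assumptions.
examples = [
    {"type": "answerable",            # solvable from parametric knowledge alone
     "question": "What is 2 + 2?",
     "target": "<answer> 4 <eos>"},
    {"type": "partially_answerable",  # one retrieval fills the knowledge gap
     "question": "Who directed the 2023 film Example Movie?",
     "target": "<retrieve> director of Example Movie </retrieve> "
               "<answer> Jane Doe <eos>"},
    {"type": "multi_hop",             # chained retrievals, each query building
     "question": "Where was the director of Example Movie born?",  # on evidence
     "target": "<retrieve> director of Example Movie </retrieve> "
               "<retrieve> birthplace of Jane Doe </retrieve> "
               "<answer> Springfield <eos>"},
]

for ex in examples:
    retrievals = ex["target"].count("<retrieve>")
    print(f'{ex["type"]}: {retrievals} retrieval(s)')
```

Training on all three shapes teaches the model not just how to retrieve, but when retrieval is unnecessary and when a single lookup is insufficient.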
The research signals a broader trend toward more integrated, end-to-end approaches in language model design rather than modular pipeline architectures. As models become more capable of managing their own information-seeking behavior, future systems may achieve better performance through tighter coupling of reasoning and retrieval components, potentially reducing the need for external intervention during inference.
- GRIP integrates retrieval control directly into token-level generation, eliminating the need for external classifiers or controllers.
- Self-Triggered Information Planning enables models to autonomously decide when to retrieve, reformulate queries, and terminate retrieval in a single pass.
- The framework achieves competitive performance with GPT-4o while using substantially fewer parameters across five QA benchmarks.
- Structured training on answerable, partially answerable, and multi-hop queries provides a replicable methodology for building similar systems.
- The unified architecture enables dynamic multi-step inference with on-the-fly evidence integration without additional architectural overhead.