Disco-RAG: Discourse-Aware Retrieval-Augmented Generation
Researchers introduce Disco-RAG, a discourse-aware framework that enhances Retrieval-Augmented Generation (RAG) systems by explicitly modeling discourse structures and rhetorical relationships between retrieved passages. The method achieves state-of-the-art results on question answering and summarization tasks without fine-tuning, demonstrating that structural understanding of text significantly improves LLM performance on knowledge-intensive tasks.
Disco-RAG addresses a fundamental limitation in current RAG systems: their inability to leverage structural relationships between retrieved documents. While RAG has become essential for grounding LLMs in external knowledge, most implementations treat retrieved passages as isolated units, missing opportunities to synthesize information coherently. This research shows that explicit discourse modeling—capturing both local hierarchies within passages and cross-document rhetorical relationships—provides the missing structural context needed for superior generation quality.
The innovation builds on established principles from computational linguistics and discourse theory. Discourse structure analysis has long been central to natural language understanding, yet modern RAG systems largely ignore these insights when integrating retrieved content. Disco-RAG bridges this gap by constructing intra-chunk discourse trees for local organization and inter-chunk rhetorical graphs for global coherence, integrating both into a planning blueprint that conditions generation. This approach reflects a broader trend toward combining classical NLP techniques with neural methods.
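To make the two-level structure concrete, here is a minimal sketch of how intra-chunk discourse trees and inter-chunk rhetorical edges might be linearized into a planning blueprint that conditions generation. All names here (`DiscourseNode`, `RhetoricalEdge`, `build_blueprint`, relation labels like `Elaboration` and `Contrast`) are illustrative assumptions, not the paper's actual API; a real system would derive the trees and edges from a discourse parser rather than construct them by hand.

```python
# Hypothetical sketch of Disco-RAG's structure-building step.
# Class, function, and relation names are assumptions for illustration.
from dataclasses import dataclass, field


@dataclass
class DiscourseNode:
    """A sentence (or elementary discourse unit) inside one retrieved chunk."""
    text: str
    children: list["DiscourseNode"] = field(default_factory=list)
    relation: str = "Root"  # rhetorical relation to the parent node


@dataclass
class RhetoricalEdge:
    """A cross-chunk rhetorical link (e.g. Elaboration, Contrast)."""
    src_chunk: int
    dst_chunk: int
    relation: str


def build_blueprint(trees: list[DiscourseNode],
                    edges: list[RhetoricalEdge]) -> str:
    """Linearize intra-chunk trees and the inter-chunk graph into a
    textual blueprint that can be prepended to the generation prompt."""
    lines = []
    for i, tree in enumerate(trees):
        lines.append(f"[Chunk {i}] {tree.text}")
        for child in tree.children:
            lines.append(f"  - ({child.relation}) {child.text}")
    for e in edges:
        lines.append(f"[Link] Chunk {e.src_chunk} --{e.relation}--> Chunk {e.dst_chunk}")
    return "\n".join(lines)


# Toy example: two retrieved chunks with one cross-chunk Contrast link.
t0 = DiscourseNode(
    "RAG grounds LLMs in external documents.",
    [DiscourseNode("Passages are usually treated as isolated units.",
                   relation="Elaboration")])
t1 = DiscourseNode("Discourse structure captures how passages relate.")
blueprint = build_blueprint([t0, t1], [RhetoricalEdge(0, 1, "Contrast")])
print(blueprint)
```

The design choice the sketch highlights is that the structural information enters the LLM as plain text in the prompt, which is why the approach needs no fine-tuning: the model only has to follow the blueprint, not learn a new input format.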
For practitioners and organizations deploying RAG systems, this work demonstrates significant practical value. Systems handling question answering and long-document summarization—common enterprise applications—could achieve measurably better results without expensive fine-tuning or additional training data. The zero-shot performance gains suggest the approach generalizes across domains. Developers building knowledge-intensive AI applications now have evidence that computational effort invested in understanding passage relationships yields meaningful performance improvements.
- Disco-RAG explicitly models discourse structures within and between retrieved passages, improving RAG system performance without fine-tuning.
- The framework achieves state-of-the-art results on question answering and long-document summarization benchmarks through structural awareness.
- Intra-chunk discourse trees and inter-chunk rhetorical graphs enable better synthesis of distributed information across documents.
- The approach demonstrates that classical discourse analysis techniques remain valuable when integrated with modern LLM-based generation.
- Zero-shot performance gains suggest the method generalizes across tasks and domains, offering practical value for enterprise deployments.