OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
OmniRetrieval is a new framework that enables unified retrieval across heterogeneous knowledge sources—including unstructured text, relational databases, knowledge graphs, and property graphs—by translating natural language queries into source-native queries rather than forcing all data into a homogenized format. The system demonstrates superior performance compared to single-source retrievers across 13 datasets and 309 knowledge bases, positioning it as a general-purpose interface that preserves the structural advantages of each knowledge source.
OmniRetrieval addresses a fundamental challenge in information retrieval: the fragmentation of knowledge across incompatible data structures and query languages. Rather than normalizing diverse sources into a single representation, a common but limiting approach, the framework operates as an intelligent router that preserves each source's native format and query capabilities while providing a unified natural language interface. This architectural choice matters because the structural information in databases, the ontologies in knowledge graphs, and the relational schemas in tables provide genuine computational advantages that homogenization would destroy.
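The route-then-translate idea described above can be illustrated with a small Python sketch. It is not OmniRetrieval's implementation: the `Source` class, the keyword-overlap router, the fixed SQL template, and the toy data are all hypothetical stand-ins for the learned source selection and query translation a real system would use. It only shows the control flow: pick a source, produce a query in that source's native form, and run it on that source's own engine.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Source:
    """One knowledge source plus the hooks needed to query it natively."""
    name: str
    cues: FrozenSet[str]                 # hypothetical routing keywords
    translate: Callable[[str], str]      # question -> source-native query
    execute: Callable[[str], List[str]]  # source-native query -> results


def make_sql_source() -> Source:
    # Toy in-memory relational source; the schema is illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ada", "research"), ("Grace", "systems"), ("Alan", "research")],
    )

    def translate(question: str) -> str:
        # Assumption: a fixed template stands in for a text-to-SQL model.
        return "SELECT COUNT(*) FROM employees WHERE dept = 'research'"

    def execute(query: str) -> List[str]:
        return [str(row[0]) for row in conn.execute(query)]

    cues = frozenset({"how", "many", "count", "employees"})
    return Source("hr_db", cues, translate, execute)


def make_text_source() -> Source:
    docs = [
        "The research department studies retrieval systems.",
        "The systems department maintains the query engines.",
    ]

    def translate(question: str) -> str:
        # For unstructured text, the native query is a keyword string.
        return " ".join(w for w in tokenize(question) if len(w) > 3)

    def execute(query: str) -> List[str]:
        terms = set(query.split())
        # Score each document by term overlap and keep the best match.
        scored = [(len(terms & set(tokenize(d))), d) for d in docs]
        best_score, best_doc = max(scored, key=lambda pair: pair[0])
        return [best_doc] if best_score > 0 else []

    cues = frozenset({"what", "does", "describe", "department"})
    return Source("docs", cues, translate, execute)


def route(question: str, sources: List[Source]) -> Source:
    """Pick the source whose cue words overlap the question the most."""
    words = set(tokenize(question))
    return max(sources, key=lambda s: len(words & s.cues))


def retrieve(question: str, sources: List[Source]) -> Tuple[str, str, List[str]]:
    """Route, translate to the source-native query, then execute it."""
    source = route(question, sources)
    native_query = source.translate(question)
    return source.name, native_query, source.execute(native_query)
```

Calling `retrieve("How many employees work in research?", [make_sql_source(), make_text_source()])` sends the question to the relational source and runs SQL against it, while a descriptive question about a department goes to the text source. Because each source keeps its own `translate` and `execute` hooks, adding a graph source would mean adding another `Source` rather than reshaping the existing data.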
The research builds on decades of work in federated databases and semantic web technologies, but applies modern natural language processing to automate source selection and query translation. In benchmarks across 13 datasets and 309 knowledge bases spanning text, relational, and graph-structured sources, OmniRetrieval outperforms baselines that specialize in individual source types. This suggests the framework routes queries to suitable sources while preserving the expressive power of each.
For enterprise knowledge management and AI systems, the implications are practical. Organizations increasingly maintain data across heterogeneous platforms (document stores, SQL databases, graph systems) and struggle to provide unified access without expensive custom integration layers. OmniRetrieval could reduce this friction. For AI developers, particularly those building retrieval-augmented generation (RAG) systems, the framework offers a way to handle diverse context sources, allowing models to access richer, more structured information while maintaining query precision.
The work's long-term impact depends on adoption and generalization to emerging data sources and query patterns. Key metrics to track include performance on domain-specific knowledge bases, scalability across enterprise systems, and integration with modern LLM-based applications.
- OmniRetrieval enables unified natural language queries across text, relational databases, and knowledge graphs without forcing homogenization of diverse data structures.
- The framework outperforms single-source baseline retrievers across 13 datasets with 309 knowledge bases by preserving native query capabilities and structural affordances.
- The architecture routes natural language queries to appropriate knowledge sources and dispatches source-native queries to their execution engines.
- This approach is particularly valuable for enterprise systems and RAG applications that need to integrate multiple heterogeneous data platforms.
- The research demonstrates that effective multi-source retrieval requires domain-aware routing rather than collapsing all sources into unified representations.