🧠 AI | Neutral | Importance 6/10

Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering

arXiv – CS AI | Mateusz Czyżnikiewicz, Ryszard Tuora, Adam Kozakiewicz, Tomasz Ziętkiewicz, Mateusz Galiński, Michał Godziszewski, Michał Karpowicz, Timothy Hospedales, Cristina Cornelio
🤖AI Summary

Researchers introduce DualGraph, a retrieval-augmented generation framework that combines semantic and symbolic approaches to improve question answering on semi-structured data. The system uses dual knowledge graph representations alongside a new benchmark dataset (SpecsQA) from e-commerce, demonstrating superior performance over existing dense-retrieval and graph-based methods.

Analysis

DualGraph addresses a fundamental limitation of current RAG systems: retrieving evidence by semantic similarity works well for unstructured text but fails when queries require exact filtering, numerical aggregation, or structured attribute matching across multiple documents. The gap is particularly acute in e-commerce, financial services, and other domains where semi-structured data dominates. The dual-graph architecture resolves this by maintaining two parallel representations, a textual knowledge graph for semantic understanding and a symbolic knowledge graph for precise logical operations, so the system can route each query to the appropriate representation or blend evidence from both.
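The contrast between the two retrieval styles can be illustrated with a toy sketch. This is not DualGraph's implementation: the catalog, the `attribute op number` constraint syntax, the token-overlap scorer (a crude stand-in for dense retrieval), and the function names `semantic_retrieve`, `symbolic_query`, and `answer` are all hypothetical, chosen only to show why a numeric filter needs a symbolic path.

```python
import operator
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    description: str  # free text: what a textual graph or index would cover
    specs: dict       # attribute -> value: what a symbolic graph would cover


# Hypothetical catalog, invented for illustration only.
CATALOG = [
    Product("AeroBook 13", "Thin laptop that is great for travel and long flights",
            {"ram_gb": 8, "weight_kg": 1.1}),
    Product("TitanBook 17", "Heavy workstation laptop for video editing",
            {"ram_gb": 32, "weight_kg": 2.9}),
    Product("MidBook 15", "Balanced laptop for office work",
            {"ram_gb": 16, "weight_kg": 1.8}),
]

# Assumed constraint syntax: "<attribute> <op> <number>", e.g. "ram_gb >= 16".
CONSTRAINT = re.compile(r"(\w+)\s*(>=|<=|>|<|=)\s*(\d+(?:\.\d+)?)")
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}


def tokens(text):
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_retrieve(question, k=1):
    """Stand-in for dense retrieval: rank products by token overlap."""
    q = tokens(question)
    ranked = sorted(CATALOG, key=lambda p: len(q & tokens(p.description)),
                    reverse=True)
    return [p.name for p in ranked[:k]]


def symbolic_query(question):
    """Apply the first numeric constraint found as an exact filter on specs."""
    attr, op, value = CONSTRAINT.search(question).groups()
    return [p.name for p in CATALOG
            if attr in p.specs and OPS[op](p.specs[attr], float(value))]


def answer(question):
    """Route to the symbolic path when a constraint is present, else semantic."""
    if CONSTRAINT.search(question):
        return "symbolic", symbolic_query(question)
    return "semantic", semantic_retrieve(question)


print(answer("which laptop is good for travel"))
print(answer("laptops with ram_gb >= 16"))
```

In this sketch the open-ended question is served by text overlap, while the `ram_gb >= 16` question is answered by an exact filter that returns every matching product. Passing the same constrained question to `semantic_retrieve` shares no tokens with any description, so the ranking is arbitrary, which is the failure mode the paragraph above describes. A real system would replace the regex router and overlap scorer with learned components.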

The research builds on a growing recognition that modern AI systems need hybrid approaches combining neural and symbolic reasoning. Dense embeddings capture nuances of meaning, while symbolic systems perform deterministic operations reliably. Previous attempts leaned too heavily in one direction: pure semantic retrieval struggles with specification-heavy queries, while rule-based systems break easily on natural language variation. The SpecsQA benchmark is valuable in its own right, giving the community a standardized evaluation dataset built from real commercial product data.

The implications extend beyond academic interest. E-commerce platforms, enterprise search systems, and financial data retrieval tools could all benefit from this hybrid approach. The open-source release signals intent to influence production systems. For developers building RAG applications, DualGraph offers a practical template for handling mixed query types without redesigning the architecture. The consistent gains over the baselines suggest this direction represents genuine progress rather than incremental optimization. The framework's modularity means organizations can adopt symbolic components incrementally, starting where they matter most.

Key Takeaways
  • DualGraph combines semantic and symbolic knowledge graphs to overcome limitations in pure semantic retrieval for semi-structured data.
  • The new SpecsQA benchmark provides real e-commerce product data with manually curated questions spanning open-ended and specification-focused queries.
  • DualGraph outperforms state-of-the-art dense-retrieval, GraphRAG, and symbolic-only baselines across different question types.
  • Hybrid semantic-symbolic approaches are gaining traction as practical solutions for production RAG systems handling mixed query requirements.
  • Open-source code release enables rapid adoption and validation of the dual-graph methodology across industry applications.