
Executable Schema Contracts: From Automatic Ingestion to Multi-Source Retrieval

arXiv – CS AI | Padmaja Jonnalagedda, Yuguang Yao, Xiang Gao, Hilaf Hasson, Kamalika Das
🤖AI Summary

Researchers present an automated system that discovers executable schemas from multi-source, heterogeneous data and uses them as a unified contract for knowledge graph construction and intelligent query routing. The approach combines LLM-based schema discovery with deterministic structural analysis and demonstrates improved retrieval performance across four QA benchmarks compared to baseline methods.

Analysis

This research addresses a fundamental challenge in data integration: unifying information across tables, documents, and semi-structured sources with conflicting schemas and formats. Traditional approaches either require expensive manual schema engineering or abandon structure altogether, which limits the quality of downstream retrieval and reasoning. The proposed system automates schema discovery with a constrained field catalog designed to prevent LLM hallucinations, then applies deterministic analysis to identify keys and hierarchies. The result is a semantic contract that governs how data flows through the pipeline.
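The two ideas in this paragraph, a field catalog that constrains what the LLM may propose and a deterministic pass that finds keys, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the sample rows, and the `proposed` list (standing in for LLM output) are all hypothetical, and key detection is reduced to a simple uniqueness check.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def build_field_catalog(rows: List[Row]) -> Set[str]:
    """Collect every field name that actually occurs in the data."""
    catalog: Set[str] = set()
    for row in rows:
        catalog.update(row.keys())
    return catalog


def filter_proposed_fields(proposed: List[str], catalog: Set[str]) -> List[str]:
    """Keep only proposed fields that exist in the catalog, preserving order."""
    return [name for name in proposed if name in catalog]


def candidate_keys(rows: List[Row], fields: List[str]) -> List[str]:
    """Return fields whose values are present and unique in every row."""
    keys = []
    for name in fields:
        values = [row.get(name) for row in rows]
        if all(v is not None for v in values) and len(set(values)) == len(values):
            keys.append(name)
    return keys


# Hypothetical sample data, for illustration only.
rows = [
    {"order_id": 1, "customer": "acme", "total": 10.0},
    {"order_id": 2, "customer": "acme", "total": 12.5},
    {"order_id": 3, "customer": "globex", "total": 10.0},
]

catalog = build_field_catalog(rows)

# Stand-in for an LLM proposal; "shipping_zone" never appears in the data.
proposed = ["order_id", "customer", "shipping_zone", "total"]

accepted = filter_proposed_fields(proposed, catalog)
keys = candidate_keys(rows, accepted)

print(accepted)
print(keys)
```

In this sketch the model can only name fields that were observed in the data, so an invented field such as `shipping_zone` is dropped before any downstream step sees it, and the key check involves no model call at all.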

The contribution extends beyond discovery into application. At query time, the schema conditions a multi-tool agent that routes requests across structured lookup, graph traversal, and vector search, then synthesizes the results with traceable provenance. The same structural information supports knowledge graph construction, where it is reported to improve deduplication and entity linking across heterogeneous sources. In the ablation studies, each component (schema-conditioned routing, structural analysis, and schema-guided construction) contributes independently to the performance gains, which suggests the approach does not depend on any single technique.
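A rough sketch of schema-conditioned routing with provenance, under strong simplifying assumptions: the summary describes an LLM-driven multi-tool agent, whereas this example substitutes a keyword rule and picks exactly one tool per query. The `Schema`, `Answer`, `choose_tool`, and `route` names, the stub tools, and the source labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Schema:
    key_fields: Set[str]       # fields usable for exact structured lookup
    relation_fields: Set[str]  # fields that link entities to each other


@dataclass
class Answer:
    text: str
    tool: str
    sources: List[str] = field(default_factory=list)  # provenance trail


Tool = Callable[[str], Tuple[str, List[str]]]


def choose_tool(query: str, schema: Schema) -> str:
    """Pick a tool by checking which schema fields the query mentions."""
    tokens = set(query.lower().replace("?", " ").split())
    if tokens & schema.key_fields:
        return "structured_lookup"
    if tokens & schema.relation_fields:
        return "graph_traversal"
    return "vector_search"  # fallback when no schema field is mentioned


def route(query: str, schema: Schema, tools: Dict[str, Tool]) -> Answer:
    """Run the chosen tool and record which tool and sources were used."""
    name = choose_tool(query, schema)
    text, sources = tools[name](query)
    return Answer(text=text, tool=name, sources=sources)


# Hypothetical schema and stub tools, for illustration only.
schema = Schema(key_fields={"order_id"}, relation_fields={"supplier", "parent"})
tools: Dict[str, Tool] = {
    "structured_lookup": lambda q: ("row match", ["orders.csv"]),
    "graph_traversal": lambda q: ("path match", ["kg:edges"]),
    "vector_search": lambda q: ("passage match", ["docs/handbook.txt"]),
}

a1 = route("What is the total for order_id 2?", schema, tools)
a2 = route("Which supplier ships to globex?", schema, tools)
a3 = route("Summarize the returns policy", schema, tools)

for answer in (a1, a2, a3):
    print(answer.tool, answer.sources)
```

Carrying `tool` and `sources` on every `Answer` is what makes a synthesized response traceable: each statement can be tied back to the backend and source it came from.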

For organizations managing real-world data lakes, this work suggests that automated schema discovery could reduce integration costs while improving retrieval quality. Results on four QA benchmarks suggest the approach generalizes beyond narrow use cases. Pairing deterministic analysis with LLM capabilities offers a pragmatic middle ground between full automation and manual curation, potentially making enterprise data integration more scalable and maintainable.

Key Takeaways
  • Automated schema discovery from multi-source data creates a unified contract for knowledge graph construction and intelligent query routing.
  • Combining LLM-based discovery with deterministic structural analysis guards against hallucinations and infers keys and hierarchies automatically.
  • Schema-conditioned routing at query time outperforms retrieval-only and decomposition-based baselines across four QA benchmarks.
  • The system maintains provenance awareness, enabling traceable citations and grounded answers in cross-source retrieval.
  • Ablation studies confirm that schema-conditioned routing, structural intelligence, and schema-guided construction each independently improve performance.