EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL
EviLink is a new AI framework that improves Text-to-SQL systems by treating schema linking as an uncertainty-aware process across multiple SQL paths rather than a single deterministic selection. The approach balances schema completeness, relevance, and computational cost, achieving 90.15% field-level recall on Spider2-Snow while using fewer tokens than existing methods.
EviLink addresses a fundamental challenge in converting natural language queries to SQL on large databases: identifying which database schema elements are actually needed to answer a question. Traditional approaches force systems into binary decisions about which tables and fields to include, often resulting in either incomplete schemas that miss necessary information or bloated schemas that waste computational resources. This research reframes the problem by acknowledging that complex natural language questions may have multiple valid SQL interpretations, each with different schema requirements.
The technical innovation lies in combining multi-hypothesis schema grounding with uncertainty-guided evidence acquisition. Rather than committing to a single SQL path early, EviLink explores multiple plausible interpretations simultaneously and separates schema items that every interpretation requires from those whose necessity depends on which SQL path the system ultimately chooses. Because the uncertainty is made explicit, the system can acquire additional evidence only for the path-dependent items, avoiding context that would consume token budget and processing power without improving the result.
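The distinction between universally required and path-dependent schema items can be illustrated with a small sketch. This is not EviLink's implementation: the function names, the "support" score (the fraction of hypothesized paths that use an item), and the rule of probing the most ambiguous items first are all assumptions made for illustration.

```python
from collections import Counter

def split_by_path_agreement(paths):
    """Split schema items into those used by every hypothesized SQL path
    and those used by only some of them.

    `paths` is a list of sets of schema items such as "orders.total",
    one set per candidate SQL interpretation (hypothetical input format).
    """
    if not paths:
        return set(), {}
    counts = Counter(item for path in paths for item in set(path))
    n = len(paths)
    # Items present in all paths are needed whichever path is chosen.
    required = {item for item, c in counts.items() if c == n}
    # Remaining items are path-dependent; record the share of paths using them.
    uncertain = {item: c / n for item, c in counts.items() if c < n}
    return required, uncertain

def items_to_probe(uncertain, budget):
    """Pick up to `budget` uncertain items for further evidence gathering,
    most ambiguous first (support closest to 0.5). Illustrative heuristic only."""
    ranked = sorted(uncertain, key=lambda it: (abs(uncertain[it] - 0.5), it))
    return ranked[:budget]

# Four hypothetical SQL interpretations of the same question.
paths = [
    {"orders.id", "orders.total", "customers.id"},
    {"orders.id", "orders.total", "payments.amount"},
    {"orders.id", "orders.total"},
    {"orders.id", "customers.id"},
]
required, uncertain = split_by_path_agreement(paths)
to_probe = items_to_probe(uncertain, budget=1)
```

In this toy input, `orders.id` appears in all four paths and is treated as required, while `customers.id` (used by half of the paths) is the most ambiguous item and would be probed first under a budget of one.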
For the broader AI and database community, this work is a step toward more efficient and scalable Text-to-SQL systems. On Spider2-Snow, EviLink reaches 90.15% field-level recall with an average of 123.30K tokens, which is reported as fewer tokens than existing methods use. That efficiency matters most for production systems over large enterprise databases with thousands of tables and fields, where including the full schema in context becomes prohibitively expensive.
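For readers unfamiliar with the metric, field-level recall is commonly understood as the share of gold (actually needed) fields that the linked schema contains. The sketch below assumes that per-question definition; the benchmark's exact averaging scheme is not given in this summary, and the function name and example fields are hypothetical.

```python
def field_recall(predicted_fields, gold_fields):
    """Fraction of gold fields present in the predicted schema for one question.

    Both arguments are sets of "table.column" strings (assumed format).
    Returns 1.0 when there are no gold fields, since nothing can be missed.
    """
    gold = set(gold_fields)
    if not gold:
        return 1.0
    return len(set(predicted_fields) & gold) / len(gold)

# Hypothetical example: the linker kept three fields, two of which are needed.
recall = field_recall(
    {"orders.id", "orders.total", "customers.name"},
    {"orders.id", "orders.total", "payments.amount", "payments.date"},
)
```

A missing gold field makes the correct SQL impossible to write from the linked schema, which is why recall is the headline number here while token count tracks the cost of achieving it.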
Looking forward, the uncertainty-aware paradigm could influence how other database query systems approach schema selection and context management. Subsequent research may explore whether similar principles apply to related problems like code generation or multi-database federation.
- EviLink reframes schema linking as an uncertainty-aware process over multiple SQL paths rather than deterministic single-path selection
- The system achieves 90.15% field-level recall on Spider2-Snow while using only 123.30K average tokens, improving efficiency over existing methods
- Multi-hypothesis grounding distinguishes between universally required schema items and path-dependent uncertain ones
- This approach directly addresses the computational cost challenge of large-scale Text-to-SQL systems on enterprise databases
- The research demonstrates measurable improvements in downstream SQL generation performance under fixed generator constraints