🧠 AI | ⚪ Neutral | Importance 4/10

SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification

arXiv – CS AI | Rocky Klopfenstein, Yang He, Andrew Tremante, Yuepeng Wang, Nina Narodytska, Haoze Wu | March 5, 2026 at 05:00 AM

🤖 AI Summary

Researchers introduce SpotIt, an evaluation method for Text-to-SQL systems that uses formal verification to search for database instances on which a generated query and the ground-truth query return different results. Testing on the BIRD dataset revealed that current test-based evaluation methods often miss differences between generated and ground-truth SQL queries.

Key Takeaways

→ Current Text-to-SQL evaluation methods compare query results on static test databases, which can miss functionally different queries that coincidentally produce the same output.
→ SpotIt uses formal bounded equivalence verification to actively search for a database that exposes a difference between the generated and ground-truth SQL queries.
→ Testing ten Text-to-SQL methods on the BIRD dataset showed that test-based evaluation frequently overlooks differences between generated and ground-truth queries.
→ The research extends existing verifiers to support a richer SQL subset relevant to Text-to-SQL applications.
→ Analysis of the verification results points to a more complex picture of current Text-to-SQL evaluation than previously understood.
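
To make the gap between test-based and verification-based evaluation concrete, here is a minimal Python sketch using the standard-library sqlite3 module. Everything in it is hypothetical and not taken from the paper: the `employees` table, the two queries, and the helper names `run`, `agree`, and `find_counterexample`. The brute-force enumeration over tiny databases only stands in for the idea of a bounded search. SpotIt itself relies on a formal bounded equivalence verifier, not enumeration.

```python
import itertools
import sqlite3

# Hypothetical ground-truth and generated queries; they differ only in
# how they treat a salary of exactly 50000.
GOLD = "SELECT name FROM employees WHERE salary > 50000 ORDER BY name"
GENERATED = "SELECT name FROM employees WHERE salary >= 50000 ORDER BY name"


def run(query, rows):
    """Execute `query` on a fresh in-memory database holding `rows`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def agree(rows):
    """Test-based check: do both queries return the same result on `rows`?"""
    return run(GOLD, rows) == run(GENERATED, rows)


def find_counterexample(max_rows=2, salaries=(49999, 50000, 50001)):
    """Naive bounded search: try every database with at most `max_rows`
    rows drawn from a small value domain, and return the first one on
    which the two queries disagree (or None if there is none)."""
    names = ("a", "b")
    candidates = list(itertools.product(names, salaries))
    for size in range(max_rows + 1):
        for rows in itertools.combinations(candidates, size):
            if not agree(list(rows)):
                return list(rows)
    return None


# A static test database with no boundary value: the queries look equivalent.
static_test_db = [("alice", 40000), ("bob", 60000)]
print(agree(static_test_db))   # True

# Actively searching small databases exposes the difference.
print(find_counterexample())   # [('a', 50000)]
```

On the static test database the two queries return identical rows, so a result-comparison check would accept the generated query. The search finds a one-row database that separates them. Enumeration like this scales poorly with schema size and value domains, which is one reason a symbolic verifier is the more plausible tool for the bounded search described above.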