SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables

arXiv – CS AI | Sungho Park, Jueun Kim, Wook-Shin Han
🤖 AI Summary

Researchers introduce SPARTA, an automated framework for generating large-scale Table-Text question answering benchmarks that require complex multi-hop reasoning across structured and unstructured data. On the resulting benchmark, state-of-the-art systems score more than 30 F1 points lower than they do on existing, simpler datasets, exposing significant weaknesses in current models.

Key Takeaways
  • SPARTA automatically generates complex Table-Text QA benchmarks using only 25% of the manual annotation time required by earlier benchmarks such as HybridQA.
  • The framework creates questions requiring deep multi-hop reasoning, aggregations, and grouping operations that better reflect real-world analytical tasks.
  • State-of-the-art AI models show dramatic performance degradation on SPARTA, dropping over 30 F1 points from their performance on simpler benchmarks.
  • Two novel techniques (provenance-based refinement and realistic-structure enforcement) ensure that generated questions are executable and human-sounding.
  • The benchmark and construction code are open-sourced, providing researchers with tools to evaluate cross-modal reasoning capabilities more rigorously.
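
For readers unfamiliar with the "F1 points" cited in the takeaways, the following is a minimal sketch of the token-level F1 score commonly used in QA evaluation. It assumes simple lowercasing and whitespace tokenization; the summary does not specify SPARTA's exact scoring procedure, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumption: lowercased whitespace tokenization. The exact
    normalization used by SPARTA is not stated in this summary.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of three predicted tokens match the two-token reference.
    print(token_f1("the red car", "red car"))
```

Scores are usually reported on a 0 to 100 scale, so a drop of 30 F1 points corresponds to a difference of 0.30 in the value this function returns, averaged over all questions.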