
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables

arXiv – CS AI | Sungho Park, Jueun Kim, Wook-Shin Han | February 27, 2026 at 05:00 AM

🤖AI Summary

Researchers introduce SPARTA, an automated framework for generating large-scale Table-Text question-answering benchmarks that require complex multi-hop reasoning across structured and unstructured data. The benchmark exposes significant weaknesses in current AI models: state-of-the-art systems score more than 30 F1 points lower on SPARTA than on existing, simpler datasets.
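The summary reports the gap in F1 points but does not say how F1 is computed. A common convention for QA benchmarks, popularized by SQuAD, is token-overlap F1 between each predicted answer and the gold answer, averaged over all questions and scaled to 0-100. The sketch below assumes that convention. The function names `token_f1` and `mean_f1_points` are hypothetical, and this is not SPARTA's own evaluation code.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between one predicted answer and one gold answer (0.0 to 1.0)."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection: each shared token counts at most min(count_pred, count_gold) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1_points(predictions: list[str], golds: list[str]) -> float:
    """Average per-question F1, reported on the 0-100 'F1 points' scale."""
    scores = [token_f1(p, g) for p, g in zip(predictions, golds)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Partial credit: 2 of 2 predicted tokens are correct (precision 1.0),
    # but only 2 of 4 gold tokens are recovered (recall 0.5), so F1 is about 0.667.
    print(token_f1("the Eiffel Tower", "Eiffel Tower in Paris"))
    print(mean_f1_points(["Paris", "London"], ["paris.", "Paris"]))
```

Under this convention, a drop of more than 30 F1 points means the average per-question score falls by over 0.30 on the 0-1 scale. Partially correct answers still earn some credit, so the metric is more forgiving than exact match.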

Key Takeaways

→SPARTA automatically generates complex Table-Text QA benchmarks using only 25% of the manual annotation time required by earlier benchmark efforts such as HybridQA.
→The framework creates questions requiring deep multi-hop reasoning, aggregations, and grouping operations that better reflect real-world analytical tasks.
→State-of-the-art AI models degrade sharply on SPARTA, scoring more than 30 F1 points lower than they do on simpler benchmarks.
→Two novel techniques, provenance-based refinement and realistic-structure enforcement, ensure that generated questions are both executable and natural-sounding.
→The benchmark and construction code are open-sourced, providing researchers with tools to evaluate cross-modal reasoning capabilities more rigorously.

