AINeutralarXiv β CS AI Β· 7h ago6/10
π§
TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering
Researchers introduce TopBench, a benchmark dataset of 779 samples designed to evaluate how well Large Language Models handle implicit prediction tasks over tabular dataβqueries requiring inference from historical patterns rather than simple data retrieval. Testing reveals current LLMs struggle with intent recognition and default to lookup-based approaches, indicating that accurate intent disambiguation is critical before predictive reasoning can succeed.