AI · Neutral · arXiv (CS AI) · 7h ago · 6/10
Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models
Researchers introduce Text2DistBench, a new benchmark for evaluating how well large language models understand distributional information (such as trends and preferences across text collections) rather than just factual details. Built from YouTube comments about movies and music, the benchmark reveals that while LLMs outperform random baselines, their performance varies significantly across different distribution types, highlighting both capabilities and gaps in current AI systems.