y0news
#scientific-articles
1 article
AI · Bearish · arXiv – CS AI · 4d ago · 6/104

Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles

Researchers introduced SciTrek, a new benchmark for testing large language models' ability to perform numerical reasoning across long scientific documents. The benchmark reveals significant challenges for current LLMs, with the best model achieving only 46.5% accuracy at 128K tokens, and performance declining as context length increases.
