AI | Bearish | Importance: 6/10
Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles
AI Summary
Researchers introduced SciTrek, a new benchmark for testing large language models' ability to perform numerical reasoning across long scientific documents. The benchmark reveals significant challenges for current LLMs, with the best model achieving only 46.5% accuracy at 128K tokens, and performance declining as context length increases.
Key Takeaways
- The SciTrek benchmark tests LLMs on counting, sorting, and comparing information across multiple full-text scientific articles.
- Even the best-performing LLM achieved only 46.5% exact-match accuracy at 128K-token contexts.
- Model performance degrades as context length increases, highlighting limitations in long-context reasoning.
- LLMs particularly struggle with citation-related questions and compound logical conditions involving negation.
- The benchmark uses SQL queries over article metadata to generate verifiable questions with ground-truth answers.
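The last point can be illustrated with a small sketch. SciTrek's actual schema and query set are not given in this summary; assuming hypothetical tables `articles` and `citations`, a question like "Which article is cited most?" can be generated alongside a verifiable ground-truth answer by running an aggregation query over the metadata:

```python
import sqlite3

# Toy metadata in an in-memory database. The table names and columns here
# are illustrative assumptions, not SciTrek's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE citations (citing_id INTEGER, cited_id INTEGER);
INSERT INTO articles VALUES (1, 'Paper A'), (2, 'Paper B'), (3, 'Paper C');
INSERT INTO citations VALUES (1, 2), (1, 3), (3, 2);
""")

# Ground-truth answer for "Which article is cited most?" computed by SQL,
# so an LLM's free-text answer can be checked by exact match.
row = conn.execute("""
    SELECT a.title, COUNT(*) AS n_citations
    FROM citations c
    JOIN articles a ON a.id = c.cited_id
    GROUP BY c.cited_id
    ORDER BY n_citations DESC
    LIMIT 1
""").fetchone()
print(row)  # ('Paper B', 2)
```

Because the answer comes from a deterministic query rather than human annotation, the same machinery scales to counting, sorting, and comparison questions over many articles.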
#llm #benchmark #long-context #numerical-reasoning #scientific-articles #performance-evaluation #ai-limitations #context-length #citation-analysis
Read Original via arXiv (CS AI)