#numerical-reasoning News & Analysis

6 articles tagged with #numerical-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning

Researchers introduced MoCA-Agent, a novel AI system that improves financial and numerical reasoning by decomposing questions into atomic claims verified through a market-based mechanism rather than free-form debate. The system achieved strong performance across ten benchmarks, including 78.3% on FinQA and 86.9% on ESGenius, demonstrating that claim-level verification enhances accuracy in high-stakes numerical reasoning tasks.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

Researchers propose DCRC, a data-centric framework addressing numerical hallucinations in LLM-based financial question-answering systems. The approach combines adversarial data construction, multi-stage training, and executable reasoning programs to improve reliability in high-stakes financial applications where accuracy is critical.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors

Researchers propose PRPO (Permutation Relative Policy Optimization), a reinforcement learning framework that enhances large language models' numerical reasoning capabilities for tabular data prediction. The method achieves performance comparable to supervised baselines while excelling in zero-shot scenarios, with an 8B-parameter model outperforming much larger models by up to 53.17%.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

CONE: Embeddings for Complex Numerical Data Preserving Unit and Variable Semantics

Researchers introduce CONE, a hybrid transformer encoder model that improves numerical reasoning in AI by creating embeddings that preserve the semantics of numbers, ranges, and units. The model achieves an 87.28% F1 score on the DROP dataset, a 9.37% improvement over existing state-of-the-art models, and is evaluated across web, medical, finance, and government domains.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Reasoning-Intensive Regression

Researchers introduce MENTAT, a novel method for reasoning-intensive regression (RiR), the task of extracting subtle numerical scores from text in specialized domains. The approach combines batch-reflective prompt optimization with neural ensemble learning, achieving up to 65% improvement over standard LLM prompting and fine-tuning approaches on tasks like rubric-based scoring and domain-specific retrieval.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 4

Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles

Researchers introduced SciTrek, a new benchmark for testing large language models' ability to perform numerical reasoning across long scientific documents. The benchmark reveals significant challenges for current LLMs, with the best model achieving only 46.5% accuracy at 128K tokens, and performance declining as context length increases.
