#llm-reasoning News & Analysis

154 articles tagged with #llm-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Resource-Aware LLM Reasoning for Mobile Edge General Intelligence

Researchers propose a joint optimization framework for deploying large language model reasoning on resource-constrained edge devices, combining adaptive chain-of-thought prompting with distributed mixture-of-experts architecture. The framework dynamically balances reasoning quality and computational efficiency by treating reasoning depth as an optimizable network resource, achieving 90% accuracy and latency satisfaction with minimal inference overhead.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

Researchers introduce Stepwise Confidence Attribution (SCA), a framework for diagnosing where large language models fail in multi-step reasoning tasks without requiring access to the model's internal parameters. The method identifies problematic reasoning steps by measuring confidence alignment with consensus patterns across correct solutions, improving self-correction accuracy by up to 13.5%.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

Researchers propose AGCLR, a new method that enhances large language models' reasoning capabilities by introducing persistent memory across reasoning steps. The approach addresses a fundamental limitation in continuous latent reasoning where intermediate facts are lost as models explore deeper reasoning paths, demonstrating consistent improvements on mathematical and multi-hop reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

Researchers introduce ISPO (Intrinsic Signal Policy Optimization), a new reinforcement learning method that improves long-chain reasoning in large language models by densifying reward signals with intrinsic metrics derived from the model's own probabilities. The approach addresses critical failure modes in existing GRPO-based methods and shows consistent improvements across mathematical reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

AeroSpectra Sentinel: An Auditable LLM Prompt-Chaining Decision-Support Workflow for Acute Asthma Risk Assessment from Respiratory Sounds and Clinical Signals

AeroSpectra Sentinel is a research prototype that combines STFT audio analysis, machine learning, and LLM prompt-chaining to assist in acute asthma risk assessment from respiratory sounds and clinical signals. Evaluated on respiratory sound datasets, the system achieved up to 91.10% binary accuracy with random forest models, while structured prompting with guardrails and FHIR validation showed strongest safety consistency in simulated clinical scenarios.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors

Researchers demonstrate that simple K-nearest neighbor models leveraging biological knowledge graphs achieve competitive performance in predicting gene knockout effects on transcriptomic expression, with reinforcement learning-optimized LLMs further improving results to match state-of-the-art methods. This work suggests knowledge graphs serve as effective model priors for complex biological prediction tasks.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning

Researchers introduce CLPO, a curriculum learning framework that dynamically adapts training difficulty for large language models during reinforcement learning. The approach automatically identifies solved, medium, and hard problems, then strategically restructures tasks to match the model's evolving capabilities, achieving substantial improvements over existing methods on mathematical and reasoning benchmarks.
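
One way to picture the "solved, medium, and hard" split is to bucket problems by how often sampled rollouts get them right. The sketch below is only an illustration under that assumption: the function name `bucket_by_pass_rate`, the pass-rate criterion, and both thresholds are hypothetical and are not taken from the CLPO paper.

```python
from typing import Dict, List


def bucket_by_pass_rate(
    rollout_results: Dict[str, List[bool]],
    solved_at: float = 1.0,   # assumed threshold, not from the paper
    hard_below: float = 0.3,  # assumed threshold, not from the paper
) -> Dict[str, str]:
    """Label each problem 'solved', 'medium', or 'hard' from sampled rollouts."""
    labels: Dict[str, str] = {}
    for problem_id, outcomes in rollout_results.items():
        if not outcomes:
            continue  # no rollouts sampled yet, so no label can be assigned
        pass_rate = sum(outcomes) / len(outcomes)
        if pass_rate >= solved_at:
            labels[problem_id] = "solved"
        elif pass_rate < hard_below:
            labels[problem_id] = "hard"
        else:
            labels[problem_id] = "medium"
    return labels


# Hypothetical rollout outcomes (True = correct final answer).
results = {
    "p1": [True, True, True, True],
    "p2": [True, False, True, False],
    "p3": [False, False, False, False],
}
labels = bucket_by_pass_rate(results)
```

Re-running the bucketing as training progresses would let the labels track the model's changing ability, which is the sense of "dynamic" assumed here.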

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science

Researchers introduced MatSciBench, a comprehensive benchmark of 1,340 college-level materials science problems designed to evaluate large language models' reasoning abilities in this specialized domain. Testing leading LLMs revealed significant limitations, with DeepSeek-R1 achieving 75.22% accuracy on text questions and GPT-4 reaching 53.02% on multimodal tasks, highlighting gaps in domain knowledge, calculation accuracy, and scientific figure interpretation.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Quantum-Inspired Trace-Augmented Evidence Selection for Reasoning over Structured Hypothesis Spaces

Researchers propose EP-HUBO, a quantum-inspired optimization method that improves how large language models aggregate reasoning chains for evidence-intensive tasks like legal reasoning. By treating evidence selection as a combinatorial optimization problem rather than using simple majority voting, the approach preserves accurate minority hypotheses and achieves better performance on legal benchmarks.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs

Researchers demonstrate that vector-based retrieval systems fail on queries requiring structural reasoning over knowledge graphs, proposing instead an LLM Query Planner with typed traversal primitives that outperforms traditional approaches. The study reveals that LLM capability gaps in graph reasoning stem not from model intelligence but from insufficient computational operators, with implications for enterprise knowledge systems.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

ReasoningFlow is a framework that maps the complex, non-linear reasoning traces of large reasoning models into directed acyclic graphs, enabling better understanding and monitoring of AI reasoning processes. Through analysis of 1,260 traces across multiple models and tasks, researchers discovered that LRMs exhibit structurally similar reasoning patterns despite different training origins, while most erroneous steps don't influence final answers.
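
Once a trace is a directed acyclic graph, "a step does not influence the final answer" can be read as "there is no directed path from that step to the answer node". The sketch below illustrates that reading with a plain ancestor search. The graph encoding, the step names, and the function `steps_influencing` are hypothetical, and the paper's own influence criterion may differ.

```python
from typing import Dict, List, Set


def steps_influencing(final: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every step with a directed path to `final` (its ancestors)."""
    seen: Set[str] = set()
    stack = list(parents.get(final, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


# Hypothetical trace: each step maps to the earlier steps it builds on.
parents = {
    "s2": ["s1"],
    "s3": ["s1"],      # a detour that no later step uses
    "s4": ["s2"],
    "answer": ["s4"],
}
influential = steps_influencing("answer", parents)
all_steps = {"s1", "s2", "s3", "s4"}
non_influential = all_steps - influential  # here: only the detour "s3"
```

Under this reading, an error made in `s3` could not propagate to the answer, since nothing downstream depends on it.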

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

Researchers propose InfoDensity, a reinforcement learning reward framework that optimizes Large Language Models for efficient reasoning by measuring information density rather than just output length. The method tracks entropy trajectories to identify high-quality intermediate reasoning steps, achieving better accuracy-efficiency trade-offs on mathematical and general reasoning benchmarks.
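
As a rough illustration of an "entropy trajectory", the sketch below computes Shannon entropy over a probability distribution after each reasoning step. Which distribution InfoDensity actually measures is not stated in this summary, so a distribution over candidate answers is assumed here, and the names `shannon_entropy` and `entropy_trajectory` are hypothetical.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_trajectory(step_distributions: Sequence[Sequence[float]]) -> List[float]:
    """Entropy of the assumed answer distribution after each reasoning step."""
    return [shannon_entropy(dist) for dist in step_distributions]


# Hypothetical distributions over four candidate answers after three steps.
steps = [
    [0.25, 0.25, 0.25, 0.25],  # no information yet
    [0.5, 0.5, 0.0, 0.0],      # two candidates ruled out
    [1.0, 0.0, 0.0, 0.0],      # answer determined
]
trajectory = entropy_trajectory(steps)
```

In this toy trace, uncertainty falls from 2 bits to 1 bit to 0 bits, so each step removes one bit. A step that left the entropy unchanged would add length without adding information, which is the intuition behind rewarding density rather than brevity alone.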

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

Researchers propose a framework for multi-agent systems that treats disagreement as valuable information rather than error to be eliminated. The approach abstracts reasoning traces into four symbolic disagreement states and applies strategic routing rules to content moderation and AI collaboration tasks.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

Researchers introduce MechSim, a neuro-symbolic framework that enables large language models to reason transparently about the assumptions and mechanisms underlying scientific simulators. The approach improves explainability and decision-making reliability in high-stakes simulation-driven applications by treating simulators as structured systems rather than black boxes.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

DAR: Deontic Reasoning with Agentic Harnesses

Researchers introduce Deontic Agentic Reasoning (DAR), a new framework that enables large language models to better tackle complex rule-based reasoning tasks by dynamically querying statutes and policies. Testing on DeonticBench shows agentic approaches improve performance on hard cases, though weaker models struggle with numerical reasoning and consume significantly more tokens.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

Visual Graph Scaffolds for Structural Reasoning in Large Language Models

Researchers demonstrate that visual graph structures serve as more effective reasoning scaffolds for large language models than text-based representations, particularly when abstract guidance is provided without direct answer hints. The findings suggest graphs should be leveraged not merely as external knowledge sources but as internal organizational tools that meaningfully improve both reasoning efficiency and answer quality in multi-hop question-answering tasks.