#llm-reasoning News & Analysis

118 articles tagged with #llm-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

Researchers propose Skill-Conditioned Gated Self-Distillation (SGSD), a novel method for improving large language model reasoning by leveraging an experience-derived skill bank rather than trusted reference answers. The approach validates skills through a multi-teacher framework and demonstrates consistent improvements over existing methods on mathematical reasoning benchmarks.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs

Researchers present a novel framework analyzing how reinforcement learning (RL) and supervised fine-tuning (SFT) differently shape reasoning in large language models. The study reveals that RL compresses incorrect reasoning paths while SFT expands correct ones, explaining why the two-stage training approach produces superior reasoning capabilities across models of 1.5B to 14B parameters.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations

Researchers demonstrate that knowledge graphs significantly outperform traditional document stores for LLM-based industrial asset operations, achieving 100% accuracy on 467 maintenance scenarios compared to 65% with flat data structures. The study reveals that data architecture, not LLM orchestration design, is the primary performance bottleneck in structured operational domains.

🏢 Hugging Face · 🧠 GPT-4
AI · Bullish · arXiv – CS AI · May 27 · 6/10

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

Researchers introduce ReasonOps, a unified operational framework that treats AI reasoning as a continuously monitored and verifiable process rather than isolated inference. The paradigm integrates formal verification, symbolic reasoning, and runtime assurance to address critical reliability gaps in LLM-based reasoning systems, particularly for safety-critical applications.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation

Researchers have developed a mechanistic interpretability framework that traces information flow in reverse through the decoding, projection, and activation stages to explain how Chain-of-Thought prompting shapes model reasoning. The study finds that CoT functions as a decoding-space pruner that uses answer templates to guide outputs, with task-dependent neuron modulation that reduces activation in open-domain tasks but increases it in closed-domain scenarios.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

Researchers propose PTA-GRPO, a two-stage framework that enhances LLM reasoning by combining high-level planning with reinforcement learning. The method first guides models to summarize reasoning into compact guidance, then uses this guidance to optimize both final outputs and reasoning quality, demonstrating consistent improvements across ten benchmarks.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Vital Trace: Protocol-Constrained Patient-State Reasoning for Longitudinal Clinical Trajectories

Researchers present Vital Trace, a protocol-constrained multi-agent AI framework designed to improve clinical risk prediction in intensive care units by tracking patient trajectories over extended periods. The system uses compact patient-state memory and structured reasoning agents rather than unbounded text histories, demonstrating better temporal consistency and interpretability on MIMIC-IV and eICU datasets.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization

Researchers introduce AgentPSO, a framework that evolves multi-agent reasoning skills in large language models using particle swarm optimization principles. Rather than relying on static agents or inference-time debate, the system enables agents to iteratively improve their reasoning capabilities through self-reflection and collective learning, demonstrating improved performance and cross-benchmark transferability without modifying underlying model parameters.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors

Researchers propose IMAX, a framework that uses trainable prefix tuning to improve exploration in reinforcement learning with verifiable rewards (RLVR) for language model reasoning. The approach addresses entropy collapse by creating diverse reasoning trajectories, achieving performance gains up to 11.60% in Pass@4 accuracy across multiple model scales.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs

Researchers introduce TESSERA, a neuro-symbolic framework that combines Large Language Models with Monte Carlo Tree Search to extract multi-step explanations from knowledge graphs, specifically for drug-disease mechanism discovery. The system uses LLMs for local judgments rather than autonomous generation, enforcing structural constraints through knowledge graphs while employing MCTS for principled credit assignment across extended reasoning chains.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

Researchers introduce Absurd World, a benchmarking framework that tests large language models' logical reasoning by creating logically coherent but unrealistic scenarios derived from real-world problems. The framework reveals whether LLMs can reason independently of learned patterns by breaking down real-world models into symbols, actions, sequences, and events, then systematically altering them while preserving underlying logic.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Verifiable Process Rewards for Agentic Reasoning

Researchers introduce Verifiable Process Rewards (VPR), a framework that enhances reinforcement learning for large language models by providing dense, intermediate-level feedback during reasoning tasks rather than relying solely on sparse outcome-level rewards. The approach leverages symbolic, algorithmic, and probabilistic verification methods to improve credit assignment in long-horizon agentic reasoning, with theoretical and empirical validation across multiple benchmarks.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

Researchers developed a method to measure when language models stabilize their answer preferences during generation, before explicitly verbalizing a final answer. Using finite-answer projection analysis on the Qwen3-4B-Instruct model, they found answer preferences stabilize 17-31 tokens before the model states its answer, revealing the internal commitment dynamics of LLM reasoning.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

GraphReAct: Reasoning and Acting for Multi-step Graph Inference

GraphReAct introduces a new reasoning-acting framework that enhances large language models for multi-step inference over graph-structured data by combining topological and semantic retrieval actions with context refinement. The framework demonstrates consistent improvements over existing methods across six benchmark datasets, advancing how AI systems can reason about interconnected, structured information.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Abductive Reasoning with Probabilistic Commonsense

Researchers propose PACS, a probabilistic framework for abductive reasoning that models how commonsense beliefs vary across individuals rather than assuming universal agreement. By combining LLMs with formal solvers to sample diverse proofs and aggregate conclusions, PACS outperforms existing reasoning approaches on multiple benchmarks, addressing a fundamental limitation in neurosymbolic AI systems.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

Researchers introduce GraphDC, a divide-and-conquer multi-agent framework that enables Large Language Models to solve complex graph algorithms more effectively by decomposing large graphs into smaller subgraphs for specialized agent reasoning. The approach significantly improves LLM performance on graph algorithmic tasks, particularly on larger instances where traditional end-to-end reasoning fails.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

BALAR: A Bayesian Agentic Loop for Active Reasoning

Researchers introduced BALAR, a Bayesian algorithm that enables large language models to engage in structured multi-turn dialogue by actively reasoning about missing information and strategically asking clarifying questions. The system improved accuracy by 14.6% to 38.5% across three diverse benchmarks without requiring fine-tuning, suggesting a more principled approach to interactive AI reasoning.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning

Researchers propose a novelty-based tree-of-thought search method that improves LLM reasoning by measuring the uniqueness of generated thoughts and pruning redundant branches. The approach reduces overall token costs while maintaining performance on reasoning and planning benchmarks, addressing brittleness issues in current advanced LLM techniques.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

How Does Thinking Mode Change LLM Moral Judgments? A Controlled Instant-vs-Thinking Comparison Across Five Frontier Models

Researchers compared moral judgment consistency in five frontier LLMs when using instant versus extended reasoning modes across 100 scenarios. While overall agreement remained statistically similar between modes, reasoning improved cross-model consensus on disputed moral cases and reduced demographic-based inconsistencies, suggesting that explicit reasoning processes may enhance fairness despite not dramatically shifting individual verdicts.

🧠 GPT-5 · 🧠 Claude · 🧠 Sonnet
AI · Neutral · arXiv – CS AI · May 4 · 6/10

Reasoning-Intensive Regression

Researchers introduce MENTAT, a novel method for reasoning-intensive regression (RiR), the task of extracting subtle numerical scores from text in specialized domains. The approach combines batch-reflective prompt optimization with neural ensemble learning, achieving up to 65% improvement over standard LLM prompting and fine-tuning approaches on tasks like rubric-based scoring and domain-specific retrieval.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

Researchers have developed an agentic framework that uses knowledge graphs to help large language models understand and reason about AI policy documents. The system was tested on multiple AI safety regulations, demonstrating that knowledge graph augmentation improves LLM performance across various reasoning tasks from simple entity lookup to complex cross-policy inference.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

Researchers introduce FinChain, a new benchmark dataset designed to evaluate chain-of-thought reasoning in financial AI systems. The dataset addresses gaps in existing finance benchmarks by emphasizing verifiable intermediate reasoning steps rather than just final answers, and reveals that even leading LLMs struggle with multi-step symbolic financial reasoning.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

LLM Reasoning Is Latent, Not the Chain of Thought

A new position paper challenges the prevailing assumption that large language models reason through explicit chain-of-thought outputs, arguing instead that reasoning occurs primarily in latent-state trajectories hidden within model computations. The research separates three confounded factors and proposes that current reasoning benchmarks and interpretability claims need fundamental reevaluation based on this distinction.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants

Researchers propose a symbolic reasoning framework that implements Peirce's abductive-deductive-inductive reasoning model to address systematic weaknesses in large language model logical reasoning. The system enforces logical consistency through five algebraic invariants, with the Weakest Link bound preventing unreliable premises from corrupting multi-step inference chains.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Learning to Reason with Insight for Informal Theorem Proving

Researchers propose DeepInsightTheorem, a framework that teaches large language models to improve informal theorem proving by explicitly extracting and learning core mathematical techniques. The hierarchical dataset combined with a multi-stage training strategy enables LLMs to perform more insightful mathematical reasoning, outperforming existing baseline approaches on challenging benchmarks.
