y0news

#llm-reasoning News & Analysis

118 articles tagged with #llm-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Researchers introduce DPrivBench, a benchmark for evaluating how well large language models can reason about differential privacy algorithms and verify their correctness. Testing shows current LLMs handle basic DP mechanisms competently but fail significantly on advanced algorithms, exposing critical gaps in automated privacy reasoning capabilities.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Revisiting the Uniform Information Density Hypothesis in LLM Reasoning

Researchers challenge the Uniform Information Density hypothesis in LLM reasoning, finding that high-quality reasoning exhibits locally smooth but globally non-uniform information flow. This counter-intuitive pattern suggests LLMs optimize differently than human communication, with entropy-based metrics effectively predicting reasoning quality across seven benchmarks.

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models

Researchers propose Heuristic Classification of Thoughts (HCoT), a prompting method that integrates expert system heuristics into large language models to improve structured reasoning on complex problems. The approach addresses LLMs' stochastic token generation and decoupled reasoning mechanisms by using heuristic classification to guide and optimize decision trajectories. It reports better performance and token efficiency than existing methods such as Chain-of-Thought and Tree-of-Thoughts prompting.

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning

Researchers introduce KG-Reasoner, an end-to-end framework that uses reinforcement learning to train large language models to perform multi-hop reasoning over knowledge graphs without decomposing tasks into isolated pipeline steps. The approach demonstrates competitive or superior performance across eight reasoning benchmarks by enabling LLMs to dynamically explore reasoning paths and backtrack when necessary.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Topology-Aware Reasoning over Incomplete Knowledge Graph with Graph-Based Soft Prompting

Researchers propose a graph-based soft prompting framework that enables LLMs to reason over incomplete knowledge graphs by processing subgraph structures rather than explicit node paths. The method achieves state-of-the-art results on multi-hop question-answering benchmarks while reducing computational costs through a two-stage inference approach.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method

Researchers introduce ConflictQA, a benchmark revealing that large language models struggle with conflicting information across different knowledge sources (text vs. knowledge graphs) in retrieval-augmented generation systems. The study proposes XoT, an explanation-based framework to improve faithful reasoning when LLMs encounter contradictory evidence.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Interactive Learning for LLM Reasoning

Researchers introduce ILR, a novel multi-agent learning framework that enables Large Language Models to enhance their independent reasoning through interactive training with other LLMs, then solve problems autonomously without re-executing the multi-agent system. The approach combines dynamic interaction strategies and perception calibration, delivering up to 5% performance improvements across mathematical, coding, and reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

Researchers introduce PODS (Policy Optimization with Down-Sampling), a technique that accelerates reinforcement learning training for large language models by selectively training on high-variance rollouts rather than all generated data. The method matches the performance of standard approaches while training 1.7x faster, addressing computational bottlenecks in LLM reasoning optimization.
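
The idea of keeping only high-variance rollouts can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the function name `max_variance_subset`, the use of scalar rewards, and the selection rule, which only searches subsets made of the k lowest-reward and m - k highest-reward rollouts.

```python
from statistics import pvariance


def max_variance_subset(rewards, m):
    """Pick m rollout indices whose rewards have the largest variance.

    Hypothetical helper: it only considers subsets built from the k lowest
    and (m - k) highest rewards, for every k from 0 to m.
    """
    n = len(rewards)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and len(rewards)")

    # Rollout indices ordered from lowest to highest reward.
    order = sorted(range(n), key=lambda i: rewards[i])

    best_indices, best_variance = None, -1.0
    for k in range(m + 1):
        # k rollouts from the low end, m - k from the high end.
        chosen = order[:k] + order[n - (m - k):]
        variance = pvariance([rewards[i] for i in chosen])
        if variance > best_variance:
            best_indices, best_variance = chosen, variance
    return sorted(best_indices)


# Eight rollouts for one prompt; keep the four most informative ones.
rewards = [0.0, 1.0, 0.5, 0.5, 1.0, 0.0, 0.5, 0.5]
selected = max_variance_subset(rewards, 4)
```

In this toy example the selected rollouts are the two failures and the two successes, while the four middling rollouts are dropped before the policy update.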

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

Researchers propose TokUR, a framework that enables large language models to estimate uncertainty at the token level during reasoning tasks, allowing LLMs to self-assess response quality and improve performance on mathematical problems. The approach uses low-rank random weight perturbation to generate predictive distributions, demonstrating strong correlation with answer correctness and potential for enhancing LLM reliability.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

StyleBench: Evaluating thinking styles in Large Language Models

StyleBench is a new benchmark that evaluates how different reasoning structures (Chain-of-Thought, Tree-of-Thought, etc.) affect LLM performance across various tasks and model sizes. The research reveals that structural complexity only improves accuracy in specific scenarios, with simpler approaches often proving more efficient, and that learning adaptive reasoning strategies is itself a complex problem requiring advanced training methods.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment

Researchers introduce MERMAID, a memory-enhanced multi-agent framework for automated fact-checking that couples evidence retrieval with reasoning processes. The system achieves state-of-the-art performance on multiple benchmarks by reusing retrieved evidence across claims, reducing redundant searches and improving verification efficiency.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Researchers present ProofSketcher, a hybrid system combining large language models with lightweight proof verification to address mathematical reasoning errors in AI-generated proofs. The approach bridges the gap between LLM efficiency and the formal rigor of interactive theorem provers like Lean and Coq, enabling more reliable automated reasoning without requiring full formalization.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Consistency-Guided Decoding with Proof-Driven Disambiguation for Three-Way Logical Question Answering

Researchers present CGD-PD, a test-time decoding method that improves large language models' performance on three-way logical question answering (True/False/Unknown) by enforcing negation consistency and resolving epistemic uncertainty through targeted entailment probes. The approach achieves up to 16% relative accuracy improvements on the FOLIO benchmark while reducing spurious Unknown predictions.
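
Negation consistency in a three-way setting can be sketched as follows. This is an illustrative assumption rather than the CGD-PD procedure: the helper names `is_negation_consistent` and `reconcile`, the label strings, and the repair rule for a spurious Unknown are all hypothetical, and the entailment probes mentioned above are not modeled.

```python
# Label a consistent model should give to the negated statement.
FLIP = {"True": "False", "False": "True", "Unknown": "Unknown"}


def is_negation_consistent(label, negated_label):
    """Check that the label for "not S" mirrors the label for S."""
    return FLIP[label] == negated_label


def reconcile(label, negated_label):
    """Return a label for S that agrees with both predictions, if one exists.

    Hypothetical repair rule: when exactly one side is Unknown, trust the
    definite side. Returns None for a direct contradiction, which would
    need further probing.
    """
    if is_negation_consistent(label, negated_label):
        return label
    if label == "Unknown":
        return FLIP[negated_label]
    if negated_label == "Unknown":
        return label
    return None


# The model says S is Unknown but says "not S" is False, so S resolves to True.
resolved = reconcile("Unknown", "False")
```

Under this rule an Unknown prediction is overridden only when the negated statement received a definite label, which is one simple way a consistency check could reduce spurious Unknown answers.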

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

Researchers introduce S³ (Stratified Scaling Search), a test-time scaling method for diffusion language models that improves output quality by reallocating compute during the denoising process rather than relying on simple best-of-K sampling. The technique uses a lightweight verifier to evaluate and selectively resample candidate trajectories at each step, demonstrating consistent performance gains across mathematical reasoning and knowledge tasks without requiring model retraining.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning

Researchers discovered that large language models have a fundamental limitation in latent reasoning: they can discover multi-step planning strategies without explicit supervision, but only up to depths of 3-7 steps depending on model size and training method. This finding suggests that complex reasoning tasks may require explicit chain-of-thought monitoring rather than relying on hidden internal computations.

GPT-4 · GPT-5
AI · Neutral · arXiv – CS AI · Apr 6 · 6/10

Trivial Vocabulary Bans Improve LLM Reasoning More Than Deep Linguistic Constraints

A replication study found that simple vocabulary constraints like banning filler words ('very', 'just') improved AI reasoning performance more than complex linguistic restrictions like E-Prime. The research suggests any constraint that disrupts default generation patterns acts as an output regularizer, with shallow constraints being most effective.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Distilling Reasoning Without Knowledge: A Framework for Reliable LLMs

Researchers propose a new framework for large language models that separates planning from factual retrieval to improve reliability in fact-seeking question answering. The modular approach uses a lightweight student planner trained via teacher-student learning to generate structured reasoning steps, showing improved accuracy and speed on challenging benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 4

Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain

Researchers propose GHS-TDA, a new method to improve large language model reasoning by using global hypothesis graphs and topological data analysis. The approach addresses limitations in Chain-of-Thought reasoning by providing error correction mechanisms and filtering redundant reasoning paths.
