y0news

#reasoning News & Analysis

169 articles tagged with #reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

Localizing and Correcting Errors for LLM-based Planners

Researchers developed Localized In-Context Learning (L-ICL), a technique that improves large language model performance on symbolic planning tasks by targeting specific constraint violations with minimal corrections. The method achieves 89% valid plan generation, compared to 59% for the best baselines.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

RM-R1: Reward Modeling as Reasoning

Researchers introduce RM-R1, a new class of Reasoning Reward Models (ReasRMs) that integrate chain-of-thought reasoning into reward modeling for large language models. The models outperform much larger competitors including GPT-4o by up to 4.9% across reward model benchmarks by using a chain-of-rubrics mechanism and two-stage training process.

GPT-4 · Llama

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering

Researchers use fine-grained activation steering to reduce reasoning biases in large language models, particularly the tendency to confuse content plausibility with logical validity. Their K-CAST method achieved up to a 15% improvement in formal reasoning accuracy while remaining robust across different tasks and languages.

AI · Bullish · Crypto Briefing · Mar 5 · 7/10

OpenAI launches GPT-5.4 with improved reasoning, coding, and computer use capabilities

OpenAI has released GPT-5.4, featuring enhanced reasoning, coding, and computer use capabilities across ChatGPT, API, and Codex platforms. This represents a significant advancement in AI technology that could impact various industries and development workflows.

OpenAI · GPT-5 · ChatGPT

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

Researchers developed MA-RAG, a Multi-Round Agentic RAG framework that improves medical AI reasoning by iteratively refining responses through conflict detection and external evidence retrieval. The system achieved a 6.8-point accuracy improvement over baseline models across 7 medical Q&A benchmarks by addressing hallucinations and outdated knowledge in healthcare AI applications.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Researchers introduce ToolVQA, a large-scale multimodal dataset with 23K instances designed to improve AI models' ability to use external tools for visual question answering. The dataset features real-world contexts and multi-step reasoning tasks, with fine-tuned 7B models outperforming GPT-3.5-turbo on various benchmarks.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement

Researchers introduce TTSR, a new framework that enables AI models to improve their reasoning abilities during test time by having a single model alternate between student and teacher roles. The system allows models to learn from their mistakes by analyzing failed reasoning attempts and generating targeted practice questions for continuous improvement.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

The Geometry of Reasoning: Flowing Logics in Representation Space

Researchers propose a geometric framework showing how large language models 'think' through representation space as flows, with logical statements acting as controllers of these flows' velocities. The study provides evidence that LLMs can internalize logical invariants through next-token prediction training, challenging the 'stochastic parrot' criticism and suggesting universal representational laws underlying machine understanding.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Researchers introduce Structure of Thought (SoT), a new prompting technique that helps large language models better process text by constructing intermediate structures, showing 5.7-8.6% performance improvements. They also release T2S-Bench, the first benchmark with 1.8K samples across 6 scientific domains to evaluate text-to-structure capabilities, revealing significant room for improvement in current AI models.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization

Researchers introduce Dynamic Pruning Policy Optimization (DPPO), a new framework that accelerates AI language model training by 2.37x while maintaining accuracy. The method addresses computational bottlenecks in Group Relative Policy Optimization through unbiased gradient estimation and improved data efficiency.

AI · Bullish · Microsoft Research Blog · Mar 4 · 7/10

Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

Microsoft Research announces Phi-4-reasoning-vision-15B, a 15 billion parameter open-weight multimodal reasoning model. The model is designed for vision-language tasks including image captioning and is available through Microsoft Foundry, HuggingFace, and GitHub.

AI · Bearish · arXiv – CS AI · Mar 4 · 6/10

Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?

New research reveals that current large language models struggle with collaborative reasoning, showing that 'stronger' models are often more fragile when distracted by misleading information. The study of 15 LLMs found they fail to effectively leverage guidance from other models, with success rates below 9.2% on challenging problems.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

Adaptive Social Learning via Mode Policy Optimization for Language Agents

Researchers propose an Adaptive Social Learning (ASL) framework with the Adaptive Mode Policy Optimization (AMPO) algorithm to improve language agents' reasoning abilities in social interactions. The system dynamically adjusts reasoning depth based on context, achieving 15.6% higher performance than GPT-4o while using 32.8% shorter reasoning chains.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning

Researchers introduce LaDiR (Latent Diffusion Reasoner), a novel framework that combines continuous latent representation with iterative refinement capabilities to enhance Large Language Models' reasoning abilities. The system uses a Variational Autoencoder to encode reasoning steps and a latent diffusion model for parallel generation of diverse reasoning trajectories, showing improved accuracy and interpretability in mathematical reasoning benchmarks.

AI · Bearish · arXiv – CS AI · Mar 4 · 6/10

Contextual Drag: How Errors in the Context Affect LLM Reasoning

Researchers have identified 'contextual drag', a phenomenon in which large language models (LLMs) generate similar errors when failed attempts are present in their context. The study found 10-20% performance drops across 11 models on 8 reasoning tasks, with iterative self-refinement potentially leading to self-deterioration.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs

Researchers developed a new method to reduce content biases in large language models' reasoning tasks by transforming syllogisms into canonical logical representations with deterministic parsing. The approach achieved top-5 rankings on the multilingual SemEval-2026 Task 11 benchmark while offering a competitive alternative to complex fine-tuning methods.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO

Researchers developed a three-stage curriculum learning framework that improves Chain-of-Thought reasoning distillation from large language models to smaller ones. The method enables Qwen2.5-3B-Base to achieve an 11.29% accuracy improvement while reducing output length by 27.4% through progressive skill acquisition and Group Relative Policy Optimization.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference

Researchers introduce PRISM, a new AI inference algorithm that uses Process Reward Models to guide deep reasoning systems. The method significantly improves performance on mathematical and scientific benchmarks by treating candidate solutions as particles in an energy landscape and using score-guided refinement to concentrate on higher-quality reasoning paths.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

On the Reasoning Abilities of Masked Diffusion Language Models

New research demonstrates that Masked Diffusion Models (MDMs) for text generation are computationally equivalent to chain-of-thought augmented transformers in finite-precision settings. The study proves MDMs can solve all reasoning problems that CoT transformers can, while being more efficient for certain problem classes due to parallel generation capabilities.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Researchers discovered that large reasoning models (LRMs) suffer from inconsistent answers due to competing mechanisms between Chain-of-Thought reasoning and memory retrieval. They developed FARL, a new fine-tuning framework that suppresses retrieval shortcuts to promote genuine reasoning capabilities in AI models.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Researchers introduce SPIRAL, a self-play reinforcement learning framework that enables language models to develop reasoning capabilities by playing zero-sum games against themselves without human supervision. The system improves performance by up to 10% across 8 reasoning benchmarks on multiple model families including Qwen and Llama.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning

Researchers introduced Scaf-GRPO, a new training framework that overcomes the 'learning cliff' problem in LLM reasoning by providing strategic hints when models plateau. The method boosted Qwen2.5-Math-7B performance on the AIME24 benchmark by 44.3% relative to baseline GRPO methods.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Researchers analyzed Mixture-of-Experts (MoE) language models to determine optimal sparsity levels for different tasks. They found that reasoning tasks require balancing active compute (FLOPs) with optimal data-to-parameter ratios, while memorization tasks benefit from more parameters regardless of sparsity.