y0news

#reasoning News & Analysis

169 articles tagged with #reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning

Researchers introduce Slow-Fast Policy Optimization (SFPO), a new reinforcement learning framework that improves training stability and efficiency for large language model reasoning. SFPO outperforms existing methods like GRPO by up to 2.80 points on math benchmarks while requiring up to 4.93x fewer rollouts and 4.19x less training time.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning

GlobalRAG is a new reinforcement learning framework that significantly improves multi-hop question answering by decomposing questions into subgoals and coordinating retrieval with reasoning. The system achieves 14.2% average improvements in performance metrics while using only 42% of the training data required by baseline models.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

Researchers developed plan conditioning, a training-free method that significantly improves diffusion language model reasoning by prepending short natural-language plans from autoregressive models. The technique improved performance by 11.6 percentage points on math problems and 12.8 points on coding tasks, bringing diffusion models to competitive levels with autoregressive models.
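The conditioning step is simple enough to sketch: a short plan produced by an autoregressive model is prepended to the diffusion model's input before generation. A minimal illustration, with a hypothetical prompt template and a stub in place of the AR planner (the paper's exact format is not reproduced here):

```python
def build_plan_conditioned_prompt(question: str, plan: str) -> str:
    """Prepend a short natural-language plan (from an autoregressive
    model) to the prompt handed to the diffusion language model.
    The template below is an assumption, not the paper's."""
    return f"Plan:\n{plan.strip()}\n\nQuestion:\n{question.strip()}\n\nAnswer:"

def ar_plan_stub(question: str) -> str:
    # Stub standing in for an autoregressive planner; a real system would
    # prompt an AR model to sketch a 2-3 step plan for the question.
    return "1. Identify the unknowns. 2. Set up equations. 3. Solve."

question = "If 3x + 5 = 20, what is x?"
prompt = build_plan_conditioned_prompt(question, ar_plan_stub(question))
```

Because the method is training-free, this composition is the entire intervention: the diffusion model simply decodes conditioned on the prepended plan.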

🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Task-Specific Knowledge Distillation via Intermediate Probes

Researchers introduce a new knowledge distillation framework that improves training of smaller AI models by using intermediate representations from large language models rather than their final outputs. The method shows consistent improvements across reasoning benchmarks, particularly when training data is limited, by providing cleaner supervision signals.
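To give the general flavor of the idea, a distillation objective can mix the task loss with agreement between a projection of the student's hidden state and a teacher's intermediate representation. A toy NumPy sketch with assumed shapes and an illustrative mixing weight `alpha` (not the paper's implementation):

```python
import numpy as np

def probe_distill_loss(student_hidden, teacher_intermediate, proj,
                       task_loss, alpha=0.5):
    """Blend the task loss with an MSE between a linear projection of the
    student's hidden states and the teacher's intermediate representations.
    Shapes and the mixing weight `alpha` are illustrative assumptions."""
    aligned = student_hidden @ proj                 # (batch, d_teacher)
    distill = np.mean((aligned - teacher_intermediate) ** 2)
    return alpha * task_loss + (1 - alpha) * distill

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8))    # student hidden states
t = rng.normal(size=(4, 16))   # teacher intermediate representations
W = rng.normal(size=(8, 16))   # learnable projection
loss = probe_distill_loss(s, t, W, task_loss=1.2)
```

The appeal in low-data regimes is that the intermediate target supplies a dense supervision signal per example rather than a single final-output label.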

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Researchers introduce CLIPO (Contrastive Learning in Policy Optimization), a new method that improves upon Reinforcement Learning with Verifiable Rewards (RLVR) for training Large Language Models. CLIPO addresses hallucination and answer-copying issues by incorporating contrastive learning to better capture correct reasoning patterns across multiple solution paths.
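A generic InfoNCE-style contrastive loss over solution-path embeddings conveys the core mechanism: pull the embedding of a correct reasoning path toward a reference and push it away from incorrect paths. A minimal NumPy sketch; the embedding source and temperature are assumptions, not CLIPO's actual objective:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss over solution-path embeddings:
    cosine-similarity logits, temperature tau. How embeddings are
    obtained (e.g. mean-pooled hidden states) is an assumption."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()   # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

Under this kind of objective, the loss is small when the anchor matches the correct path and large when it resembles an incorrect one, which is what discourages answer-copying trajectories.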

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models

Researchers propose Dynamics-Predictive Sampling (DPS), a new method that improves reinforcement learning finetuning of large language models by predicting which training prompts will be most informative without expensive computational rollouts. The technique models each prompt's learning progress as a dynamical system and uses Bayesian inference to select better training data, reducing computational overhead while achieving superior reasoning performance.
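A heavily simplified stand-in for this selection idea: track a Beta posterior over each prompt's solve rate and prefer prompts whose posterior sits near 50%, where group-relative RL updates carry the most signal. The toy sketch below replaces the paper's dynamical-system model with a plain Bayesian counter:

```python
from dataclasses import dataclass

@dataclass
class PromptState:
    """Beta posterior over a prompt's current solve rate; a toy stand-in
    for the paper's dynamical-system model of learning progress."""
    successes: int = 1   # Beta(1, 1) uniform prior
    failures: int = 1

    def update(self, solved: bool):
        if solved:
            self.successes += 1
        else:
            self.failures += 1

    def mean(self) -> float:
        return self.successes / (self.successes + self.failures)

def select_informative(prompts: dict) -> str:
    # Prompts whose posterior solve rate sits near 0.5 yield the largest
    # expected advantage signal, so pick the one closest to it.
    return min(prompts, key=lambda k: abs(prompts[k].mean() - 0.5))
```

The point of such modeling is that posterior updates come from cheap bookkeeping on past outcomes, not from fresh rollouts, which is where the computational savings would come from.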

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Researchers introduce Latent-DARM, a framework that bridges discrete diffusion language models and autoregressive models to improve multi-agent AI reasoning capabilities. The system achieved significant improvements on reasoning benchmarks, increasing accuracy from 27% to 36% on DART-5 while using less than 2.2% of the token budget of state-of-the-art models.

AI · Bearish · arXiv – CS AI · Mar 11 · 6/10

Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs

Researchers have identified a critical flaw in Large Language Models (LLMs): they prioritize moral reasoning over commonsense understanding and struggle to detect logical contradictions within moral dilemmas. The study introduces the CoMoral benchmark and reveals a 'narrative focus bias': LLMs identify contradictions more readily when attributed to secondary characters than to primary narrators.

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

RECODE: Reasoning Through Code Generation for Visual Question Answering

Researchers introduce RECODE, a new framework that improves visual reasoning in AI models by converting images into executable code for verification. The system generates multiple candidate programs to reproduce visuals, then selects and refines the most accurate reconstruction, significantly outperforming existing methods on visual reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

Researchers introduce Answer-Then-Check, a novel safety alignment approach that enables large language models to evaluate a response's safety before outputting it to users. The method uses a new 80K-sample dataset called Reasoned Safety Alignment (ReSA) and demonstrates improved jailbreak defense while maintaining general reasoning capabilities.
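The control flow is easy to sketch: draft a response, judge the draft, and only then decide what the user sees. A minimal illustration with stub callables standing in for the policy model and the learned safety judge (none of these names come from the paper):

```python
def answer_then_check(question, generate, is_safe,
                      refusal="I can't help with that."):
    """Draft a response first, then run a safety check on the draft
    before anything reaches the user. `generate` and `is_safe` are
    hypothetical stand-ins for the policy model and safety judge."""
    draft = generate(question)
    return draft if is_safe(question, draft) else refusal

# Stub usage: a trivially safe generator and a judge that always passes.
reply = answer_then_check("What is 2 + 2?",
                          generate=lambda q: "4",
                          is_safe=lambda q, d: True)
```

The key design choice relative to input-side filters is that the check sees the full drafted answer, so jailbreaks that only become harmful in the output can still be caught.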

๐Ÿข Hugging Face
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

This position paper argues against anthropomorphizing intermediate tokens generated by language models as 'reasoning traces' or 'thoughts'. The authors contend that treating these computational outputs as human-like thinking processes is misleading and potentially harmful to AI research and understanding.

AI · Neutral · arXiv – CS AI · Mar 6 · 6/10

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Researchers introduce X-RAY, a new system for analyzing large language model reasoning capabilities through formally verified probes that isolate structural components of reasoning. The study reveals LLMs handle constraint refinement well but struggle with solution-space restructuring, providing contamination-free evaluation methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs

Researchers propose Draft-Thinking, a new approach to improve the efficiency of large language models' reasoning processes by reducing unnecessary computational overhead. The method achieves an 82.6% reduction in reasoning budget with only a 2.6% performance drop on mathematical problems, addressing the costly overthinking problem in current chain-of-thought reasoning.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

LiTS: A Modular Framework for LLM Tree Search

LiTS is a new modular Python framework that enables LLM reasoning through tree search algorithms like MCTS and BFS. The framework demonstrates reusable components across different domains and reveals that LLM policy diversity, not reward quality, is the key bottleneck for effective tree search in infinite action spaces.
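A minimal BFS over LLM-proposed actions illustrates the kind of pluggable interface such a framework exposes; the `propose` and `is_goal` callables here are illustrative stand-ins for policy and reward components, not LiTS's actual API:

```python
from collections import deque

def bfs_search(root, propose, is_goal, max_depth=4):
    """Minimal breadth-first tree search over proposed actions.
    `propose(state)` would query an LLM policy for candidate next
    states; `is_goal(state)` plays the role of a reward/terminal check.
    Both names are illustrative, not the framework's API."""
    queue = deque([(root, 0)])
    while queue:
        state, depth = queue.popleft()
        if is_goal(state):
            return state
        if depth < max_depth:
            for nxt in propose(state):
                queue.append((nxt, depth + 1))
    return None   # budget exhausted without reaching a goal
```

The framework's observation about policy diversity maps directly onto `propose`: if it keeps suggesting near-duplicate continuations, widening or deepening the search cannot compensate.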

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage

Researchers have developed DIVA-GRPO, a new reinforcement learning method that improves multimodal large language model reasoning by adaptively adjusting problem difficulty distributions. The approach addresses key limitations in existing group relative policy optimization methods, showing superior performance across six reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context

Researchers released ASTRA-bench, a new benchmark for evaluating AI agents' ability to handle complex, multi-step reasoning with personal context and tool usage. Testing revealed that current state-of-the-art models like Claude-4.5-Opus and DeepSeek-V3.2 show significant performance degradation in high-complexity scenarios.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Learning Structured Reasoning via Tractable Trajectory Control

Researchers propose Ctrl-R, a new framework that improves large language models' reasoning abilities by systematically discovering and reinforcing diverse reasoning patterns through structured trajectory control. The method enables better exploration of complex reasoning behaviors and shows consistent improvements across mathematical reasoning tasks in both language and vision-language models.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning

Researchers introduced Pencil Puzzle Bench, a new framework for evaluating large language model reasoning capabilities using constraint-satisfaction problems. The benchmark tested 51 models across 300 puzzles, revealing significant performance improvements through increased reasoning effort and iterative verification processes.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Researchers introduce CHIMERA, a compact 9K-sample synthetic dataset that enables smaller AI models to achieve reasoning performance comparable to much larger models. The dataset addresses key challenges in training reasoning-capable LLMs through automated generation and cross-validation across 8 scientific disciplines.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

One-Token Verification for Reasoning Correctness Estimation

Researchers introduce One-Token Verification (OTV), a new method that estimates reasoning correctness in large language models during a single forward pass, reducing computational overhead. OTV reduces token usage by up to 90% through early termination while improving accuracy on mathematical reasoning tasks compared to existing verification methods.
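The core trick can be sketched as reading the softmax probability of a single verdict token from one forward pass and stopping early once it clears a threshold. A toy illustration; the token index, prompt format, and threshold are all assumptions:

```python
import math

def one_token_confidence(logits, yes_idx):
    """Softmax probability of a single 'correct' verdict token read off
    one forward pass. Which vocabulary index serves as the verdict token
    is an assumption of this sketch."""
    m = max(logits)                                  # stable softmax
    exps = [math.exp(l - m) for l in logits]
    return exps[yes_idx] / sum(exps)

def should_stop(logits, yes_idx, threshold=0.9):
    # Early-terminate further reasoning once the verifier is confident;
    # this cutoff is where the reported token savings would come from.
    return one_token_confidence(logits, yes_idx) >= threshold
```

Because the estimate costs one forward pass instead of a full generated critique, it can be queried at every reasoning step at negligible marginal cost.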

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents

Researchers developed TARSE, a new AI system for clinical decision-making that retrieves relevant medical skills and experiences from curated libraries to improve reasoning accuracy. The system performs test-time adaptation to align language models with clinically valid logic, showing improvements over existing medical AI baselines in question-answering benchmarks.