y0news

#large-reasoning-models News & Analysis

16 articles tagged with #large-reasoning-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models

Researchers introduce Dynamic Thinking-Token Selection (DynTS), a method that optimizes Large Reasoning Models by identifying and retaining only decision-critical tokens during inference while discarding redundant reasoning trace data. This approach significantly reduces memory footprint and computational overhead, addressing a major efficiency bottleneck in LRMs that generate extended reasoning sequences.
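The general idea of keeping only the most important tokens under a fixed budget can be illustrated with a minimal sketch. This is not the DynTS algorithm itself: the importance scores, the `budget` parameter, and the function names `select_tokens` and `prune_cache` are all hypothetical, and the summary does not say how the method actually scores or selects tokens.

```python
from typing import List, Sequence


def select_tokens(scores: Sequence[float], budget: int) -> List[int]:
    """Return positions of the `budget` highest-scoring tokens, in original order."""
    if budget <= 0:
        return []
    # Rank positions by score (ties go to the earlier position), keep the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:budget])


def prune_cache(cache: Sequence[str], scores: Sequence[float], budget: int) -> List[str]:
    """Keep only the cache entries at the selected positions."""
    if len(cache) != len(scores):
        raise ValueError("cache and scores must have the same length")
    return [cache[i] for i in select_tokens(scores, budget)]


# Hypothetical reasoning trace with made-up importance scores (assumption).
trace = ["let", "x", "=", "3", "hmm", "wait", "so", "x+1", "=", "4"]
scores = [0.1, 0.9, 0.4, 0.95, 0.05, 0.02, 0.1, 0.8, 0.3, 0.99]
kept = prune_cache(trace, scores, budget=4)
print(kept)  # ['x', '3', 'x+1', '4']
```

With a budget of 4, the sketch retains four of the ten entries, so the stored trace shrinks by 60% in this toy case; real savings would depend on how importance is estimated.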

AI · Bearish · arXiv – CS AI · Jun 3 · 7/10

Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models

Researchers demonstrate that Large Reasoning Models (LRMs) frequently 'overthink' problems after reaching correct answers, with continued reasoning degrading accuracy by up to 21%. The study introduces a protocol to measure reasoning sufficiency and reveals that harmful overthinking, where additional reasoning destabilizes correct solutions, is a broader reliability risk affecting both multimodal and language-only models.

AI · Bearish · arXiv – CS AI · Apr 15 · 7/10

Red Teaming Large Reasoning Models

Researchers introduce RT-LRM, a comprehensive benchmark for evaluating the trustworthiness of Large Reasoning Models across truthfulness, safety, and efficiency dimensions. The study reveals that LRMs face significant vulnerabilities including CoT-hijacking and prompt-induced inefficiencies, demonstrating they are more fragile than traditional LLMs when exposed to reasoning-induced risks.

AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Researchers found that in Large Reasoning Models like DeepSeek-R1, the first solution is often the best, and alternative solutions can be detrimental because errors accumulate. They propose RED, a new framework that achieves up to 19% performance gains while reducing token consumption by 37.7% to 70.4%.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Researchers propose Intervened Preference Optimization (IPO) to address safety issues in Large Reasoning Models, where chain-of-thought reasoning contains harmful content even when final responses appear safe. The method achieves over 30% reduction in harmfulness while maintaining reasoning performance.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent

Researchers introduced AgentMath, a new AI framework that combines language models with code interpreters to solve complex mathematical problems more efficiently than current Large Reasoning Models. The system achieves state-of-the-art performance on mathematical competition benchmarks, with AgentMath-30B-A3B reaching 90.6% accuracy on AIME24 while remaining competitive with much larger models like OpenAI-o3.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Forecasting Future Behavior as a Learning Task

Researchers propose treating AI behavior forecasting as a learnable task rather than relying on explainability methods, training specialized models to predict how large reasoning models will perform on new inputs. Behavior Forecasters outperform GPT-5.4 and Claude Opus-4.6 at predicting LRM consistency and input-sensitivity while operating at significantly lower inference costs.

GPT-5 · Claude

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

DyCon: Dynamic Reasoning Control via Evolving Difficulty Modeling

Researchers introduce DyCon, a training-free framework that dynamically models task difficulty during reasoning to reduce inefficiencies in Large Reasoning Models. The method leverages step-level embeddings to control reasoning depth, achieving significant efficiency gains across multiple model sizes and benchmarks without sacrificing accuracy.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Researchers introduced LocalSearchBench, a comprehensive benchmark for testing AI agents in local life services, revealing significant performance gaps even among state-of-the-art large reasoning models. The benchmark comprises 1.3M merchant entries and 900 multi-hop reasoning tasks, exposing critical weaknesses in completeness and faithfulness that underscore the need for domain-specific AI agent development.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

Researchers compared frontier Large Reasoning Models (LRMs) with traditional AI systems using human gameplay data paired with fMRI brain recordings. LRMs demonstrated superior alignment with human learning behavior and predicted brain activity an order of magnitude better than reinforcement learning alternatives, suggesting they more closely mirror human cognition during complex decision-making.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments

Researchers introduce ReasoningGuard, an inference-time safety mechanism designed to protect Large Reasoning Models from generating harmful content during their reasoning processes. The method uses internal attention mechanisms to inject safety-oriented reflections at critical points, mitigating jailbreak attacks without requiring costly fine-tuning and outperforming nine existing safeguards.

AI · Neutral · arXiv – CS AI · Apr 7 · 6/10

Selective Forgetting for Large Reasoning Models

Researchers propose a new framework for 'selective forgetting' in Large Reasoning Models (LRMs) that can remove sensitive information from AI training data while preserving general reasoning capabilities. The method uses retrieval-augmented generation to identify and replace problematic reasoning segments with benign placeholders, addressing privacy and copyright concerns in AI systems.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10

HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games

Researchers introduced HardcoreLogic, a benchmark of over 5,000 logic puzzles across 10 games to test Large Reasoning Models (LRMs) on non-standard puzzle variants. The study reveals significant performance drops in current LRMs when faced with complex or uncommon puzzle variations, indicating heavy reliance on memorized patterns rather than genuine logical reasoning.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10

SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning

Researchers introduce SideQuest, a novel KV cache management system that uses Large Reasoning Models to compress memory usage during long-horizon AI tasks. The system reduces peak token usage by up to 65% while maintaining accuracy by having the model itself determine which tokens are useful to keep in memory.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10

Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation

Researchers developed a two-stage framework to optimize large reasoning models, reducing overthinking on simple queries while maintaining accuracy on complex problems. The approach improved accuracy by up to 3.7 points while reducing token generation by over 40% through hybrid fine-tuning and adaptive reinforcement learning techniques.