#large-reasoning-models News & Analysis

10 articles tagged with #large-reasoning-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

10 articles

AIBearisharXiv – CS AI · Apr 157/10

🧠

Red Teaming Large Reasoning Models

Researchers introduce RT-LRM, a comprehensive benchmark for evaluating the trustworthiness of Large Reasoning Models across truthfulness, safety, and efficiency dimensions. The study reveals that LRMs face significant vulnerabilities including CoT-hijacking and prompt-induced inefficiencies, demonstrating they are more fragile than traditional LLMs when exposed to reasoning-induced risks.

AIBullisharXiv – CS AI · Apr 67/10

🧠

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Researchers discovered that in Large Reasoning Models like DeepSeek-R1, the first solution is often the best, with alternative solutions being detrimental due to error accumulation. They propose RED, a new framework that achieves up to 19% performance gains while reducing token consumption by 37.7-70.4%.

AIBullisharXiv – CS AI · Mar 37/102

🧠

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Researchers propose Intervened Preference Optimization (IPO) to address safety issues in Large Reasoning Models, where chain-of-thought reasoning contains harmful content even when final responses appear safe. The method achieves over 30% reduction in harmfulness while maintaining reasoning performance.

AIBullisharXiv – CS AI · Mar 37/104

🧠

AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent

Researchers introduced AgentMath, a new AI framework that combines language models with code interpreters to solve complex mathematical problems more efficiently than current Large Reasoning Models. The system achieves state-of-the-art performance on mathematical competition benchmarks, with AgentMath-30B-A3B reaching 90.6% accuracy on AIME24 while remaining competitive with much larger models like OpenAI-o3.

AINeutralarXiv – CS AI · 7h ago6/10

🧠

ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments

Researchers introduce ReasoningGuard, an inference-time safety mechanism designed to protect Large Reasoning Models from generating harmful content during their reasoning processes. The method uses internal attention mechanisms to inject safety-oriented reflections at critical points, mitigating jailbreak attacks without requiring costly fine-tuning and outperforming nine existing safeguards.

AINeutralarXiv – CS AI · Apr 76/10

🧠

Selective Forgetting for Large Reasoning Models

Researchers propose a new framework for 'selective forgetting' in Large Reasoning Models (LRMs) that can remove sensitive information from AI training data while preserving general reasoning capabilities. The method uses retrieval-augmented generation to identify and replace problematic reasoning segments with benign placeholders, addressing privacy and copyright concerns in AI systems.

AIBullisharXiv – CS AI · Mar 176/10

🧠

Stop Before You Fail: Operational Capability Boundaries for Mitigating Unproductive Reasoning in Large Reasoning Models

Researchers developed monitoring strategies to detect when Large Reasoning Models are engaging in unproductive reasoning by identifying early failure signals. The new techniques reduce token usage by 62.7-93.6% while maintaining accuracy, significantly improving AI model efficiency.

AIBearisharXiv – CS AI · Mar 36/104

🧠

HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games

Researchers introduced HardcoreLogic, a benchmark of over 5,000 logic puzzles across 10 games to test Large Reasoning Models (LRMs) on non-standard puzzle variants. The study reveals significant performance drops in current LRMs when faced with complex or uncommon puzzle variations, indicating heavy reliance on memorized patterns rather than genuine logical reasoning.

AIBullisharXiv – CS AI · Feb 276/106

🧠

SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning

Researchers introduce SideQuest, a novel KV cache management system that uses Large Reasoning Models to compress memory usage during long-horizon AI tasks. The system reduces peak token usage by up to 65% while maintaining accuracy by having the model itself determine which tokens are useful to keep in memory.

AIBullisharXiv – CS AI · Feb 276/106

🧠

Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation

Researchers developed a two-stage framework to optimize large reasoning models, reducing overthinking on simple queries while maintaining accuracy on complex problems. The approach achieved up to 3.7 accuracy point improvements while reducing token generation by over 40% through hybrid fine-tuning and adaptive reinforcement learning techniques.