y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#reasoning-optimization News & Analysis

4 articles tagged with #reasoning-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AIBullisharXiv – CS AI · May 117/10
🧠

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

Researchers introduce the Context Gathering Decision Process (CGDP), a POMDP framework that formalizes how LLM agents should search and gather information from environments exceeding their context windows. The approach yields measurable improvements in multi-hop reasoning (up to 11.4%) and token efficiency (up to 39% savings) through explicit belief state management and programmatic exhaustion detection.

AIBullisharXiv – CS AI · Apr 67/10
🧠

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Researchers discovered that in Large Reasoning Models like DeepSeek-R1, the first solution is often the best, with alternative solutions being detrimental due to error accumulation. They propose RED, a new framework that achieves up to 19% performance gains while reducing token consumption by 37.7-70.4%.

AINeutralarXiv – CS AI · May 116/10
🧠

Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks

Researchers propose Direct Reasoning Optimization (DRO), a constrained reinforcement learning framework that improves LLM training on unverifiable tasks by combining token-level reasoning rewards with rubric-based feasibility gates. The approach demonstrates faster, more sample-efficient learning across scientific, medical, legal, and financial domains.

AIBullisharXiv – CS AI · Apr 146/10
🧠

CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

Researchers introduce CARO, a two-stage training framework that enhances large language models' ability to perform robust content moderation through analogical reasoning. By combining retrieval-augmented generation with direct preference optimization, CARO achieves 24.9% F1 score improvement over state-of-the-art models including DeepSeek R1 and LLaMA Guard on ambiguous moderation cases.