#model-reasoning News & Analysis

5 articles tagged with #model-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles

AINeutralarXiv – CS AI · Jun 57/10

🧠

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

Researchers discovered that large language models refuse to correct their own reasoning errors but readily accept corrections when identical claims come from external sources like users or tools. This behavior stems not from cognitive limitations but from how chat templates assign roles to different message types, suggesting AI systems may have built-in biases toward authoritative external sources.

AIBullisharXiv – CS AI · Mar 46/102

🧠

When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning

Researchers identified a critical problem in Large Audio-Language Models (LALMs) where audio perception deteriorates during extended reasoning processes. They developed MPAR² framework using reinforcement learning, which improved perception performance from 31.74% to 63.51% and achieved 74.59% accuracy on MMAU benchmark.

AINeutralarXiv – CS AI · Jun 106/10

🧠

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Researchers propose CPPO (Cumulative Prefix-divergence Policy Optimization), a new reinforcement learning method that improves upon standard PPO approaches for LLM training by accounting for position-dependent effects and cumulative policy divergence. The method uses position-weighted thresholds and prefix budgets to better regulate token-level deviations during autoregressive generation, showing improved training stability and reasoning accuracy across model scales.

AINeutralarXiv – CS AI · Jun 26/10

🧠

Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

Researchers propose Selective-adversarial Entropy Intervention (SaEI), a novel method that improves reinforcement learning-based visual reasoning in vision-language models by strategically introducing adversarial perturbations to visual inputs during RL sampling. The technique combines entropy-guided adversarial sampling with token-selective entropy computation to enhance policy exploration without compromising the models' factual knowledge.

AIBullisharXiv – CS AI · May 296/10

🧠

LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models

Researchers introduce LsrIF, a training framework that improves how large language models follow complex instructions by recognizing logical structures like sequential dependencies and conditional branching. The method uses structure-aware reward aggregation instead of simple averaging, demonstrating improved instruction-following performance both within and across domains.