AIBullisharXiv – CS AI · 8h ago7/10
🧠
Adaptive Negative Reinforcement for LLM Reasoning:Dynamically Balancing Correction and Diversity in RLVR
Researchers propose Adaptive Negative Sample Reinforcement (A-NSR) and Confidence-Weighted Negative Reinforcement (CW-NSR) to improve LLM reasoning by dynamically adjusting penalty weights during training rather than applying fixed penalties. The methods are evaluated on challenging math datasets using Qwen2.5-Math-1.5B, demonstrating that intelligent error correction can match or exceed complex frameworks like PPO.