arXiv – CS AI · 6h ago
Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning
Researchers introduce DiRL, a reinforcement learning framework that distinguishes between genuine reasoning and memorization in large language models by anchoring exploration to an internal reasoning-memorization direction. The method integrates with Group Relative Policy Optimization to improve performance on mathematical and reasoning benchmarks while suppressing exploration of memorized shortcuts.
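The summary does not give DiRL's equations, so the following is only a rough illustration under stated assumptions. It computes GRPO-style group-relative advantages, where each sampled response's reward is normalized by the mean and standard deviation of its group. It then adds a hypothetical direction-aware term. Each response's hidden state is projected onto an assumed "reasoning-memorization" direction vector. Responses that sit further toward the reasoning side than their group average receive a small bonus, and responses on the memorization side receive a matching penalty. The function names, the `weight` parameter, and the idea that the direction is a fixed vector supplied from outside are all assumptions, not the paper's method.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def projection(hidden, direction):
    """Scalar projection of a hidden-state vector onto the given direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction vector must be non-zero")
    return sum(h * d for h, d in zip(hidden, direction)) / norm


def direction_aware_bonus(hiddens, direction, weight=0.1):
    """HYPOTHETICAL exploration term (not from the paper).

    Positive values mean a response leans further toward the assumed
    'reasoning' side of the direction than its group average. Negative
    values mean it leans toward the 'memorization' side and is penalized.
    """
    proj = [projection(h, direction) for h in hiddens]
    group_mean = mean(proj)
    return [weight * (p - group_mean) for p in proj]


def shaped_advantages(rewards, hiddens, direction, weight=0.1):
    """Combine GRPO advantages with the hypothetical direction-aware term."""
    base = group_relative_advantages(rewards)
    bonus = direction_aware_bonus(hiddens, direction, weight)
    return [a + b for a, b in zip(base, bonus)]


if __name__ == "__main__":
    # Toy group of four sampled answers: two correct (reward 1), two wrong.
    rewards = [1.0, 0.0, 1.0, 0.0]
    # Toy 2-D hidden states; the assumed direction is the first axis.
    hiddens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
    direction = [1.0, 0.0]
    print(shaped_advantages(rewards, hiddens, direction))
```

In this toy group the two correct answers get roughly equal GRPO advantages. The hypothetical term then separates them: the one whose hidden state lies further along the assumed reasoning direction ends up with a slightly larger advantage. Because the term is centered on the group mean, it reshapes credit within a group without shifting the group's total.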