AIBullisharXiv โ CS AI ยท 14h ago6/10
๐ง
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
Researchers introduce MEDS, a memory-enhanced reward shaping framework that addresses a critical reinforcement learning failure mode where language models repeatedly generate similar errors. By tracking historical behavioral patterns and penalizing recurring mistake clusters, the method achieves consistent performance improvements across multiple datasets and models while increasing sampling diversity.