AINeutralarXiv – CS AI · 7h ago6/10
🧠
MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop
Researchers introduce MulFeRL, a reinforcement learning framework that uses multi-turn verbal feedback to improve AI reasoning on failed tasks. By converting qualitative feedback into trainable signals and assigning credit for incremental progress, the approach outperforms traditional reward-based methods on math problems and generalizes well to unseen domains.