AINeutralarXiv – CS AI · 7h ago6/10
🧠
When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
Researchers demonstrate that LLM-generated reward functions for reinforcement learning tasks fail in predictable ways and are better treated as an iterative debugging process rather than one-shot generation. Using diagnostic-driven refinement guided by failure-mode taxonomy, they improve task success rates significantly (DoorKey-8x8: 2.3% to 97.6%), though the method shows limitations in dense-reward continuous control and requires reliable semantic interfaces.