AI · Bullish · arXiv CS AI · 4d ago · 7/10
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
Researchers propose Decoupled Reward Policy Optimization (DRPO), a framework that reduces the computational cost of large reasoning models by 77% while maintaining performance. The method targets the "overthinking" problem, in which models generate unnecessarily long reasoning for simple questions, and it achieves significant efficiency gains over existing approaches.