AI · Bullish · arXiv CS AI · 4h ago
InfoPO: Information-Driven Policy Optimization for User-Centric Agents
Researchers introduce InfoPO (Information-Driven Policy Optimization), a new method that improves AI agent interactions by using information-gain rewards to identify valuable conversation turns. The approach addresses the credit-assignment problem in multi-turn interactions and outperforms existing baselines across diverse tasks, including intent clarification and collaborative coding.