AI · Bullish · arXiv – CS AI · 18h ago · 6/10
SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models
Researchers introduce Stage-Aware Dynamic Weighting (SAW), a mechanism for multi-objective reinforcement learning in large language models that addresses the asynchronous nature of reward learning across different objectives. SAW uses the coefficient of variation as a real-time proxy for informativeness and dynamically reweights each objective's contribution; the authors report improved training efficiency and final performance with minimal computational overhead.
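
To make the reweighting idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary does not give SAW's formula, so the rule used here (weights proportional to each objective's coefficient of variation over the current batch, normalized to sum to 1, with a uniform fallback) is an assumption, and the function names `coefficient_of_variation`, `cv_weights`, and `combine_rewards` are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(rewards, eps=1e-8):
    """Population standard deviation divided by the absolute mean of a reward batch."""
    return pstdev(rewards) / (abs(mean(rewards)) + eps)


def cv_weights(reward_batches, eps=1e-8):
    """Map {objective: [rewards]} to weights proportional to each objective's CV.

    ASSUMPTION: proportional weighting and the uniform fallback are illustrative
    choices, not the rule published for SAW.
    """
    cvs = {name: coefficient_of_variation(r, eps) for name, r in reward_batches.items()}
    total = sum(cvs.values())
    if total == 0:
        # No objective shows any spread in this batch: fall back to uniform weights.
        return {name: 1.0 / len(cvs) for name in cvs}
    return {name: cv / total for name, cv in cvs.items()}


def combine_rewards(reward_batches, weights):
    """Weighted sum of per-objective rewards for each sample in the batch."""
    names = list(reward_batches)
    n = len(reward_batches[names[0]])
    return [sum(weights[k] * reward_batches[k][i] for k in names) for i in range(n)]


# Toy batch: "accuracy" still varies across samples, "format" is already saturated.
batch = {
    "accuracy": [0.0, 1.0, 0.0, 1.0],
    "format": [1.0, 1.0, 1.0, 1.0],
}
weights = cv_weights(batch)
combined = combine_rewards(batch, weights)
```

In this toy batch the saturated `format` objective has zero spread, so all of the weight shifts to `accuracy`. Recomputing the weights at every training step is what would make such a scheme track the stage each objective is in.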