arXiv – CS AI
Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
Researchers demonstrate that extrapolative weight averaging, which combines model weights at points beyond the trained checkpoints rather than only between them, can navigate and extend correctness-efficiency frontiers in code reinforcement learning (RL) without additional training. On competitive programming tasks, ensembles built with this technique improve performance on hard problems by 3.3%, suggesting a scalable way to optimize AI systems across competing objectives.
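The summary does not give the paper's exact formulation. A common way to write weight averaging between two checkpoints θ_A and θ_B is θ(α) = (1 - α)·θ_A + α·θ_B. Values of α in [0, 1] interpolate between the checkpoints, and values outside that range extrapolate past one of them. The sketch below assumes this two-checkpoint form and uses plain Python lists in place of real tensors. It then keeps the (α, correctness, efficiency) points that no other point beats on both objectives. `extrapolate`, `pareto_front`, and the toy scores are hypothetical illustrations, not the authors' code or results, and the ensembling step is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def extrapolate(theta_a: Weights, theta_b: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * theta_a + alpha * theta_b, parameter by parameter.

    alpha in [0, 1] interpolates between the two checkpoints;
    alpha > 1 extrapolates past theta_b, alpha < 0 extrapolates past theta_a.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must have the same parameter names")
    out: Weights = {}
    for name, a in theta_a.items():
        b = theta_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        out[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return out


def pareto_front(
    points: List[Tuple[float, float, float]]
) -> List[Tuple[float, float, float]]:
    """Keep (alpha, correctness, efficiency) points not dominated on both objectives.

    Higher is assumed better for both correctness and efficiency.
    """
    front = []
    for p in points:
        dominated = any(
            q[1] >= p[1] and q[2] >= p[2] and (q[1] > p[1] or q[2] > p[2])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p[0])


if __name__ == "__main__":
    theta_a = {"w": [0.0, 1.0]}  # toy "checkpoint A"
    theta_b = {"w": [1.0, 3.0]}  # toy "checkpoint B"
    print(extrapolate(theta_a, theta_b, 1.5))  # {'w': [1.5, 4.0]}

    # Made-up (alpha, correctness, efficiency) scores, purely for illustration.
    scores = [
        (0.0, 0.50, 0.90),
        (0.5, 0.55, 0.85),
        (1.0, 0.60, 0.70),
        (1.5, 0.58, 0.60),
        (2.0, 0.62, 0.75),
    ]
    print(pareto_front(scores))
```

In practice each α would mean evaluating a merged model on a benchmark. The appeal described in the summary is that sweeping α needs no further training, only weight arithmetic and evaluation.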