AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce EvoMAS, a framework that dynamically constructs multi-agent workflows during task execution rather than using static, pre-optimized designs. The system uses a Planner-Evaluator-Updater pipeline to assess task state and adapts agent coordination across execution stages, demonstrating superior performance on complex reasoning tasks compared to existing approaches.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers have developed Curvature-Aware Policy Optimization (CAPO), a new algorithm for training Large Language Models that reports gains in training stability and up to 30x better sample efficiency. The curvature-aware method identifies and filters problematic training samples, intervening on fewer than 8% of tokens.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce ReMax Actor-Critic (ReMAC), extending retry-based policy gradient methods from discrete to continuous action spaces. The approach uses pathwise derivative estimators to optimize pass@K and max@K objectives, promoting exploration through policy-gradient landscape reshaping rather than explicit entropy bonuses, achieving performance comparable to SAC.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce MaxPO, a new policy-gradient method that improves advantage estimation for max@K objectives in reinforcement learning. Aimed at LLM post-training, it reduces gradient variance with a Leave-Two-Out baseline that keeps the advantages centered.
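The two summaries above refer to max@K objectives without defining them. As a minimal sketch, assuming the common reading of max@K as the expected best reward among K sampled attempts, the quantity can be estimated without bias from n ≥ K sampled rewards by averaging the maximum over all size-K subsets. The function name `max_at_k` is hypothetical, and this is not the estimator or baseline from either paper.

```python
from math import comb


def max_at_k(rewards, k):
    """Estimate E[max of k rewards] from n >= k i.i.d. sampled rewards.

    Hypothetical helper for illustration; not taken from ReMAC or MaxPO.
    """
    n = len(rewards)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(rewards)")
    ordered = sorted(rewards)
    # The i-th smallest reward (1-indexed) is the maximum of exactly
    # C(i-1, k-1) of the C(n, k) possible size-k subsets, so weighting
    # by that count averages the subset maxima without enumerating them.
    weighted = sum(
        comb(i - 1, k - 1) * r for i, r in enumerate(ordered, start=1)
    )
    return weighted / comb(n, k)
```

With binary rewards this reduces to the familiar pass@K estimate: for `[0, 0, 1, 1]` and `k=2`, five of the six pairs contain at least one success.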
AI · Bullish · OpenAI News · Apr 18 · 6/10 · 5
🧠Researchers have released Evolved Policy Gradients (EPG), an experimental metalearning approach that evolves the loss function of AI learning agents to enable faster training on new tasks. The method allows agents to generalize beyond their training data, successfully performing basic tasks in scenarios they were not trained for.
AI · Neutral · arXiv – CS AI · Mar 11 · 4/10
🧠Researchers propose CORA, a new cooperative game-theoretic method for credit assignment in multi-agent reinforcement learning that uses coalition-wise advantage allocation. The approach addresses policy optimization challenges by evaluating marginal contributions of different agent coalitions and demonstrates superior performance across various benchmarks.
AI · Neutral · arXiv – CS AI · Mar 9 · 4/10
🧠Researchers propose a new reinforcement learning approach for large language models that optimizes for subsets of future rewards rather than full sequences. The method enables comparison of different policy classes, and its effectiveness varies across conversational AI alignment tasks.
AI · Neutral · OpenAI News · Apr 21 · 1/10 · 7
🧠The article appears to discuss a theoretical equivalence between policy gradient methods and soft Q-learning in reinforcement learning. However, the article body is empty, making detailed analysis impossible.