#ml-optimization News & Analysis

5 articles tagged with #ml-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

DOT-MoE: Differentiable Optimal Transport for MoEfication

Researchers introduce DOT-MoE, a framework that converts dense language models into sparse Mixture-of-Experts architectures using differentiable optimal transport. The method retains 90% of performance while reducing active parameters by 50%, which targets LLM inference cost and avoids the instability of training MoEs from scratch.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning

Researchers propose DARTS, which accelerates reinforcement learning for large language models by reshaping the rollout distribution toward conciseness and certainty, cutting the compute wasted on long-tail response lengths. Its distribution-aware trajectory sampling achieves up to a 1.77x speedup without sacrificing model performance.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Learning with a Single Rollout via Monte Carlo Pass@k Critic

Researchers propose SR-PPO, a reinforcement learning method that trains language models from a single rollout, using a Monte Carlo Pass@k critic for token-level credit assignment. By replacing repeated sampling with reachability-based advantage estimation, the approach reduces computational cost while improving reasoning performance on mathematical benchmarks such as HMMT26 and AIME24.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

Researchers introduce FML-bench, a standardized benchmark for evaluating AI research agents that separates strategy from infrastructure. It shows that simple greedy algorithms perform comparably to complex tree-search methods. The study finds that the effectiveness of an exploration strategy depends on the underlying structure of optimization opportunities, and that an adaptive agent performs best by switching strategies when it detects that improvement has stagnated.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models

Researchers introduce Miner, a reinforcement learning method that uses a model's intrinsic uncertainty as a self-supervised reward signal to improve training efficiency for large reasoning models. The approach achieves state-of-the-art results on reasoning benchmarks, with gains of up to 4.58 points in Pass@1 over existing methods, addressing a critical inefficiency in current critic-free RL training.