AI · Bullish · arXiv – CS AI · May 4 · 7/10
🧠Researchers introduce ML-Agent, a 7B-parameter LLM trained through reinforcement learning to perform autonomous machine learning engineering tasks. The approach achieves performance comparable to much larger proprietary models like GPT-5 while requiring significantly fewer computational resources, demonstrating that smaller models can learn effectively from execution trajectories rather than relying solely on prompting.
🧠 GPT-5
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers introduce AceGRPO, a new reinforcement learning framework for Autonomous Machine Learning Engineering that addresses behavioral stagnation in current LLM-based agents. The Ace-30B model trained with this method achieves a 100% valid submission rate on MLE-Bench-Lite and matches the performance of proprietary frontier models while outperforming larger open-source alternatives.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠BiasEdit is a new framework that automatically detects and removes social biases from web-sourced image datasets without manual annotation, using vision-language models and text-guided image editing. The method addresses a critical problem in AI where neural networks trained on biased web data perpetuate unfairness in downstream applications like recommendation systems and content moderation.
🏢 Meta
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠A comprehensive benchmark study reveals that properly calibrated rule-based autoscalers outperform six mainstream deep reinforcement learning algorithms on cost in adaptive resource control tasks. The research challenges assumptions about DRL superiority, identifying baseline calibration and reward engineering as greater bottlenecks than algorithm selection.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 17
🧠Researchers introduce CoMind, a multi-agent AI system that leverages community knowledge to automate machine learning engineering tasks. The system achieved a 36% medal rate on 75 past Kaggle competitions and outperformed 92.6% of human competitors in eight live competitions, establishing new state-of-the-art performance.
AI · Bullish · Google Research Blog · Aug 1 · 6/10 · 7
🧠MLE-STAR is a new state-of-the-art machine learning engineering agent. Its release shows continued progress in tools that automate machine learning workflows.
AI · Neutral · OpenAI News · Oct 10 · 5/10 · 10
🧠MLE-bench is a new benchmark for evaluating how effectively AI agents perform machine learning engineering tasks. It is a step toward standardizing the assessment of AI capabilities in practical ML workflows and engineering processes.