y0news
#kaggle2 articles
2 articles
AINeutralarXiv โ€“ CS AI ยท 6h ago5
๐Ÿง 

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

Researchers introduce DARE-bench, a new benchmark with 6,300 Kaggle-derived tasks for evaluating Large Language Models' performance on data science and machine learning tasks. The benchmark reveals that even advanced models like GPT-4-mini struggle with ML modeling tasks, while fine-tuning on DARE-bench data can improve model accuracy by up to 8x.

AIBullisharXiv โ€“ CS AI ยท 6h ago8
๐Ÿง 

CoMind: Towards Community-Driven Agents for Machine Learning Engineering

Researchers introduce CoMind, a multi-agent AI system that leverages community knowledge to automate machine learning engineering tasks. The system achieved a 36% medal rate on 75 past Kaggle competitions and outperformed 92.6% of human competitors in eight live competitions, establishing new state-of-the-art performance.