TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks
Researchers introduced TML-Bench, a benchmark for evaluating AI coding agents on Kaggle-style tabular machine learning tasks. The study tested 10 open-source language models on four competitions under different time budgets and found that MiniMax-M2.1 achieved the best overall performance.