y0news
🧠 AI | Neutral | Importance 5/10

TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks

arXiv – CS AI | Mykola Pinchuk
🤖 AI Summary

Researchers introduced TML-Bench, a benchmark for evaluating AI coding agents on Kaggle-style tabular machine learning tasks. The study tested 10 open-source language models on four competitions under three time budgets (240s, 600s, and 1200s) and found that MiniMax-M2.1 achieved the best overall performance.

Key Takeaways
  • TML-Bench provides a standardized way to evaluate AI agents on data science tasks with real-world time constraints.
  • MiniMax-M2.1 achieved the best overall performance among the open-source language models tested on the four Kaggle-style competitions.
  • Performance generally improved with longer time budgets (240s, 600s, 1200s), though scaling varied by model.
  • Success rates and run-to-run variability were measured alongside median performance for comprehensive evaluation.
  • The benchmark focuses on end-to-end correctness and practical reliability rather than just code generation quality.
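
The three reporting metrics named in the takeaways (success rate, median performance, and run-to-run variability) can be illustrated with a minimal sketch. The function name, the use of `None` to mark a failed run, and the choice of max-minus-min as the variability measure are assumptions made for illustration; they are not taken from the paper, which may define these quantities differently.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_runs(scores: Sequence[Optional[float]]) -> dict:
    """Summarize repeated runs of one model on one task and time budget.

    `scores` has one entry per run; None marks a run that failed to
    produce a valid score (an assumption of this sketch).
    """
    if not scores:
        raise ValueError("need at least one run")

    # Only successful runs contribute to the performance statistics.
    ok = [s for s in scores if s is not None]

    return {
        # Fraction of runs that produced a valid score.
        "success_rate": len(ok) / len(scores),
        # Median score over successful runs; None if every run failed.
        "median": median(ok) if ok else None,
        # Simple variability proxy: range of the successful scores.
        "spread": (max(ok) - min(ok)) if ok else None,
    }


# Hypothetical scores for four runs, one of which failed.
print(summarize_runs([0.5, None, 0.75, 0.25]))
```

Reporting the success rate separately from the median keeps a model that scores well but fails often from looking as reliable as one that always finishes.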