Can Generalist Agents Automate Data Curation?

arXiv – CS AI | Feiyang Kang, Hanze Li, Adam Nguyen, Mahavir Dabas, Jiaqi W. Ma, Frederic Sala, Dawn Song, Ruoxi Jia | June 4, 2026 at 04:00 AM

AI Summary

Researchers introduce Curation-Bench, a benchmark showing that AI agents can automate data curation, a critical bottleneck in AI development, by iteratively proposing and refining data-selection policies. Agents reach strong baselines quickly, but without structured scaffolding they tend toward local optimization of existing approaches instead of adapting different methods.

Analysis

Data curation is one of the most resource-intensive parts of modern AI development: experts must repeatedly test, evaluate, and refine training datasets against benchmark feedback. This research shows that generalist coding agents can run this workflow autonomously, executing command-line operations to inspect data, implement policies, and submit them for evaluation. The key finding is also a limitation: without guidance, agents tend to optimize existing approaches rather than explore fundamentally different policy families. This is the execution-research gap, the difference between what agents can technically do and what they should explore strategically.
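
The propose, select, evaluate cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `select`, `curation_loop`, and `toy_evaluate` are hypothetical and do not come from Curation-Bench, the candidate policies are fixed in advance instead of being written by an agent, and the evaluator is a toy stand-in for benchmark feedback.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a "policy" assigns a score to each example, and the
# highest-scoring examples are kept. Nothing here is the benchmark's real API.
Policy = Callable[[str], float]


def select(pool: List[str], policy: Policy, budget: int) -> List[str]:
    """Keep the `budget` highest-scoring examples under the given policy."""
    return sorted(pool, key=policy, reverse=True)[:budget]


def curation_loop(
    pool: List[str],
    candidates: List[Policy],
    evaluate: Callable[[List[str]], float],
    budget: int,
) -> Tuple[int, float]:
    """Evaluate each candidate policy in turn and remember the best one.

    A real agent would write the next policy after reading the feedback;
    here the candidates are fixed so the sketch stays runnable.
    """
    best_index, best_score = -1, float("-inf")
    for i, policy in enumerate(candidates):
        subset = select(pool, policy, budget)
        score = evaluate(subset)  # stand-in for benchmark feedback
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def prefer_short(example: str) -> float:
    return -float(len(example))


def prefer_long(example: str) -> float:
    return float(len(example))


def toy_evaluate(subset: List[str]) -> float:
    """Invented metric: mean example length of the selected subset."""
    return sum(len(x) for x in subset) / len(subset)


pool = ["a", "bbb", "cc", "dddd", "eeeee", "ffffff"]
candidates = [prefer_short, prefer_long]
best_index, best_score = curation_loop(pool, candidates, toy_evaluate, budget=2)
print(best_index, best_score)
```

In this toy setup the loop only compares the candidates it is handed. That mirrors the limitation noted above: a loop of this shape rewards refining what is already on the list and does nothing by itself to introduce a new policy family.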

Results improve when the researchers add structure. Requiring agents to cite, instantiate, and adapt prior methods at each iteration shifts their behavior toward method-guided exploration. With this scaffold, an agent composed a data-selection policy that outperforms published baselines while using only one-tenth of the original data budget, a substantial efficiency gain with direct implications for resource costs and computational requirements.
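
One way to picture the cite, instantiate, adapt requirement is as a check applied to each proposal before it is evaluated. The sketch below is an assumption made for illustration, not the paper's scaffold: the `Proposal` fields, the `scaffold_problems` function, and the method names in `known_methods` are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Proposal:
    cited_method: str   # prior method the agent says it builds on
    instantiation: str  # how that method is applied to this data pool
    adaptation: str     # what the agent changes relative to the cited method


def scaffold_problems(proposal: Proposal, known_methods: Set[str]) -> List[str]:
    """Return the reasons a proposal fails the (hypothetical) scaffold."""
    problems = []
    if proposal.cited_method not in known_methods:
        problems.append("cited method is not in the reference list")
    if not proposal.instantiation.strip():
        problems.append("missing instantiation")
    if not proposal.adaptation.strip():
        problems.append("missing adaptation")
    return problems


# Placeholder method names, invented for this example.
known_methods = {"deduplication", "quality-classifier filtering"}

complete = Proposal(
    cited_method="deduplication",
    instantiation="drop exact duplicate documents from the pool",
    adaptation="also drop near-duplicates above a similarity threshold",
)
incomplete = Proposal(
    cited_method="tweak the previous threshold",
    instantiation="rerun last iteration with a different cutoff",
    adaptation="",
)

print(scaffold_problems(complete, known_methods))
print(scaffold_problems(incomplete, known_methods))
```

A proposal that only tweaks the previous iteration fails this check because it names no prior method and states no adaptation, which is the kind of local refinement the scaffold is described as steering agents away from.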

For the AI development ecosystem, this work suggests that agent-assisted research workflows could shorten development cycles and make data-engineering expertise more widely accessible. However, the reliance on scaffolding indicates that fully autonomous research would require more sophisticated reasoning about novelty and strategy. The open-sourced benchmark enables further research on agent-guided optimization, potentially leading to hybrid human-agent workflows in which humans set strategic direction while agents execute iterative refinement at scale.

Key Takeaways

→Generalist agents can successfully automate data-curation loops, reaching strong baselines within ten iterations without task-specific training.
→Unguided agents exhibit an execution-research gap, optimizing local variants rather than exploring new policy families.
→Structured scaffolding requiring method citation and adaptation shifts agents toward methodological exploration and better outcomes.
→A scaffolded agent composed a data-selection policy that, using only 10% of the original data budget, outperformed published baselines.
→Reliable AI research automation requires hybrid human-agent collaboration with structured guidance rather than open-ended prompting.