AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation
Researchers introduce AutoScientists, a decentralized multi-agent AI system that autonomously conducts long-running scientific experiments by self-organizing teams, critiquing proposals, and sharing failures. The system outperforms single-agent approaches across biomedical machine learning, language model optimization, and protein prediction tasks, achieving significant improvements in speed and accuracy.
AutoScientists represents a meaningful shift in how AI systems approach complex, iterative research problems. Rather than relying on centralized planning or a single research trajectory, the system uses decentralized agent coordination to explore multiple hypotheses simultaneously while learning from failures, an approach that mirrors how human research teams actually operate. This distributed methodology addresses a fundamental limitation of prior AI research automation systems, which struggle with adaptive exploration over extended periods.
The advancement builds on growing recognition that multi-agent systems can outperform single-agent approaches in uncertain, complex domains. Previous AI research automation tools typically followed predetermined paths or required external orchestration, limiting their ability to pivot based on experimental evidence. AutoScientists distributes decision-making across agents that interpret shared state, propose experiments, critique each other's ideas, and collectively maintain institutional knowledge of failed directions.
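The coordination loop described above (read shared state, propose, critique, record failures) can be illustrated with a minimal Python sketch. This is not the AutoScientists implementation: the `SharedState`, `Agent`, and `run_round` names, the veto-based critique, and the toy scores are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class SharedState:
    """Hypothetical shared board that every agent reads and writes."""
    best_score: float = float("-inf")
    best_idea: Optional[str] = None
    # Outcome of every idea seen so far: "improved", "failed", or "rejected".
    log: Dict[str, str] = field(default_factory=dict)


@dataclass
class Agent:
    name: str
    ideas: List[str]                                # stand-in for a proposal generator
    vetoes: Set[str] = field(default_factory=set)   # stand-in for reasoned objections

    def propose(self, state: SharedState) -> Optional[str]:
        # Skip any direction the team has already logged, including failures.
        return next((idea for idea in self.ideas if idea not in state.log), None)

    def critique(self, idea: str) -> bool:
        # Toy critique: approve unless this agent objects to the idea.
        return idea not in self.vetoes


def run_round(agents: List[Agent], state: SharedState,
              evaluate: Callable[[str], float]) -> None:
    # 1. Every agent proposes from the same snapshot of shared state.
    proposals: Dict[str, str] = {}
    for agent in agents:
        idea = agent.propose(state)
        if idea is not None:
            proposals.setdefault(idea, agent.name)  # de-duplicate identical ideas

    # 2. Peers critique each proposal; 3. approved proposals are evaluated.
    for idea, author in proposals.items():
        reviewers = [a for a in agents if a.name != author]
        if not all(r.critique(idea) for r in reviewers):
            state.log[idea] = "rejected"
            continue
        score = evaluate(idea)
        if score > state.best_score:
            state.best_score, state.best_idea = score, idea
            state.log[idea] = "improved"
        else:
            state.log[idea] = "failed"  # remembered so nobody retries it


# Toy run with invented scores; a real evaluator would launch an experiment.
toy_scores = {"bigger_lr": 0.2, "dropout": 0.5,
              "label_smoothing": 0.1, "random_labels": 0.9}
team = [
    Agent("a", ["bigger_lr", "dropout"]),
    Agent("b", ["random_labels", "label_smoothing"]),
    Agent("c", [], vetoes={"random_labels"}),
]
board = SharedState()
for _ in range(3):
    run_round(team, board, toy_scores.__getitem__)
print(board.best_idea, board.log)
```

In this toy run the vetoed idea is logged as rejected without ever being evaluated, and ideas that fail to improve the best score stay in the log so no agent proposes them again. There is no central planner: each agent decides what to propose purely from the shared log.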
The empirical results demonstrate measurable advantages across diverse domains. On BioML-Bench's 24 tasks, which span imaging to drug discovery, the system achieved a 74.4% mean leaderboard percentile, 8.33% higher than the prior best AI agents. In GPT optimization benchmarks, it reached target performance 1.9x faster and discovered improvements where previous systems stalled entirely. In protein fitness prediction, Spearman correlation improved by 6.5% to 12.5%, a gain with direct relevance to biological research and drug development.
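For readers unfamiliar with the protein metric: Spearman correlation measures how well predicted scores preserve the ranking of measured values, regardless of scale. The sketch below computes it from scratch with the standard library. The function names and sample numbers are illustrative assumptions, not data or code from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: predicted fitness vs. measured fitness for four variants.
predicted = [0.10, 0.40, 0.35, 0.80]
measured = [1.0, 2.0, 3.0, 4.0]
print(round(spearman(predicted, measured), 3))
```

Because only ranks matter, a predictor can score well on this metric even when its raw outputs are on a different scale from the assay measurements.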
These results suggest that decentralized agent architectures may become standard for complex scientific automation. The approach's success without task-specific modifications suggests that the methodology generalizes across domains, potentially opening pathways for AI-assisted discovery in materials science, chemistry, and other research-intensive fields. Future development will likely focus on scaling coordination mechanisms and integrating wet-lab experimental systems.
- AutoScientists achieves 8.33% improvement over prior AI agents on BioML-Bench across 24 biomedical and drug discovery tasks.
- Decentralized team coordination enables agents to explore multiple hypotheses simultaneously while preserving knowledge of failed experimental directions.
- System reaches GPT training optimization targets 1.9x faster than baseline approaches and discovers improvements where single-agent methods plateau.
- Protein fitness prediction methods generalized across 217 ProteinGym assays, improving over prior state-of-the-art by 6.5% in correlation metrics.
- Multi-agent architecture demonstrates domain-agnostic effectiveness without task-specific modifications, suggesting broad applicability to scientific research automation.