EvoPool: Evolutionary Programmatic Annotation for Label-Efficient Specialized Supervision
EvoPool is an evolutionary multi-agent framework that generates specialized annotation code for labeling training data in domain-specific tasks, as an alternative to annotating with LLMs directly. The generated annotators run 4,500 to 31,000 times faster than LLM annotation and outperform it on most of the biomedical, legal, and reasoning tasks evaluated, with macro-F1 improvements of up to +0.301 on specialized benchmarks.
EvoPool addresses a costly bottleneck in machine learning: obtaining high-quality labeled data in specialized, high-stakes domains where general-purpose large language models underperform. Rather than relying on expensive human annotation or slower LLM-based labeling, the framework uses evolutionary search to automatically generate and refine custom annotation code. This lets practitioners bootstrap specialized datasets without annotation costs growing in proportion to dataset size.
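The generate-and-refine loop can be pictured as an elitist evolutionary search over annotator programs. The sketch below is a minimal illustration under loud assumptions, not EvoPool's implementation: annotators are plain Python functions that return a label or abstain (`None`); the hypothetical `propose` function stands in for EvoPool's LLM agents by mutating a one-keyword rule; and `fitness` (the fraction of validation examples labeled correctly) is an assumed scoring rule. `VOCAB`, `LABELS`, `keyword_annotator`, and `evolve` are likewise hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# An annotator maps raw text to a label, or None to abstain.
Annotator = Callable[[str], Optional[str]]
Example = Tuple[str, str]  # (text, gold label)

# Hypothetical vocabulary and label set for a toy medical-vs-legal task.
VOCAB = ["mg", "dose", "tumor", "patient", "contract", "liability", "breach"]
LABELS = ["medical", "legal"]


def keyword_annotator(keyword: str, label: str) -> Annotator:
    """Hypothetical stand-in for agent-written annotation code: one keyword rule."""
    def annotate(text: str) -> Optional[str]:
        return label if keyword in text.lower() else None
    annotate.spec = (keyword, label)  # keep the rule inspectable for mutation
    return annotate


def fitness(annotator: Annotator, val: Sequence[Example]) -> float:
    """Assumed fitness: fraction of validation examples labeled correctly.

    Abstaining (None) never equals a gold label, so it counts as a miss.
    """
    return sum(annotator(x) == y for x, y in val) / len(val)


def propose(parent: Annotator, rng: random.Random) -> Annotator:
    """Hypothetical mutation operator standing in for LLM agents rewriting code."""
    keyword, label = parent.spec  # assumes parents were built by keyword_annotator
    if rng.random() < 0.5:
        keyword = rng.choice(VOCAB)
    else:
        label = rng.choice(LABELS)
    return keyword_annotator(keyword, label)


def evolve(seed: List[Annotator], val: Sequence[Example], generations: int = 10,
           population_size: int = 4, rng_seed: int = 0) -> List[Annotator]:
    """Elitist loop: parents and children compete, and only the fittest survive."""
    rng = random.Random(rng_seed)
    population = list(seed)
    for _ in range(generations):
        children = [propose(parent, rng) for parent in population]
        ranked = sorted(population + children,
                        key=lambda a: fitness(a, val), reverse=True)
        population = ranked[:population_size]
    return population


VAL = [
    ("Patient received a 5 mg dose", "medical"),
    ("The contract limits liability", "legal"),
    ("Tumor size decreased", "medical"),
    ("Breach of contract claim", "legal"),
]
best = evolve([keyword_annotator("dose", "legal")], VAL)
print([(a.spec, fitness(a, VAL)) for a in best])
```

Because parents compete with their children, the best fitness in the population can never decrease from one generation to the next; a single keyword rule is of course far weaker than the executable annotators the framework's agents write.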
The technical approach combines three components: an evolutionary multi-agent system that proposes executable annotators; fitness-based selection that filters those annotators through viability and diversity checks; and EvoAgg, a text-aware aggregation method that converts noisy annotator votes into reliable soft labels. As in Darwinian evolution, only annotators that contribute novel and accurate signals survive to the next generation. A labeled validation set supplies the fitness signal, keeping the generated annotators aligned with domain-specific requirements.
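The viability and diversity checks can be sketched as a greedy filter over candidate annotators. This is a minimal illustration, not EvoPool's actual criteria: it assumes an annotator is "viable" if it runs without raising, fires on at least `min_coverage` of the validation set, and is at least `min_accuracy` accurate where it fires, and it treats an annotator as "novel" if its votes differ from every already-kept annotator on at least `min_novelty` of validation examples. All function names and thresholds here are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Annotator = Callable[[str], Optional[str]]  # returns a label, or None to abstain
Example = Tuple[str, str]                   # (text, gold label)
Votes = List[Optional[str]]


def votes_on(annotator: Annotator, texts: Sequence[str]) -> Optional[Votes]:
    """Run an annotator over texts; None means it crashed and is not viable."""
    try:
        return [annotator(t) for t in texts]
    except Exception:
        return None


def is_viable(votes: Votes, gold: Sequence[str],
              min_coverage: float = 0.2, min_accuracy: float = 0.6) -> bool:
    """Hypothetical viability test: enough coverage and enough accuracy when firing."""
    fired = [(v, y) for v, y in zip(votes, gold) if v is not None]
    if not fired or len(fired) < min_coverage * len(gold):
        return False
    return sum(v == y for v, y in fired) / len(fired) >= min_accuracy


def disagreement(a: Votes, b: Votes) -> float:
    """Fraction of examples where two vote vectors differ (abstentions included)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select(candidates: Sequence[Annotator], val: Sequence[Example],
           min_novelty: float = 0.25) -> List[Annotator]:
    """Keep viable annotators whose votes are novel relative to those already kept.

    Candidates are scanned in the given order, so pass them ranked by fitness.
    """
    texts = [x for x, _ in val]
    gold = [y for _, y in val]
    survivors: List[Annotator] = []
    kept_votes: List[Votes] = []
    for ann in candidates:
        votes = votes_on(ann, texts)
        if votes is None or not is_viable(votes, gold):
            continue  # crashed, too narrow, or too inaccurate
        if all(disagreement(votes, kv) >= min_novelty for kv in kept_votes):
            survivors.append(ann)
            kept_votes.append(votes)
    return survivors
```

An exact duplicate of a kept annotator has zero disagreement with it and is rejected, which matches the summary's point that survivors must add a novel signal rather than repeat an existing one.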
For practitioners in biomedical, legal, and other specialized domains, EvoPool offers substantial practical advantages. Because the annotators are ordinary code, running them is thousands of times faster than LLM annotation, which makes it feasible to label large datasets (100K+ examples) at near-zero marginal cost. Average gains of +0.141 macro-F1 over the strongest LLM baselines on complex tasks suggest that the generated annotators capture domain nuances better than general-purpose models. This lowers the barrier to high-quality labeled data, so smaller teams can compete with organizations that have extensive annotation budgets.
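Since the reported gains are differences in macro-F1, it helps to recall how the metric is computed: per-class F1 = 2TP / (2TP + FP + FN), averaged without weighting by class frequency, so rare classes count as much as common ones. A gain of +0.141 is therefore an absolute difference on a 0 to 1 scale. The sketch below follows that standard definition, taking the label set as the union of gold and predicted classes (as scikit-learn's `f1_score(..., average="macro")` does by default); the benchmarks' exact evaluation protocol is not described here, and `macro_f1` is a hypothetical helper name.

```python
from typing import Hashable, List, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    labels = set(gold) | set(pred)
    f1_scores: List[float] = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
# class a: TP=1, FP=0, FN=1 -> F1 = 2/3; class b: TP=2, FP=1, FN=0 -> F1 = 4/5
print(macro_f1(gold, pred))  # mean of 2/3 and 4/5 = 11/15, about 0.733
```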
- EvoPool generates specialized annotation code that labels data 4,500 to 31,000 times faster than LLM annotation while achieving higher accuracy on domain-specific tasks
- An evolutionary multi-agent framework automatically creates and refines custom annotators through fitness-based selection across generations
- Improves macro-F1 by +0.141 on average over the strongest LLM baselines on 7 of 8 biomedical, legal, and reasoning tasks
- A text-aware aggregation system, EvoAgg, converts multiple annotator votes into reliable soft training labels
- Enables cost-effective large-scale annotation for specialized domains at negligible per-example computational cost
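To make the idea of turning annotator votes into soft labels concrete, here is a deliberately simple stand-in for EvoAgg: an accuracy-weighted vote normalized into a probability distribution. It is not EvoAgg's text-aware method, which this summary does not specify. The sketch assumes each annotator's weight is its Laplace-smoothed validation accuracy on the examples where it fires, and adds a small `smoothing` mass to every label so no class gets probability zero; `annotator_weights` and `soft_label` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Annotator = Callable[[str], Optional[str]]  # returns a label, or None to abstain
Example = Tuple[str, str]                   # (text, gold label)


def annotator_weights(annotators: Sequence[Annotator], val: Sequence[Example],
                      prior: float = 1.0) -> List[float]:
    """Laplace-smoothed accuracy of each annotator on validation examples where it fires."""
    weights = []
    for ann in annotators:
        fired = [(vote, y) for vote, y in ((ann(x), y) for x, y in val)
                 if vote is not None]
        correct = sum(vote == y for vote, y in fired)
        weights.append((correct + prior) / (len(fired) + 2 * prior))
    return weights


def soft_label(text: str, annotators: Sequence[Annotator], weights: Sequence[float],
               labels: Sequence[str], smoothing: float = 1e-3) -> Dict[str, float]:
    """Accuracy-weighted vote, normalized into a distribution over labels."""
    scores = {label: smoothing for label in labels}
    for ann, weight in zip(annotators, weights):
        vote = ann(text)
        if vote in scores:  # abstentions (None) and unknown labels add nothing
            scores[vote] += weight
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}
```

Because abstentions contribute nothing, an example that no annotator covers falls back to a uniform distribution, while an example covered by a reliable annotator gets a sharply peaked one.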