Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems
Researchers introduce Planktonzilla-17M, the largest unified plankton image dataset, with 17.4 million images across 602 taxonomic classes from thirteen imaging systems. The work reports that supervised learning with taxonomic lineage matches or exceeds CLIP-style training, and it exposes limitations of current biological foundation models such as BioCLIP on marine imaging tasks.
Marine plankton classification sits at the intersection of climate science and AI: accurate species identification supports a better understanding of ocean health and carbon sequestration, processes fundamental to climate modeling. The Planktonzilla project addresses a basic problem in biological AI. Models trained on isolated datasets fail when applied to different instruments and environments, and this fragmentation has hampered scientific progress despite abundant raw data.
This work is part of a broader trend toward unified, large-scale datasets in specialized domains. Unlike general-purpose computer vision, marine biology requires standardized taxonomy and environmental metadata across heterogeneous imaging systems, a data integration challenge that mirrors similar efforts in medical imaging and satellite Earth observation. Consolidating thirteen previously isolated plankton image collections into a single resource is significant infrastructure work that enables reproducible science.
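To make the integration problem concrete, the sketch below shows one way labels from different instruments might be mapped onto a shared taxonomy with optional location metadata. It is a minimal illustration, not the project's pipeline: the instrument names, raw labels, `LABEL_MAP`, `UnifiedRecord`, and `harmonize` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from (instrument, raw label) to a unified class name.
# Instrument names and raw labels are placeholders, not taken from the dataset.
LABEL_MAP = {
    ("instrument_a", "copepod_calanoid"): "Calanoida",
    ("instrument_b", "Calanoida sp."): "Calanoida",
    ("instrument_b", "diatom chain"): "Bacillariophyceae",
}


@dataclass(frozen=True)
class UnifiedRecord:
    """One image with a harmonized label and optional location metadata."""
    image_path: str
    instrument: str
    unified_label: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def harmonize(image_path, instrument, raw_label, latitude=None, longitude=None):
    """Return a UnifiedRecord, or None when the raw label has no known mapping."""
    unified = LABEL_MAP.get((instrument, raw_label))
    if unified is None:
        return None  # unmapped labels are left for manual review
    return UnifiedRecord(image_path, instrument, unified, latitude, longitude)


rec_a = harmonize("a.png", "instrument_a", "copepod_calanoid", 43.7, 7.3)
rec_b = harmonize("b.png", "instrument_b", "Calanoida sp.")
# Both records share one label despite different source vocabularies.
print(rec_a.unified_label == rec_b.unified_label)
```

Returning `None` for unmapped labels, rather than guessing, keeps ambiguous source annotations out of the unified label space.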
The research reveals important limitations in current biological foundation models. BioCLIP and BioCLIP2, trained on broader biological datasets, underperform in zero-shot and few-shot plankton classification, suggesting that domain-specific training data remains critical for specialized scientific imaging. However, the finding that supervised learning with taxonomic lineage matches or exceeds CLIP-style approaches on Planktonzilla-17M indicates that leveraging domain structure can offset foundation model limitations.
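The summary does not say how taxonomic lineage enters the supervised objective. As a minimal sketch under that caveat, one plausible reading is that each image gets a target at every taxonomic rank, so that classes sharing ancestors also share supervision. The lineages, `RANKS`, and all function names below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative ranks and lineages; real datasets may use more ranks.
RANKS = ("kingdom", "phylum", "class", "order")

LINEAGES = {
    "Calanoida": ("Animalia", "Arthropoda", "Copepoda", "Calanoida"),
    "Cyclopoida": ("Animalia", "Arthropoda", "Copepoda", "Cyclopoida"),
    "Euphausiacea": ("Animalia", "Arthropoda", "Malacostraca", "Euphausiacea"),
}


def build_vocab(lineages):
    """Build one name -> index mapping per rank, sorted for reproducibility."""
    vocab = []
    for depth in range(len(RANKS)):
        names = sorted({lineage[depth] for lineage in lineages.values()})
        vocab.append({name: i for i, name in enumerate(names)})
    return vocab


def encode(label, lineages, vocab):
    """Turn a class label into one integer target per rank."""
    return [vocab[depth][name] for depth, name in enumerate(lineages[label])]


def shared_depth(a, b, lineages):
    """Count how many leading ranks two classes have in common."""
    depth = 0
    for x, y in zip(lineages[a], lineages[b]):
        if x != y:
            break
        depth += 1
    return depth


def lineage_caption(label, lineages):
    """Flatten a lineage into text, as a CLIP-style text encoder might consume it."""
    return " ".join(lineages[label])


vocab = build_vocab(LINEAGES)
print(encode("Euphausiacea", LINEAGES, vocab))          # [0, 0, 1, 2]
print(shared_depth("Calanoida", "Cyclopoida", LINEAGES))  # 3
print(lineage_caption("Calanoida", LINEAGES))
```

In a supervised setup, the per-rank targets could feed one classification head per rank. In a CLIP-style setup, the flattened caption would instead be paired with the image as text.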
For the AI research community, this work establishes a benchmark dataset for marine biology applications and suggests that general biological foundation models still need domain-specific training data to perform well in niche scientific domains. Future work may include building similar unified datasets for other ecological imaging domains and testing whether improved foundation models can narrow the generalization gap in specialized scientific applications.
- Planktonzilla-17M consolidates 17.4 million plankton images across 602 taxonomic classes from thirteen imaging systems, creating the largest unified dataset for plankton classification.
- Supervised learning with taxonomic lineage matches or exceeds CLIP-style training on plankton data, challenging assumptions about foundation model superiority.
- Current biological foundation models such as BioCLIP and BioCLIP2 underperform in zero-shot and few-shot plankton classification despite their broader training data, highlighting domain-specific limitations.
- Standardized taxonomy and geo-environmental metadata across heterogeneous imaging systems are central to building marine classification models that generalize.
- The research identifies gaps in biological AI infrastructure for scientific applications that depend on specialized imaging and standardized taxonomic labels.