FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales
FLORO is a multimodal geospatial foundation model that learns from diverse remote sensing data spanning multiple sensor types and resolutions while using comparatively little pretraining data. Despite pretraining on far smaller datasets than competing models, FLORO transfers well to ecological and environmental applications, achieving competitive results on scene classification, segmentation, and regression tasks.
FLORO addresses a critical limitation of current foundation models: their dependence on massive, homogeneous datasets and fixed sensor configurations, which poorly serve ecological monitoring. Remote sensing for environmental purposes inherently involves heterogeneous data sources. Sentinel satellite imagery, high-resolution SkySat satellite imagery, and UAV-derived measurements all differ in spectral and spatial resolution. FLORO's key innovation is an availability-aware input mechanism that maps these sources into a unified input space, accommodating missing modalities and variable sensor configurations without requiring every sample to provide the same set of bands.
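The summary does not describe how the availability-aware input works internally. As a rough illustration only, the sketch below assumes one common approach: every sample is projected onto a fixed, hypothetical band list (`UNIFIED_BANDS`), missing bands are zero-filled, and a binary availability mask records which channels were actually observed. The band names, the helper `to_unified_input`, and the assumption that all bands are already resampled to a common grid are illustrative choices, not FLORO's actual design.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]  # one band as a height x width grid of floats

# Hypothetical unified band layout (NOT FLORO's actual band list): every
# sample is mapped onto this fixed channel order, whatever sensor produced it.
UNIFIED_BANDS: List[str] = [
    "s2_blue", "s2_green", "s2_red", "s2_nir",  # Sentinel-2 subset
    "s1_vv", "s1_vh",                           # Sentinel-1 SAR
    "hr_red", "hr_green", "hr_blue",            # high-res RGB (e.g. SkySat or UAV)
]


def to_unified_input(
    bands: Dict[str, Grid], height: int, width: int
) -> Tuple[List[Grid], List[int]]:
    """Place whichever bands a sample has into the fixed channel layout.

    Missing bands are zero-filled and marked 0 in the availability mask,
    so a downstream model can tell "not observed" apart from "observed zero".
    Assumes every provided band was already resampled to height x width.
    """
    unknown = set(bands) - set(UNIFIED_BANDS)
    if unknown:
        raise ValueError(f"unknown bands: {sorted(unknown)}")

    channels: List[Grid] = []
    mask: List[int] = []
    for name in UNIFIED_BANDS:
        grid = bands.get(name)
        if grid is None:
            # Band absent for this sensor: placeholder zeros, flagged unavailable.
            channels.append([[0.0] * width for _ in range(height)])
            mask.append(0)
        else:
            if len(grid) != height or any(len(row) != width for row in grid):
                raise ValueError(f"band {name!r} is not {height}x{width}")
            channels.append([[float(v) for v in row] for row in grid])
            mask.append(1)
    return channels, mask


# Usage: a Sentinel-1-only sample supplies just the two SAR bands.
s1_only = {
    "s1_vv": [[0.1, 0.2], [0.3, 0.4]],
    "s1_vh": [[0.05, 0.1], [0.15, 0.2]],
}
x, m = to_unified_input(s1_only, height=2, width=2)
print(len(x), m)  # 9 [0, 0, 0, 0, 1, 1, 0, 0, 0]
```

Returning the mask alongside the zero-filled channels matters because zero is a valid measured value; without the mask, a model could not distinguish an absent band from a genuinely dark pixel.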
The model's results on the PANGAEA benchmark provide useful context. FLORO achieves the second-best segmentation performance while using over 100 times less pretraining data than the leading model, which suggests that it gains data efficiency from the diversity of its pretraining data rather than its volume. This matters for environmental science, where labeled, diverse remote sensing datasets remain scarce and expensive to acquire. Qualitative improvements in preserving spatial structure for flood prediction, urban mapping, and biomass estimation suggest that the model captures spatially coherent, ecologically relevant features rather than purely statistical associations.
The improvement that geo-positional encoding brings on EuroSAT-MS suggests that spatial context, which is inherently important for geographic phenomena, supports transfer learning better than absolute position information. This is consistent with environmental processes varying systematically across scales and regions. For environmental monitoring, FLORO's data efficiency broadens access to sophisticated remote sensing analysis, because pretraining does not require institutional-scale computational resources. More broadly, the work suggests that foundation models can serve specialized domains well through deliberate architectural choices rather than by simply scaling data and parameters.
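The summary does not define FLORO's geo-positional encoding. One plausible reading, sketched here purely as an assumption, is a sinusoidal encoding of the image's latitude and longitude, in contrast to a standard vision transformer's absolute positional encoding, which only records where a patch sits within the image. The function name `geo_positional_encoding`, the doubling frequencies, and the vector layout are all hypothetical.

```python
import math
from typing import List


def geo_positional_encoding(lat_deg: float, lon_deg: float, dim: int = 8) -> List[float]:
    """Hypothetical sinusoidal encoding of a geographic coordinate.

    The first half of the vector encodes latitude and the second half
    longitude; each half holds (sin, cos) pairs at frequencies 1, 2, 4, ...
    Integer frequencies keep longitude periodic, so -180 and +180 degrees
    map to (numerically) the same features.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be a multiple of 4")
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude must lie in [-90, 90]")

    n_freq = dim // 4  # (sin, cos) pairs per coordinate
    feats: List[float] = []
    for angle in (math.radians(lat_deg), math.radians(lon_deg)):
        for k in range(n_freq):
            freq = 2 ** k
            feats.append(math.sin(freq * angle))
            feats.append(math.cos(freq * angle))
    return feats


# Usage: longitudes -180 and +180 describe the same meridian,
# so their encodings should match up to floating-point error.
east = geo_positional_encoding(10.0, 180.0)
west = geo_positional_encoding(10.0, -180.0)
print(all(abs(a - b) < 1e-9 for a, b in zip(east, west)))  # True
```

Encoding angles through sines and cosines at integer frequencies keeps the features continuous across the antimeridian, which a raw (lat, lon) pair fed directly to a model would not.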
- FLORO achieves competitive performance with over 100x less pretraining data through multimodal diversity and an availability-aware architecture
- A unified input space accommodates sensor variability, enabling robust transfer across satellite, airborne, and UAV imagery
- Strong segmentation performance on ecological tasks suggests better preservation of spatial structure, which is critical for environmental applications
- Geo-positional encoding outperforms absolute positional encoding, indicating the importance of spatial context for geographic foundation models
- Smaller, diverse datasets may outperform massive homogeneous ones for specialized domains such as ecological monitoring