CAFOSat: A Strongly Annotated Dataset for Infrastructure-Aware CAFO Mapping Using High-Resolution Imagery
Researchers introduce CAFOSat, a large-scale annotated dataset of more than 45,000 image patches for mapping Concentrated Animal Feeding Operations (CAFOs) across the United States using high-resolution satellite imagery. The dataset combines AI-assisted annotation, human verification, and infrastructure-level labeling to address challenges in automated CAFO detection, and the authors benchmark multiple deep learning models on it to support agricultural monitoring.
CAFOSat advances agricultural remote sensing by tackling a practical problem: automating CAFO detection at scale. It targets three limitations of existing resources that have hindered previous mapping efforts: heterogeneous facility layouts, unreliable geolocation data, and sparse annotations. By integrating National Agriculture Imagery Program (NAIP) data with curated inventories from multiple states and applying human-in-the-loop refinement, the researchers created a resource that links raw imagery to facility-level information that monitoring programs can act on.
The dataset's infrastructure-aware design distinguishes it from generic object detection benchmarks. Rather than simply identifying CAFOs as points on a map, CAFOSat annotates specific facility components (barns, manure ponds, and grazing features), enabling more granular analysis. This level of detail supports diverse agricultural monitoring use cases, from environmental compliance assessment to disease surveillance during animal health crises. The synthetic augmentation pipeline and careful negative sampling strategy target real-world deployment, where models must remain robust across geographic regions and imaging conditions.
For the agricultural technology sector, CAFOSat provides a foundation for building monitoring tools that work at the level of individual facility components. Regulators gain better capabilities for environmental compliance tracking, while researchers can advance disease surveillance systems. The benchmark results across convolutional networks, transformers, and vision-language models give developers baseline performance figures to compare against when building production systems. Looking ahead, similar infrastructure-aware datasets for other agricultural operations could support broader digital agriculture applications, particularly in water management, emissions tracking, and supply chain transparency, which increasingly matter to investors and consumers.
- CAFOSat dataset contains 45,000+ annotated image patches across 20 states addressing heterogeneous CAFO detection challenges.
- Human-in-the-loop annotation pipeline combines AI-assisted labeling with GradCAM localization to refine weak geolocation records.
- Infrastructure-level annotations enable detailed facility component identification beyond binary CAFO presence detection.
- Benchmarking of diverse model architectures demonstrates synthetic augmentation improves robustness under distribution shifts.
- Dataset supports environmental monitoring, regulatory compliance, and disease surveillance applications in agricultural operations.
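
To make the idea of refining weak geolocation records with an activation map more concrete, here is a minimal sketch. It is not the CAFOSat pipeline. It assumes a GradCAM-style heatmap has already been computed for a patch, and it moves the recorded location to the activation-weighted centroid of the strongest cells. The function names, the 0.5 threshold, and the projected-coordinate convention (x grows to the right, y shrinks downward from the patch's top-left corner) are all hypothetical choices for illustration.

```python
from typing import List, Tuple


def heatmap_centroid(
    heatmap: List[List[float]], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return the (row, col) activation-weighted centroid of the hot cells.

    Only cells with activation >= threshold * max activation contribute.
    The threshold value is an assumption, not taken from the paper.
    """
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        raise ValueError("heatmap has no positive activation")
    cutoff = threshold * peak

    total = row_sum = col_sum = 0.0
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value >= cutoff:
                total += value
                row_sum += r * value
                col_sum += c * value
    return row_sum / total, col_sum / total


def refine_location(
    heatmap: List[List[float]],
    origin_xy: Tuple[float, float],
    metres_per_cell: float,
) -> Tuple[float, float]:
    """Convert the heatmap centroid into projected map coordinates.

    origin_xy is the (x, y) of the patch's top-left corner in a projected
    coordinate system (hypothetical convention: y decreases with row index).
    The 0.5 offset places the point at the centre of a heatmap cell.
    """
    row, col = heatmap_centroid(heatmap)
    x0, y0 = origin_xy
    x = x0 + (col + 0.5) * metres_per_cell
    y = y0 - (row + 0.5) * metres_per_cell
    return x, y


if __name__ == "__main__":
    # Toy 4x4 heatmap with one strongly activated cell at row 1, column 2.
    toy = [
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.1],
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.1],
    ]
    print(refine_location(toy, origin_xy=(1000.0, 2000.0), metres_per_cell=10.0))
```

In a human-in-the-loop setting, a refined point like this would be a suggestion for an annotator to confirm or correct rather than a final label.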