PolyBuild: An End-to-End Method for Polygonal Building Contour Extraction from High-Resolution Remote Sensing Images
PolyBuild is an end-to-end deep learning method that extracts building polygon contours directly from high-resolution remote sensing images, with no post-processing. Its hybrid CNN-Transformer architecture pairs an Initial Contour Generation Module with a Contour Optimization Module and is reported to outperform existing mask-based and contour-based approaches.
PolyBuild addresses a persistent challenge in geospatial analysis and urban mapping: it replaces the traditional pipeline of pixel-level segmentation followed by error-prone post-processing with direct extraction of vector polygons. That change is aimed at improving both computational efficiency and accuracy in remote sensing applications. It matters because building contour extraction underpins urban planning, infrastructure management, disaster response, and property assessment, where accuracy directly affects decision-making and resource allocation.
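To make the "error-prone post-processing" concrete, here is a minimal sketch of one step that mask-based pipelines commonly use: simplifying a jagged boundary traced from a segmentation mask into a few polygon vertices. The source does not say which post-processing steps it has in mind; the Douglas-Peucker algorithm is used here only as a familiar illustration, and the function names and sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify an open polyline, keeping vertices farther than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    split_index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            split_index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: split_index + 1], tolerance)
    right = douglas_peucker(points[split_index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


# Hypothetical noisy boundary traced along two walls of a building mask.
traced = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3, 1), (3.1, 2), (3, 3)]
simplified = douglas_peucker(traced, tolerance=0.5)
```

The result depends heavily on the hand-chosen `tolerance`: too small keeps mask noise as spurious vertices, too large cuts off real corners. That sensitivity is one reason such steps are considered fragile.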
The approach builds on recent advances in computer vision, where hybrid CNN-Transformer architectures have proven effective at capturing multi-scale spatial relationships. PolyBuild's two-module design, initial detection followed by iterative refinement, reflects a broader trend toward end-to-end learning systems that reduce manual intervention. By generating bounding boxes and using sub-region center features at the same time, the system combines object-level understanding with precise boundary delineation.
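The two-stage idea of initial detection followed by iterative refinement can be sketched in a few lines. This is not the authors' implementation: it assumes, purely for illustration, that the initial contour is sampled from a bounding box perimeter and that refinement adds per-vertex offsets. The learned network is replaced by a toy stand-in, and every function name here is hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def initial_contour_from_box(box: Box, n_vertices: int) -> List[Point]:
    """Sample vertices evenly along the box perimeter, starting at (x_min, y_min)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    perimeter = 2 * (w + h)
    points = []
    for i in range(n_vertices):
        d = perimeter * i / n_vertices  # distance travelled along the perimeter
        if d < w:
            points.append((x0 + d, y0))
        elif d < w + h:
            points.append((x1, y0 + (d - w)))
        elif d < 2 * w + h:
            points.append((x1 - (d - w - h), y1))
        else:
            points.append((x0, y1 - (d - 2 * w - h)))
    return points


def refine_contour(
    contour: List[Point],
    predict_offsets: Callable[[List[Point]], List[Point]],
    n_iters: int = 3,
) -> List[Point]:
    """Repeatedly shift every vertex by the offsets the predictor returns."""
    for _ in range(n_iters):
        offsets = predict_offsets(contour)
        contour = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(contour, offsets)]
    return contour


def toy_offsets_toward(target: List[Point]) -> Callable[[List[Point]], List[Point]]:
    """Stand-in for a learned head: move each vertex halfway to a known target."""
    def predict(contour: List[Point]) -> List[Point]:
        return [((tx - x) * 0.5, (ty - y) * 0.5)
                for (x, y), (tx, ty) in zip(contour, target)]
    return predict


contour = initial_contour_from_box((0, 0, 4, 2), n_vertices=4)
target = [(0.5, 0.5), (3, 0.5), (3.5, 1.5), (1, 1.5)]
refined = refine_contour(contour, toy_offsets_toward(target), n_iters=3)
```

In a real system the offset predictor would be a network reading image features at each vertex. The loop structure, a coarse contour nudged toward the boundary over a few passes, is the part this sketch is meant to convey.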
For developers and organizations that rely on geospatial data, PolyBuild's reported gains over state-of-the-art methods point to more capable automated mapping workflows. Eliminating post-processing reduces computational overhead, which speeds up processing of large-scale remote sensing datasets and is particularly valuable for real-time monitoring. Its performance across three separate building datasets suggests the method generalizes well.
Looking ahead, the success of this approach may accelerate adoption of end-to-end extraction methods in related geospatial tasks such as road network extraction and land-use classification. The research shows how transformer-based architectures continue to spread into computer vision domains beyond traditional benchmarks, with practical implications for industries that depend on accurate geospatial intelligence.
- PolyBuild eliminates post-processing requirements by directly extracting vector polygons from remote sensing imagery.
- Hybrid CNN-Transformer architecture effectively captures both local and global spatial relationships for improved boundary accuracy.
- End-to-end approach reduces computational overhead compared to traditional segmentation-plus-refinement pipelines.
- Performance gains demonstrated across three separate building datasets indicate strong generalization capabilities.
- Direct application potential for urban planning, disaster response, and infrastructure management workflows.