Benchmarking ResNet Backbones in RT-DETR: Impact of Depth and Regularization under environmental conditions
This research benchmarks RT-DETR object detection models with different ResNet backbones for competitive robotics applications, evaluating how environmental variations such as lighting and background contrast affect detection performance. The study finds that intermediate-depth models (ResNet34 and ResNet50) offer the best balance between accuracy, confidence, and latency, with ResNet50 excelling under illumination changes and ResNet34 performing best under background variations.
This technical study addresses a practical gap in transformer-based detector research by systematically evaluating how backbone architecture and environmental conditions affect real-time object detection in robotics. The work matters because competitive robotics operates in unpredictable environments where lighting and visual conditions shift constantly, yet most detection benchmarks focus on controlled datasets. The research tests four ResNet variants with different regularization parameters and finds that environmental factors primarily degrade prediction confidence rather than accuracy or speed. That result separates two failure modes: a detector that becomes less certain, and a detector that becomes wrong or slow.
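To make the three reported metrics concrete, the following is a minimal sketch of how accuracy, mean confidence, and mean latency could be aggregated per backbone and per environmental condition. The `Detection` record layout and the `summarize` helper are hypothetical names introduced for illustration; the summary does not describe the study's actual evaluation code or data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Detection:
    """One evaluated image (hypothetical layout, not the study's format)."""
    backbone: str       # e.g. "resnet34"
    condition: str      # e.g. "illumination" or "background"
    correct: bool       # predicted class matched the ground truth
    confidence: float   # score of the top prediction, in [0, 1]
    latency_ms: float   # inference time for this image, in milliseconds


def summarize(records):
    """Group records by (backbone, condition) and average each metric."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.backbone, record.condition)].append(record)

    return {
        key: {
            "accuracy": mean(1.0 if r.correct else 0.0 for r in rs),
            "mean_confidence": mean(r.confidence for r in rs),
            "mean_latency_ms": mean(r.latency_ms for r in rs),
        }
        for key, rs in groups.items()
    }
```

Keeping the three metrics in separate columns is what allows a confidence drop to be seen on its own: a condition can lower `mean_confidence` while `accuracy` and `mean_latency_ms` stay flat.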
The robotics community has increasingly adopted deep learning for visual perception, but deployment decisions often rely on limited environmental testing. This study indicates that backbone selection should depend on the specific environmental challenge encountered. ResNet50 maintains confidence above 0.86 under lighting variations while preserving near-perfect accuracy and sub-60-microsecond latency (0.058 ms), making it a strong choice for environments with variable brightness. ResNet34 proves more robust to background contrast changes, suggesting that the most suitable backbone depth depends on the type of environmental variation rather than on depth alone.
For robotics teams and computer vision practitioners, these findings can inform architecture selection without requiring expensive retraining across different conditions. The consistently high accuracy across models indicates that confidence calibration and environmental robustness matter more than raw architecture complexity in practical deployments. The results also indicate that intermediate-depth models can match larger architectures while maintaining the inference speed that real-time robotics applications require.
- ResNet50 achieves the best performance under illumination variations, with 0.869 confidence and 0.058 ms latency
- ResNet34 provides superior robustness under background contrast changes, with confidence reaching 0.887
- Environmental conditions primarily impact confidence scores rather than classification accuracy or inference latency
- Intermediate-depth backbones offer the best balance between performance metrics and computational efficiency for robotics
- Architecture selection for object detection should consider the specific type of environmental variation rather than absolute model size
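
The last takeaway can be illustrated with a small lookup, sketched here under stated assumptions. The table holds only the two confidence figures quoted above; the condition labels, the `REPORTED_CONFIDENCE` table, and the `pick_backbone` helper are hypothetical names for illustration and are not part of the study. A real deployment would fill the table with its own measurements for every backbone and condition.

```python
# Confidence figures quoted in the summary; no other values are filled in.
REPORTED_CONFIDENCE = {
    ("resnet50", "illumination"): 0.869,
    ("resnet34", "background_contrast"): 0.887,
}


def pick_backbone(condition, table=REPORTED_CONFIDENCE):
    """Return the backbone with the highest recorded confidence for a condition."""
    candidates = {
        backbone: confidence
        for (backbone, cond), confidence in table.items()
        if cond == condition
    }
    if not candidates:
        # Refuse to guess when nothing was measured for this condition.
        raise KeyError(f"no measurements recorded for condition {condition!r}")
    return max(candidates, key=candidates.get)
```

Calling `pick_backbone("illumination")` returns `"resnet50"` and `pick_backbone("background_contrast")` returns `"resnet34"`, matching the takeaways above. An unmeasured condition raises `KeyError` instead of falling back to the largest model.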