XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling
XiYOLO is an energy-efficient object detection framework that uses neural architecture search and compound scaling to optimize detection models for edge devices with strict power constraints. It reports 20-53% energy reductions compared to YOLOv12 baselines across GPU and NPU deployments while maintaining competitive accuracy.
XiYOLO addresses a critical challenge in edge AI deployment: balancing computational performance with severe energy constraints on heterogeneous hardware. Traditional object detection models like YOLO prioritize accuracy over power efficiency, making them unsuitable for battery-powered edge devices, autonomous systems, and IoT infrastructure where energy consumption directly impacts operational costs and device longevity. The research tackles the fundamental problem that real energy consumption is highly device-dependent and expensive to measure during development.
The framework's innovation lies in its two-stage approach: first identifying an efficient base architecture through energy-aware neural architecture search, then applying compound scaling to create a family of models across different deployment budgets. This methodology enables developers to make interpretable accuracy-energy tradeoffs without requiring extensive hardware testing. The two-stage energy estimator demonstrates practical value through few-shot adaptation, requiring only 2-20 target-device samples rather than costly full retraining.
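To make the few-shot adaptation idea concrete, here is a minimal sketch, not the paper's estimator. It assumes a device-agnostic base estimator already produces an energy prediction per model, and that a simple affine correction (scale and offset) fitted by least squares on a handful of target-device measurements is enough to adapt it. The function names, the affine form, and the sample values are all hypothetical.

```python
from statistics import mean


def fit_affine_calibration(base_predictions, measured):
    """Fit measured ~= scale * base + offset by ordinary least squares.

    base_predictions: energies from a device-agnostic estimator (assumed to exist).
    measured: energies measured on the target device for the same models.
    """
    if len(base_predictions) != len(measured) or len(measured) < 2:
        raise ValueError("need at least two paired samples")
    x_bar = mean(base_predictions)
    y_bar = mean(measured)
    var_x = sum((x - x_bar) ** 2 for x in base_predictions)
    if var_x == 0:
        raise ValueError("base predictions must not all be identical")
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(base_predictions, measured)
    )
    scale = cov_xy / var_x
    offset = y_bar - scale * x_bar
    return scale, offset


def calibrated_energy(base_prediction, scale, offset):
    """Apply the fitted device-specific correction to a new base prediction."""
    return scale * base_prediction + offset


# Hypothetical numbers: three models measured on the target device.
scale, offset = fit_affine_calibration([1.0, 2.0, 3.0], [2.5, 4.5, 6.5])
estimate = calibrated_energy(4.0, scale, offset)
```

With only two parameters to fit, a correction of this kind can be estimated from very few samples, which is consistent with the 2-20 sample range quoted above. The estimator described in the research may use a different adaptation mechanism.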
For the AI infrastructure industry, this represents progress toward production-ready edge AI systems. Energy efficiency directly reduces deployment costs for edge computing at scale, which matters for robotics, autonomous vehicles, surveillance systems, and industrial IoT. The 35-53% energy savings at comparable accuracy levels suggest meaningful cost reduction potential across these sectors. The results suggest that systematic architecture search can outperform hand-optimized models when energy is a primary objective.
Developers should monitor whether XiYOLO techniques transfer to other detection architectures and vision tasks. The generalizability of the energy estimator approach across different hardware platforms determines its practical impact on future edge AI development workflows.
- XiYOLO achieves 20-53% energy reductions versus YOLOv12 across GPU and NPU deployments while maintaining competitive accuracy
- Two-stage energy estimator enables few-shot hardware adaptation using only 2-20 target-device samples
- Framework combines neural architecture search with compound scaling to create interpretable accuracy-energy tradeoffs
- Results validated on PascalVOC and COCO datasets with real-device deployment measurements
- Approach addresses device-dependent energy measurement challenges that limit existing edge AI optimization methods
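As an illustration of how compound scaling can turn one base architecture into a family of models for different budgets, the sketch below uses an EfficientNet-style rule in which depth, width, and input resolution grow together with a single coefficient `phi`. This is an assumption for illustration only: the source does not state XiYOLO's scaling rule, the coefficients `alpha`, `beta`, `gamma` are placeholder values, and `relative_cost` is a rough FLOPs-style proxy rather than a measured energy figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledConfig:
    phi: int
    depth_mult: float
    width_mult: float
    resolution: int


def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15,
                   base_resolution=640, stride=32):
    """Scale depth, width, and resolution jointly with one coefficient phi.

    alpha, beta, gamma are hypothetical per-dimension growth rates.
    The resolution is snapped to a multiple of the detector stride.
    """
    depth_mult = alpha ** phi
    width_mult = beta ** phi
    resolution = int(round(base_resolution * gamma ** phi / stride)) * stride
    return ScaledConfig(phi, depth_mult, width_mult, resolution)


def relative_cost(cfg, base_resolution=640):
    """Rough compute proxy: linear in depth, quadratic in width and resolution."""
    return (cfg.depth_mult
            * cfg.width_mult ** 2
            * (cfg.resolution / base_resolution) ** 2)


def largest_within_budget(budget, phis=range(0, 5)):
    """Return the largest scaled config whose proxy cost fits the budget."""
    best = None
    for phi in phis:
        cfg = compound_scale(phi)
        if relative_cost(cfg) <= budget:
            best = cfg  # cost grows with phi, so the last fit is the largest
    return best


# Pick a family member for a budget of twice the base model's proxy cost.
choice = largest_within_budget(2.0)
```

In an energy-aware workflow, the proxy in `relative_cost` would be replaced by a device-calibrated energy estimate, so that the selection reflects the target hardware rather than operation counts.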