AutoRelAnnotator: Calibrated Model Cascades for Cost-Efficient Relevance Evaluation in Sponsored Search
Researchers introduced AutoRelAnnotator, a calibrated model cascade system that generates high-quality relevance annotations for search ranking systems at significantly lower cost than human labeling. The approach combines domain-specific fine-tuning, progressive model cascading, and isotonic calibration to achieve production-grade accuracy while reducing compute costs by approximately 50%, with validation across 150M+ annotations in real-world search and advertising systems.
AutoRelAnnotator addresses a fundamental bottleneck in machine learning infrastructure: the cost of generating high-quality training data and evaluation metrics for ranking systems. Traditional relevance annotation relies on human labelers, which introduces delays and becomes prohibitively expensive at scale. Large language models can automate the task, but they typically underperform on domain-specific problems without fine-tuning and calibration. The research demonstrates that accuracy and cost optimization are separable concerns: fine-tuning contributes 20 accuracy points, cascading cuts compute roughly in half while maintaining accuracy, and per-class isotonic calibration adds small but statistically significant gains. This modular architecture lets practitioners tune the accuracy-cost tradeoff to their own business constraints. The production validation, spanning six use cases and more than 150M annotations, is substantial real-world evidence of practical viability.
For search and advertising platforms, the work directly affects development velocity and experimentation cycles. Faster annotation pipelines enable more frequent A/B tests, quicker root cause analysis, and shorter time-to-market for ranking improvements. The approach is particularly valuable for platforms that handle diverse query domains where generic models fail.
Beyond immediate cost savings, the work influences how companies approach data infrastructure: rather than investing in larger labeling teams, they can shift resources toward building specialized fine-tuned classifiers. The isotonic calibration step adds a refinement layer that competitors may need to replicate. Looking ahead, the methodology could extend to other annotation tasks that require domain expertise, potentially reshaping how AI systems are trained and evaluated across the industry.
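To make the cascading idea concrete, here is a minimal sketch of a confidence-gated cascade. It assumes that each stage returns a label with a confidence score and that a query-ad pair escalates to the next, more expensive stage only when the confidence falls below a threshold. The summary does not spell out the actual escalation rule, so the names `run_cascade`, `cheap_stage`, and `expensive_stage`, the toy scoring logic, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A stage maps a (query, ad) pair to (label, confidence in [0, 1]).
Stage = Callable[[str, str], Tuple[str, float]]


@dataclass
class CascadeResult:
    label: str
    confidence: float
    stage_index: int  # index of the stage that produced the final answer


def run_cascade(
    query: str,
    ad: str,
    stages: Sequence[Stage],
    thresholds: Sequence[float],
) -> CascadeResult:
    """Try stages from cheapest to most expensive; stop at the first confident one."""
    if not stages:
        raise ValueError("cascade needs at least one stage")
    if len(thresholds) != len(stages) - 1:
        raise ValueError("need exactly one threshold per non-final stage")
    for i, stage in enumerate(stages):
        label, confidence = stage(query, ad)
        is_last = i == len(stages) - 1
        # The final stage always answers; earlier stages answer only when confident.
        if is_last or confidence >= thresholds[i]:
            return CascadeResult(label, confidence, i)
    raise AssertionError("unreachable: the final stage always returns")


def cheap_stage(query: str, ad: str) -> Tuple[str, float]:
    # Toy stand-in for a small fine-tuned classifier (assumption, not the real model):
    # confident only when the query and the ad share at least one word.
    overlap = set(query.lower().split()) & set(ad.lower().split())
    return ("relevant", 0.95) if overlap else ("irrelevant", 0.55)


def expensive_stage(query: str, ad: str) -> Tuple[str, float]:
    # Toy stand-in for a larger, slower model (assumption).
    return ("irrelevant", 0.90)


easy = run_cascade("running shoes", "trail running shoes sale",
                   [cheap_stage, expensive_stage], thresholds=[0.8])
hard = run_cascade("running shoes", "car insurance quotes",
                   [cheap_stage, expensive_stage], thresholds=[0.8])
```

In this sketch the compute saving comes from the share of pairs that never reach the expensive stage, so the threshold is the knob that trades cost against accuracy.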
- Fine-tuned classifiers contribute 20 accuracy points, while model cascading halves compute costs with a neutral effect on accuracy.
- Per-class isotonic calibration adds 0.6 accuracy points over baseline calibration methods in production environments.
- The system processed 150M+ annotations across six real-world use cases, demonstrating production-grade viability.
- The modular design separates accuracy and cost optimizations, enabling flexible tuning for different business constraints.
- Faster annotation pipelines directly accelerate experimentation cycles for search ranking and advertising systems.
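The per-class isotonic calibration mentioned above can be illustrated with a small, dependency-free sketch. It assumes a one-vs-rest setup: for each class, a non-decreasing step function is fitted (with the pool-adjacent-violators algorithm) from the model's raw score for that class to the observed frequency of that class, and the calibrated scores are then renormalized to sum to 1. The summary does not specify how the calibrators are actually fitted or combined, so this construction and the names `fit_isotonic`, `apply_isotonic`, `fit_per_class`, and `calibrate` are assumptions, and the data is a toy example.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

Calibrator = Tuple[List[float], List[float]]  # (block upper bounds, block values)


def fit_isotonic(scores: Sequence[float], targets: Sequence[float]) -> Calibrator:
    """Pool-adjacent-violators fit of a non-decreasing step function."""
    blocks: List[List[float]] = []  # each block: [target sum, count, max score]
    for s, t in sorted(zip(scores, targets)):
        if blocks and blocks[-1][2] == s:
            # Tied scores must share one calibrated value, so pool them.
            blocks[-1][0] += t
            blocks[-1][1] += 1
        else:
            blocks.append([t, 1, s])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            t_sum, n, s_max = blocks.pop()
            blocks[-1][0] += t_sum
            blocks[-1][1] += n
            blocks[-1][2] = s_max
    upper = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return upper, values


def apply_isotonic(calibrator: Calibrator, score: float) -> float:
    """Look up the step value; scores above the fitted range use the last block."""
    upper, values = calibrator
    i = min(bisect.bisect_left(upper, score), len(values) - 1)
    return values[i]


def fit_per_class(
    probs: List[Dict[str, float]], labels: List[str], classes: Sequence[str]
) -> Dict[str, Calibrator]:
    """Fit one isotonic calibrator per class against a one-vs-rest target."""
    return {
        c: fit_isotonic([p[c] for p in probs], [1.0 if y == c else 0.0 for y in labels])
        for c in classes
    }


def calibrate(prob: Dict[str, float], calibrators: Dict[str, Calibrator]) -> Dict[str, float]:
    """Calibrate each class score independently, then renormalize to sum to 1."""
    raw = {c: apply_isotonic(cal, prob[c]) for c, cal in calibrators.items()}
    total = sum(raw.values())
    if total == 0.0:
        return {c: 1.0 / len(raw) for c in raw}  # fall back to uniform
    return {c: v / total for c, v in raw.items()}


# Toy held-out data (fabricated purely for illustration).
probs = [
    {"rel": 0.9, "irr": 0.1},
    {"rel": 0.8, "irr": 0.2},
    {"rel": 0.3, "irr": 0.7},
    {"rel": 0.2, "irr": 0.8},
]
labels = ["rel", "irr", "rel", "irr"]
calibrators = fit_per_class(probs, labels, classes=["rel", "irr"])
calibrated = calibrate({"rel": 0.85, "irr": 0.15}, calibrators)
```

A step-function lookup is used here for brevity; library implementations such as scikit-learn's `IsotonicRegression` interpolate linearly between fitted points instead.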