AI · Bullish · arXiv – CS AI · 5h ago
Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback
Researchers developed a new variance-reduced EXP4-based algorithm for optimizing routing policies in multi-layer hierarchical inference systems. The solution addresses the challenge of sparse, policy-dependent feedback in AI systems where prediction errors are only revealed at terminal layers, improving stability and performance over standard importance-weighted approaches.
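For context, the "standard importance-weighted approach" mentioned in the summary can be illustrated with one round of textbook EXP4. This is a minimal sketch of the baseline only, not the paper's variance-reduced algorithm. It assumes plain bandit feedback rather than the terminal-layer feedback the summary describes, and the names `exp4_step`, `advice`, and `loss_fn`, along with the two-policy toy setup, are hypothetical.

```python
import math
import random


def exp4_step(weights, advice, loss_fn, eta=0.1, gamma=0.1, rng=random):
    """One round of textbook EXP4 (loss version) with importance weighting.

    weights: one positive weight per expert (here, per routing policy).
    advice:  advice[i][a] is the probability expert i assigns to action a.
    loss_fn: returns a loss in [0, 1] for the chosen action only.
    """
    n_actions = len(advice[0])
    total = sum(weights)
    q = [w / total for w in weights]
    # Mix the experts' advice, then add uniform exploration so that
    # every action has probability at least gamma / n_actions.
    probs = [
        (1 - gamma) * sum(q[i] * advice[i][a] for i in range(len(q)))
        + gamma / n_actions
        for a in range(n_actions)
    ]
    action = rng.choices(range(n_actions), weights=probs)[0]
    loss = loss_fn(action)  # only the chosen action's loss is observed
    # Importance-weighted estimate: unbiased, but its variance grows
    # as probs[action] shrinks.
    est = loss / probs[action]
    new_weights = [
        w * math.exp(-eta * advice[i][action] * est)
        for i, w in enumerate(weights)
    ]
    return new_weights, action, probs


# Toy usage (assumed setup): policy 0 always picks action 0, which has
# loss 1; policy 1 always picks action 1, which has loss 0.
rng = random.Random(0)
weights = [1.0, 1.0]
advice = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(200):
    weights, _, _ = exp4_step(weights, advice, lambda a: 1.0 - a, rng=rng)
```

Dividing by `probs[action]` is what makes the estimate noisy when an action is rarely chosen, which is the instability the summary says the variance-reduced variant targets.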