AI · Bullish · arXiv – CS AI · 10h ago · 6/10
Active Testing of Large Language Models via Approximate Neyman Allocation
Researchers introduce an active testing algorithm that lowers the cost of evaluating large language models by choosing which items to sample from an evaluation pool, using semantic entropy and approximate Neyman allocation. The method achieves up to 28% lower MSE than uniform sampling and saves an average of 22.9% of the evaluation budget across multiple benchmarks.
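
For readers unfamiliar with the term, classical Neyman allocation splits a fixed sampling budget across strata in proportion to each stratum's size times its standard deviation, so noisier strata receive more samples. The Python sketch below illustrates that textbook rule only; it is not the paper's algorithm. The function name `neyman_allocation`, the stratum sizes, and the standard deviations are hypothetical, and how the method derives its variance estimates from semantic entropy is not described in this summary.

```python
from typing import Sequence


def neyman_allocation(
    sizes: Sequence[int], stds: Sequence[float], budget: int
) -> list[int]:
    """Split `budget` samples across strata in proportion to N_h * sigma_h.

    Illustrative sketch of the classical rule; `stds` are assumed to be
    given (a real system would have to estimate them).
    """
    weights = [n * s for n, s in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        # No variance information: fall back to proportional allocation.
        weights = [float(n) for n in sizes]
        total = sum(weights)

    # Ideal (fractional) share per stratum, then round down,
    # never taking more items than a stratum contains.
    raw = [budget * w / total for w in weights]
    alloc = [min(int(r), n) for r, n in zip(raw, sizes)]

    # Hand out the remaining samples by largest fractional remainder,
    # skipping strata that are already exhausted.
    leftover = budget - sum(alloc)
    order = sorted(
        range(len(raw)), key=lambda i: raw[i] - int(raw[i]), reverse=True
    )
    while leftover > 0:
        progressed = False
        for i in order:
            if leftover == 0:
                break
            if alloc[i] < sizes[i]:
                alloc[i] += 1
                leftover -= 1
                progressed = True
        if not progressed:
            break  # budget exceeds the total pool size
    return alloc


if __name__ == "__main__":
    # Three equally sized strata with increasing (made-up) variability.
    print(neyman_allocation([100, 100, 100], [0.0, 0.25, 0.5], budget=30))
```

Under this rule a stratum with zero estimated standard deviation receives no samples, which is one reason practical variants tend to approximate or smooth the allocation rather than apply it literally.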