AI · Bullish · arXiv – CS AI · 8h ago · 6/10
MINCE: Shrinking LLM Evaluation Datasets via Few-Model Monte Carlo Calibration
Researchers introduce MINCE, a method that cuts the computational cost of evaluating large language models by shrinking benchmark datasets. Using Monte Carlo simulation with only a few calibration models, MINCE reduces dataset size by 54-89% while keeping accuracy drift within acceptable thresholds, making GPU evaluation 2.7-8.1x faster.
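The summary does not spell out the procedure, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes per-item correctness (0 or 1) is already available for a few calibration models, draws random subsets of a candidate size, and returns the smallest size whose worst observed accuracy drift stays under a threshold. The function names, the uniform random subsampling, the worst-case drift criterion, and the 0.05 default threshold are all illustrative assumptions.

```python
import random


def make_synthetic(accuracy, total, rng):
    """Hypothetical stand-in for real results: 1 = item answered correctly."""
    return [1 if rng.random() < accuracy else 0 for _ in range(total)]


def max_drift_at_size(calibration, n, trials, rng):
    """Worst absolute gap between subset and full-set accuracy.

    The same random subset of n items is applied to every calibration
    model in each Monte Carlo trial (an assumption of this sketch).
    """
    total = len(calibration[0])
    full_acc = [sum(c) / total for c in calibration]
    worst = 0.0
    for _ in range(trials):
        idx = rng.sample(range(total), n)
        for c, acc in zip(calibration, full_acc):
            sub_acc = sum(c[i] for i in idx) / n
            worst = max(worst, abs(sub_acc - acc))
    return worst


def smallest_subset_size(calibration, max_drift=0.05, trials=200, step=50, seed=0):
    """Smallest candidate size whose simulated drift stays within max_drift."""
    rng = random.Random(seed)
    total = len(calibration[0])
    for n in range(step, total, step):
        if max_drift_at_size(calibration, n, trials, rng) <= max_drift:
            return n
    return total  # no smaller size met the threshold


# Three synthetic calibration models on a 1000-item benchmark.
data_rng = random.Random(42)
calibration = [make_synthetic(a, 1000, data_rng) for a in (0.6, 0.7, 0.8)]
chosen = smallest_subset_size(calibration)
reduction = 1 - chosen / 1000
```

Here `reduction` is the fraction of items dropped. In this sketch a tighter `max_drift` or more calibration models will generally push the chosen size up, which is the trade-off the summary describes between dataset reduction and accuracy drift.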