Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs
Researchers propose a graph-based framework that uses Maximum Independent Set (MIS) algorithms to benchmark large language models more efficiently by selecting diverse, non-redundant subsets of prompts. Experiments across 66 LLMs and four major benchmarks show that model rankings stay consistent and reliable with 25-48% fewer prompts, which lowers the computational cost of LLM evaluation.
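
To make the selection idea concrete, here is a minimal sketch of choosing an independent set of prompts on a similarity graph. It is an illustration under stated assumptions, not the authors' implementation. The summary does not specify the similarity measure, the edge threshold, or the MIS solver, so cosine similarity, the `threshold` parameter, and the function names `build_similarity_graph` and `greedy_independent_set` are all hypothetical. The greedy minimum-degree heuristic used here returns a maximal independent set, which is not guaranteed to be a maximum one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity_graph(embeddings, threshold):
    """Connect two prompts when their similarity reaches the threshold.

    Assumption: prompts are already embedded as vectors, and an edge
    means "these two prompts are redundant with each other".
    """
    n = len(embeddings)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_independent_set(adj):
    """Greedy heuristic: repeatedly keep the lowest-degree vertex and
    discard its neighbors. The result is maximal, not provably maximum."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    selected = []
    while remaining:
        # Ties are broken by the smaller index so the output is deterministic.
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        selected.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return sorted(selected)


# Toy data: prompts 0 and 1 are near-duplicates, as are prompts 2 and 3.
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99], [1.0, 1.0]]
graph = build_similarity_graph(toy_embeddings, threshold=0.9)
print(greedy_independent_set(graph))  # [0, 2, 4]
```

In this toy example, one prompt from each near-duplicate pair is dropped, so no two selected prompts are connected by a similarity edge. Exact MIS is NP-hard in general, which is why a heuristic is swapped in here.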