AI · Neutral · arXiv – CS AI · 6h ago · 6/10
RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
Researchers introduce RankLLM, an evaluation framework that quantifies both question difficulty and model competency to produce more nuanced LLM benchmarks. It propagates scores bidirectionally between models and questions, reaching 90% agreement with human judgment and outperforming existing methods such as Item Response Theory.
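The summary does not give the update rules, so the following is only a rough sketch of what bidirectional propagation between models and questions could look like. It assumes a binary correctness matrix and a damped, HITS-style iteration. The function name `propagate_scores`, the `damping` parameter, and the toy data are hypothetical and are not taken from RankLLM.

```python
def _normalize(values):
    """Scale values to sum to 1; fall back to uniform if they are all zero."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def propagate_scores(correct, iterations=50, damping=0.85):
    """Illustrative mutual-reinforcement scoring (an assumption, not RankLLM itself).

    correct[m][q] is True when model m answered question q correctly.
    Returns (competency, difficulty), each a list that sums to 1.
    """
    n_models = len(correct)
    n_questions = len(correct[0])
    competency = [1.0 / n_models] * n_models
    difficulty = [1.0 / n_questions] * n_questions

    for _ in range(iterations):
        # A question looks harder when more competent models fail it.
        raw_difficulty = [
            sum(competency[m] for m in range(n_models) if not correct[m][q])
            for q in range(n_questions)
        ]
        # Damping mixes in a uniform baseline so no score collapses to zero.
        difficulty = [
            (1 - damping) / n_questions + damping * d
            for d in _normalize(raw_difficulty)
        ]

        # A model looks more competent when it solves harder questions.
        raw_competency = [
            sum(difficulty[q] for q in range(n_questions) if correct[m][q])
            for m in range(n_models)
        ]
        competency = [
            (1 - damping) / n_models + damping * c
            for c in _normalize(raw_competency)
        ]

    return competency, difficulty


# Toy data: rows are models, columns are questions.
toy_results = [
    [True, True, True, False],    # model 0 solves three questions
    [True, True, False, False],   # model 1 solves two
    [True, False, False, False],  # model 2 solves one
]
toy_competency, toy_difficulty = propagate_scores(toy_results)
```

On this toy matrix, model 0 ends up with the highest competency and the question nobody solves ends up with the highest difficulty. The damping term is a design choice of this sketch: without it, questions that every model fails can absorb all the weight and the scores degenerate.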