AI · Neutral · arXiv – CS AI · 6h ago · 6/10
CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models
Researchers introduce CombEval, a dynamic benchmark framework for evaluating how well large language models solve combinatorial counting problems. Tests on 11 LLMs reveal significant brittleness on problems involving ordered objects, indistinguishable elements, and nested dependencies, while code-augmented approaches show modest improvements over direct reasoning.