AIBullisharXiv – CS AI · 7h ago7/10
🧠
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
BenchEvolver is an AI framework that automatically generates harder variants of existing coding problems to address benchmark saturation, where frontier LLMs now achieve 99% accuracy on standard tests. By evolving solutions rather than creating problems from scratch, it produces verifiable, diverse tasks that maintain challenge even for their generating models, enabling both better evaluation and improved training signals.