AI · Neutral · arXiv - CS AI · 6d ago · 7/103
InnoGym: Benchmarking the Innovation Potential of AI Agents
Researchers introduce InnoGym, the first benchmark designed to evaluate the innovation potential of AI agents rather than only the correctness of their output. The framework measures both performance gains and methodological novelty across 18 real-world engineering and scientific tasks. The results show that AI agents can generate novel approaches but lack the robustness needed to turn them into significant performance improvements.