AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
Researchers have released LABBench2, an upgraded benchmark of nearly 1,900 tasks designed to measure how well AI systems perform real-world biology research, not just how much theoretical knowledge they have. Current frontier models score 26-46% lower on it than on the original LAB-Bench, which shows that the new tasks are substantially harder. The authors conclude that AI systems have made significant progress in scientific abilities but still have substantial room for improvement.
🏢 Hugging Face