AI · Neutral · arXiv - CS AI · 14h ago · 6/10
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
Researchers have released LABBench2, an upgraded benchmark of nearly 1,900 tasks designed to measure how well AI systems perform real-world biology research, not just how much theoretical knowledge they hold. Current frontier models score 26-46% lower on LABBench2 than on the original LAB-Bench. AI scientific abilities have progressed significantly, but the gap shows that substantial room for improvement remains.