AI · Neutral · arXiv, CS AI · 14h ago · 6/10
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?
Researchers introduce SciPredict, a benchmark that tests whether large language models can predict the outcomes of scientific experiments in physics, biology, and chemistry. The study finds that some frontier models marginally exceed human experts, who reach about 20% accuracy, but the models fail to assess how reliable their own predictions are. This suggests that superhuman performance in experimental science requires not only more accurate predictions but also better calibration, meaning awareness of when a prediction can be trusted.