arXiv CS AI · Feb 27
SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy
Researchers have developed SPM-Bench, a PhD-level benchmark for testing large language models on scanning probe microscopy tasks. The benchmark uses automated data synthesis from scientific papers and introduces new evaluation metrics to assess AI reasoning capabilities in specialized scientific domains.